THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
VOLUME 3
CONTRIBUTORS TO THIS VOLUME
Harley A. Bernbach
Richard S. Bogartz
Kenneth R. Laughery
Marvin Levine
Michael I. Posner
Leo Postman
Allan R. Wagner
THE PSYCHOLOGY OF LEARNING AND MOTIVATION Advances in Research and Theory
EDITED BY
GORDON H. BOWER
STANFORD UNIVERSITY, STANFORD, CALIFORNIA
AND
JANET TAYLOR SPENCE
UNIVERSITY OF TEXAS, AUSTIN, TEXAS
Volume 3
1969
ACADEMIC PRESS
New York · London
COPYRIGHT © 1969, BY ACADEMIC PRESS, INC.
ALL RIGHTS RESERVED. NO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM, BY PHOTOSTAT, MICROFILM, RETRIEVAL SYSTEM, OR ANY OTHER MEANS, WITHOUT WRITTEN PERMISSION FROM THE PUBLISHERS.
ACADEMIC PRESS, INC. 111 Fifth Avenue, New York, New York 10003
United Kingdom Edition published by ACADEMIC PRESS, INC. (LONDON) LTD. Berkeley Square House, London W1X 6BA
LIBRARY OF CONGRESS CATALOG CARD NUMBER: 66-30104
PRINTED IN THE UNITED STATES OF AMERICA
LIST OF CONTRIBUTORS
Harley A. Bernbach, Cornell University, Ithaca, New York
Richard S. Bogartz, University of Illinois, Urbana, Illinois
Kenneth R. Laughery, State University of New York at Buffalo, Buffalo, New York
Marvin Levine, State University of New York at Stony Brook, Stony Brook, New York
Michael I. Posner, University of Oregon, Eugene, Oregon
Leo Postman, University of California, Berkeley, California
Allan R. Wagner, Yale University, New Haven, Connecticut
PREFACE
This is the third volume of the annual serial publication “The Psychology of Learning and Motivation,” and its format is similar to that of the first two volumes. Ideally these volumes are to provide a forum in which a contributor can pull together the several facets of his research around a single problem or theory, providing thereby a sustained and integrated characterization of his recent research and its import. The contributions typically involve the reporting of new experimental results along with selective reviewing of results previously scattered throughout the professional journals. In the midst of the scientific knowledge explosion, such collections of summary papers, in which prominent investigators provide an overview of the present state of their research, can serve a unique and valuable function not presently served by current journals or by annual reviews of an entire field. The aim of science, after all, is comprehension, and our comprehension of the thrust of another man’s research is materially improved if he is permitted “elbow room” to expand upon the context of his experimental efforts: the background, intuitive hunches, and speculative relations between his data and other phenomena of the science. In this regard, contributors to these volumes are allowed considerably more leeway and freedom than in current edited journals to tell their research stories as they wish, emphasizing what they believe to be exciting and significant. The editors have hoped to be eclectic in taste and range of coverage of the various subdivisions and topics within the areas spanned by learning and motivation. The main criterion for inviting a contribution is that the editors felt that the investigator had something new and interesting to write about. To this end, contributions have been invited from a number of eminent investigators spanning a broadly diverse range of topics, with the date of the contributions to be chosen by the investigator.
The vicissitudes of acceptances and self-selected deadlines by contributors produce some unintentional nonrandomness in coverage within particular volumes; thus, Volume 3 is more heavily weighted on the side of human learning and information processing, while the projected Volume 4 will equalize matters by having more articles on motivation, conditioning, and animal learning. The range of topics in these volumes will vary about as much as the range in random samples of five to seven articles from any issue of current experimental and theoretical journals in psychology.
GORDON H. BOWER
JANET T. SPENCE
September 1969
CONTENTS
List of Contributors
Preface
Contents of Previous Volumes

STIMULUS SELECTION AND A “MODIFIED CONTINUITY THEORY”
Allan R. Wagner
I. Introduction
II. The Research Strategy
III. Cue Validity and Stimulus Selection
IV. Theoretical Alternatives
V. An Experimental Evaluation of Modified Continuity Theory
VI. Concluding Comments
References

ABSTRACTION AND THE PROCESS OF RECOGNITION
Michael I. Posner
I. Introduction
II. Stimulus Examination
III. Past Experience
IV. Visual Representation in Memory
V. Separating the Visual and Name Codes of Prior Stimulation
VI. Summary and Conclusions
References

NEO-NONCONTINUITY THEORY
Marvin Levine
I. Introduction
II. Probing for Hs
III. The Dynamics of H Testing
IV. Discussion
V. Appendix
References

COMPUTER SIMULATION OF SHORT-TERM MEMORY: A COMPONENT-DECAY MODEL
Kenneth R. Laughery
I. Introduction
II. The Model: An Overview
III. The Model: A Detailed Description
IV. A Sample Simulation
V. Some Simulation Results
VI. Discussion and Conclusions
References

REPLICATION PROCESSES IN HUMAN MEMORY AND LEARNING
Harley A. Bernbach
I. Introduction
II. Basic Properties of the Theory
III. Serial-Position Effects in Short-Term Memory
IV. Some Other Short-Term Memory Tasks
V. Repeated Presentations and Learning
VI. Some Evidence for Rehearsal Processes
VII. Concluding Remarks
References

EXPERIMENTAL ANALYSIS OF LEARNING TO LEARN
Leo Postman
I. Introduction
II. The Role of Warm-Up
III. Two-Stage Analysis of Nonspecific Transfer
IV. Whole versus Part Learning
V. Acquisition of Transfer Skills
VI. The Effects of Practice on Recall
VII. Conclusions
References

SHORT-TERM MEMORY IN BINARY PREDICTION BY CHILDREN: SOME STOCHASTIC INFORMATION PROCESSING MODELS
Richard S. Bogartz
I. Single Alternation
II. A Model for Single Alternation
III. Data
IV. Extension to the Effects of Intertrial Interval Duration
V. Extension to Interpolated Events
VI. Extension to Markov Event Sequences
VII. Noncontingent Event Sequences
VIII. Conclusions and Directions
References

Author Index
Subject Index
CONTENTS OF PREVIOUS VOLUMES

Volume 1
Partial Reinforcement Effects on Vigor and Persistence
ABRAM AMSEL
A Sequential Hypothesis of Instrumental Learning
E. J. CAPALDI
Satiation and Curiosity
HARRY FOWLER
A Multicomponent Theory of the Memory Trace
GORDON BOWER
Organization and Memory
GEORGE MANDLER
AUTHOR INDEX-SUBJECT INDEX

Volume 2
Incentive Theory and Changes in Reward
FRANK A. LOGAN
Shift in Activity and the Concept of Persisting Tendency
DAVID BIRCH
Human Memory: A Proposed System and Its Control Processes
R. C. ATKINSON AND R. M. SHIFFRIN
Mediation and Conceptual Behavior
HOWARD K. KENDLER AND TRACY S. KENDLER
AUTHOR INDEX-SUBJECT INDEX
STIMULUS SELECTION AND A “MODIFIED CONTINUITY THEORY”
Allan R. Wagner
YALE UNIVERSITY, NEW HAVEN, CONNECTICUT
I. Introduction
II. The Research Strategy
III. Cue Validity and Stimulus Selection
A. The Basic Experiment
B. Confirming Data
C. An Empirical Extension
IV. Theoretical Alternatives
A. Attentional Theory
B. Modified Continuity Theory
V. An Experimental Evaluation of Modified Continuity Theory
VI. Concluding Comments
References
I. Introduction
It is rarely, if ever, the case in a learning situation that only a single descriptive feature of the environment offers information concerning the availability of reward or the occasions for reinforcement. It is generally possible to identify a number of “elements,” “dimensions,” or “attributes” of the situation, each of which has some degree of correlation with the signaled event, and to which it is known that the subject could be trained to respond discriminatively. A persistent question that has been associated with an uncommon degree of controversy (see, e.g., Goodrich, Ross, & Wagner, 1961; Mackintosh, 1965a; Trabasso & Bower, 1968) concerns the degree to which Ss make use of, or learn about, such multiple “cues.” For example, does S learn about each cue as though it were the only cue available, or does S “focus” on only a portion, or on a single one, of the potential cues? There has rarely been any real issue over the fact that the availability of one cue in the environment may reduce the amount that S learns about the other available cues (e.g., Spence, 1936), and there is ample evidence (e.g., Hughes & North, 1959) that S may learn about more than a single cue at once. Thus, it is tempting to agree with Bruner, Matter, and Papanek (1955) that it is after all an empirical question to determine the range of cues responded to in a given situation. However, data concerning the degree of focusing or stimulus selection that characterizes common learning situations have been viewed as crucial for determining whether “attention” or some “attention-like” construct is to be awarded a prominent role in an adequate theory of learning (e.g., Lashley, 1942; Mackintosh, 1965a). It can be argued (see Wagner, 1969a) that few experimental designs have been particularly relevant for judging the usefulness of attentional theory because of the lack of experimental control over the schedule of stimulation and reinforcement which Ss receive. There is little question, however, that there has been a recent wave of sympathy for attentional theory (e.g., Sutherland, 1964; Trabasso & Bower, 1968; Zeaman & House, 1963; Lovejoy, 1965; Mackintosh, 1965a), based in part on the apparent pervasiveness of stimulus-selection effects. In the present chapter, data will be presented which support the contention that stimulus selection is a potent effect, even in experimental situations that allow considerable control over the schedules of stimulation and reinforcement to which S is exposed. Yet, it will be questioned whether or not such data demand an “attentional” interpretation. A theoretical alternative suggested by Kamin (1968, 1969) and elaborated by Wagner (1969a) as a “modified continuity theory” will be discussed, and new data will be presented for which this approach appears to offer a relatively unique account. Although at this time the theory must be regarded as especially tentative and incomplete, opportunity will be taken to suggest its potential usefulness in interpreting several phenomena that have been in search of theoretical integration.
Footnote: Preparation of this paper and the research reported were supported in part by National Science Foundation Grant GB-6534.
II. The Research Strategy
There are a number of common experimental procedures for evaluating the degree of focusing or stimulus selection. Perhaps the most obvious involves a comparison of the degree of learning exhibited to some stimulus when it is the only relevant cue available, as compared to a condition in which there are additional relevant cues. The degree to which the presence of additional relevant cues reduces the apparent amount learned is taken to indicate the degree of stimulus selection (e.g., Sutherland & Andelman, 1967; Lovejoy & Russell, 1967). Another procedure involves holding constant the number of relevant cues, but varying in some manner the saliency of the redundant cues. It is thereby possible to evaluate the decrement in behavioral control acquired by a cue as a function of the increasing potency of the alternative cues (e.g., Lawrence, 1950; Mackintosh, 1965b). It is also possible to examine the correlations between the amounts learned about several redundant cues: If there is intersubject variability in the learning rates with respect to one such cue, will there be concomitant variability in the learning with respect to other available cues such that it appears that the more S learns about one cue the less it learns about others (e.g., Sutherland & Holgate, 1966; Sutherland, 1966)? The present research strategy follows in this basic line, but the essential question has been whether or not the learning that occurs with respect to one cue is dependent upon the validity of other concomitant cues. The problem to which the research has been addressed can best be elaborated at this point by considering a simple experimental paradigm. Suppose a cue (X) is sometimes followed by a reinforcing stimulus (US) in a classical conditioning situation, or by the availability of reward, consequent to some response, in an instrumental learning situation. Also suppose, however, that X is never present alone but always in compound with a second cue (A) and furthermore, that, in the general case, A and the reinforcing event are each free to occur with some frequency in the absence of X. Given such a training situation, it is possible to construct one contingency matrix describing the frequencies with which reinforcement and nonreinforcement occur in relationship to the presence and absence of A, and a second matrix describing the frequencies with which reinforcement and nonreinforcement occur in relationship to the presence and absence of X.
It is then possible to ask which of the frequencies contained in the two matrixes are important in determining the degree of learning that occurs with respect to X, i.e., its “associative strength” or “signal value.” Nonselective treatments of associative learning (e.g., Hull, 1943, 1950) have conventionally emphasized the importance of the frequencies contained in only one row of the X matrix, i.e., the number of occasions on which X has been experienced and then “paired” with reinforcement, and the number of occasions on which X has been experienced and then not “paired” with reinforcement. These respective conditioning and extinction experiences are assumed to determine the degree to which X will come to be responded to as though it signaled reinforcement. Other theorists (e.g., Tolman and Brunswick, 1935; Rescorla, 1966) would challenge this emphasis with the suggestion that all of the frequencies in the X matrix may be important, including the number of occasions on which reinforcement and nonreinforcement have been experienced in the absence of X. Rescorla (1968) has, in fact, shown that as the probability of reinforcement in the absence of a cue approaches the probability of reinforcement in the presence of that cue, the apparent associative strength of the cue becomes negligible, regardless of the frequency of reinforcement in the presence of the cue. Such evidence suggests that the signal value of X may depend upon the correlation between the presence and absence of X and the presence and absence of reinforcement, i.e., upon the validity of X in predicting reinforcement. The research to be described will comment on the above question, but the major issue in the present context involves whether or not some portion of the A matrix may be of significance, independent of the X matrix, in determining the signal value of X. Since X occurs only in compound with A, will S’s experiences with respect to the presence and absence of A influence what is learned about X? If there were some stimulus-selection process so that X competed with A for the behavioral effects resulting from reinforcement and nonreinforcement, such might be expected. A relevant attentional view has been voiced by Sutherland (e.g., 1964) and Mackintosh (e.g., 1965a). According to this position, the acquisition of behavioral control is assumed to be mediated by a stimulus-selection mechanism (stimulus analyzer), such that if an appropriate analyzer is not “switched in,” a cue will be ineffective in acquiring new associative tendencies as a result of reinforcement or nonreinforcement. The likelihood that an analyzer appropriate to a cue will be switched in is assumed to depend upon the validity of that cue (“on differences [in the outputs of the stimulus analyzer] being consistently associated with the subsequent occurrence of events of importance to the animal”; Sutherland, 1964, p. 57). It is assumed also, however, to depend inversely upon the validity of other concomitant cues, since the subject is assumed to be capable of attending simultaneously to only a limited number of cues, i.e., of having only a limited number of analyzers switched in.
Thus, in the case of the paradigmatic example, it should be expected that all of the frequencies in the A and X matrixes would be important in determining the signal value of X: The degree to which a subject will attend to, and hence learn about, the X cue should depend upon the relative correlations of A and X with the occurrence of reinforcement.
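The two contingency matrixes of the paradigm can be made concrete with a short Python sketch. The function names, the data layout, and the trial schedule below are hypothetical and chosen only for illustration; no counts are taken from the experiments reported in this chapter. The sketch tabulates, for any one cue, how often reinforcement and nonreinforcement occurred with the cue present and with it absent.

```python
from collections import Counter


def contingency_matrix(trials, cue):
    """Tabulate reinforcement against presence/absence of one cue.

    `trials` is a list of (cues_present, reinforced) pairs, where
    `cues_present` is a set of cue labels and `reinforced` is a bool.
    """
    counts = Counter()
    for cues_present, reinforced in trials:
        counts[(cue in cues_present, reinforced)] += 1
    return {
        "present": {"reinforced": counts[(True, True)],
                    "nonreinforced": counts[(True, False)]},
        "absent": {"reinforced": counts[(False, True)],
                   "nonreinforced": counts[(False, False)]},
    }


def p_reinforcement(row):
    """Proportion of occasions in one matrix row that were reinforced."""
    total = row["reinforced"] + row["nonreinforced"]
    return row["reinforced"] / total if total else float("nan")


# Hypothetical schedule: X occurs only in compound with A, while A and
# reinforcement are each free to occur in the absence of X.
trials = (
    [({"A", "X"}, True)] * 6      # AX compound, reinforced
    + [({"A", "X"}, False)] * 2   # AX compound, nonreinforced
    + [({"A"}, True)] * 4         # A alone, reinforced
    + [(set(), False)] * 8        # neither cue, nonreinforced
)

matrix_a = contingency_matrix(trials, "A")
matrix_x = contingency_matrix(trials, "X")

for name, matrix in (("A", matrix_a), ("X", matrix_x)):
    print(name,
          "P(reinf | present) =", p_reinforcement(matrix["present"]),
          "P(reinf | absent) =", p_reinforcement(matrix["absent"]))
```

In this invented schedule the "present" row of the X matrix is all that a nonselective treatment would consult, whereas the "absent" row of the X matrix and the whole of the A matrix are the additional frequencies at issue in the text.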
III. Cue Validity and Stimulus Selection
The following section describes a series of interrelated experiments designed to evaluate the proposition that the signal value of a cue depends upon the relative validity of other available cues. The studies, as will be seen, employed several different training situations, including both classical and instrumental conditioning. Each environment was selected by virtue of its relatively frequent experimental usage, and by virtue of allowing reasonably good experimental control over the conditions of stimulation and reinforcement, as compared, for example, with a selective learning situation. Such control, which is critical to the problem under investigation, was no doubt better afforded in those studies that employed classical conditioning than in those that involved instrumental learning. Yet, it was deemed advantageous to employ some variety of common learning situations. If there were an apparent selectivity in whether a cue would come to be reacted to as a signal, depending on the validity of other available cues, it would be important to determine whether such effects were rather general, or were peculiar to the idiosyncratic characteristics of one experimental situation or to the choice of one referent behavior.
A. THE BASIC EXPERIMENT
The first experiments to be described closely adhered to the paradigm presented in Section II. In the initial study (Wagner, 1969b), the signaled event was the availability of food reward in an operant conditioning situation. Thirty-six rats were shaped to bar-press on a Variable Interval 20-second reward schedule. For all Ss, the VI reward schedule was then arranged to be in effect only during irregularly occurring 1-minute intervals during daily 2-hour training sessions. Twenty such occasions in each session were signaled by a simultaneous compound (AX) consisting of a 2500-Hz tone (A) and the 3/second flashing of two relatively bright chamber lamps (X). The Ss were divided into three treatment groups, distinguished by whether or not the A cue was ever presented alone, and by whether or not reward was available if A were presented alone. For Group I, A was presented alone daily on 20 1-minute occasions, randomly interspersed in sequence with the compound occasions, and reward was then available on the same VI schedule as during the compound. For Group II, A was never presented alone. Finally, for Group III, A was presented alone on the same scheduled occasions as in Group I, but reward was not then available. At all other times, A and X were absent and reward was unavailable for all Ss. Training was continued for 25 sessions, but testing was begun on Day 7, by which time bar-pressing performance of the three groups had stabilized. Testing involved the nonreinforced presentation of the X element alone, once every second day in place of a normally occurring compound trial. There were thus 10 occasions, distributed over the last 19 2-hour sessions, in which the light cue, which was otherwise presented only in the AX compound for all groups, was presented alone for 1 minute. The principal question was whether or not responding to X alone would differ in the three groups.
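The three treatments just described can be restated as contingency counts, from which a correlation of each cue with reinforcement follows. The Python sketch below is only an illustration under stated assumptions: each 2-hour session is treated as 120 one-minute intervals, an interval counts as "reinforced" whenever the VI schedule was in effect, and the phi coefficient is used as the correlation measure. The chapter does not say which coefficient was computed, and the function and constant names are hypothetical.

```python
import math


def phi(a, b, c, d):
    """Phi coefficient for a 2x2 contingency table.

    a: cue present, reinforced     b: cue present, nonreinforced
    c: cue absent, reinforced      d: cue absent, nonreinforced
    """
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else float("nan")


# Assumption: one observation per 1-minute interval of a 2-hour session.
INTERVALS = 120
AX_OCCASIONS = 20   # AX compound, reward available (all groups)
A_ALONE = 20        # A-alone occasions (Groups I and III only)


def validities(a_alone, a_alone_reinforced):
    """Return (phi for A, phi for X) for one treatment group."""
    n_a_alone = A_ALONE if a_alone else 0
    no_cue = INTERVALS - AX_OCCASIONS - n_a_alone
    a_reinf = n_a_alone if a_alone_reinforced else 0
    a_nonreinf = n_a_alone - a_reinf
    # A is present on AX and A-alone occasions; reward never occurs without A.
    phi_a = phi(AX_OCCASIONS + a_reinf, a_nonreinf, 0, no_cue)
    # X is present only on AX occasions; A-alone occasions count as X absent.
    phi_x = phi(AX_OCCASIONS, 0, a_reinf, a_nonreinf + no_cue)
    return phi_a, phi_x


groups = {
    "I": validities(a_alone=True, a_alone_reinforced=True),
    "II": validities(a_alone=False, a_alone_reinforced=False),
    "III": validities(a_alone=True, a_alone_reinforced=False),
}

for name, (phi_a, phi_x) in groups.items():
    print(f"Group {name}: phi(A) = {phi_a:.3f}, phi(X) = {phi_x:.3f}")
```

Under these assumed counts, X is less valid than A in Group I, equally valid in Group II, and more valid than A in Group III, while the number of reinforced and nonreinforced occasions in the presence of X is identical in all three groups.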
If the signal value of a cue treated such as X depends only upon the number of occasions on which reward does or does not occur in the presence of the cue, there should have been no differences. If, however, those events that occur in the absence of X also influence the signal value of X, the groups might have been expected to differ in their test trial responding to X. More specifically, if the signal value of X is dependent on the relative validities of A and X in the manner suggested by Sutherland (1964), the degree of responding to X should have been ordered Group I < Group II < Group III. Figure 1 summarizes, for each of the three training conditions, the probability of reinforcement in the presence of the AX compound, in the presence of the A cue when occurring alone, and in the absence of both A and X [represented as “not (A or X)”]. Also indicated are the separate correlations of A and X with reinforcement. As may be seen, for Group I, A was perfectly valid, but the occurrence of reward in the presence of A
alone, i.e., in the absence of X, depreciated the validity of X. Thus, X was experienced in Group I in compound with a cue A which was more valid than itself. In Group II, A and X were equally valid. For Group III, in which A was presented alone but nonreinforced, the validity of A was depreciated, such that X was experienced in compound with a cue of lesser validity. Thus, if the signal value of X is dependent on the validity of X relative to that of the concomitant A cue, the degree of responding to X alone should increase from Condition I through Condition III.
FIG. 1. Representation of three training conditions, I, II, and III, in which an isolable stimulus element X is presented only in compound with a second element A. The elements are arranged to have different correlations with reinforcement in the several conditions by virtue of whether or not A is presented alone, and whether or not A alone is reinforced. Reinforcement never occurs in the absence of an element [not (A or X)]. [Panels: probability of reinforcement under AX, A, and not (A or X); correlation of A and of X with reinforcement.]
Figure 2 describes the mean bar-pressing performance of the three groups in the presence of the various stimuli on the test days. The major comparison of interest involves the response rate to the X test element in the three groups. As may be seen in Fig. 2, this rate was lowest in Group I, higher in Group II, and highest in Group III. The difference in
rate of X-responding between adjacent groups was statistically reliable at p < .05, as evaluated by the Mann-Whitney U test, and the difference between Groups I and III was reliable at p < .002. There are other comparisons that might be made in assessing the signal value of X in the three groups, and for that reason response rate to the AX compound and to the A element, when available, as well as in the absence of A and X (obtained during the 1-minute periods just preceding each stimulus occasion) have been included in Fig. 2. Quite simply, however, there are no comparisons that would modify the conclusion that X was reacted to as a relatively poor signal for the availability of reward in Group I, as a better signal in Group II, and as the best signal in Group III. In fact, it should be noted that the low rate of responding to X in Group I occurred in spite of the fact that this condition promoted significantly higher levels of responding to the compound as compared to the other two conditions, and somewhat, although not significantly, higher responding in the absence of A and X.
FIG. 2. Mean number of bar-presses to elements and compounds for three groups following discriminated bar-pressing treatments corresponding to those described in Fig. 1. The major comparison of interest involves the X element, which was presented alone only during testing. [Plotted by stimulus for Groups I, II, and III.]
These findings support the notion that, on reinforced AX trials, A and X must necessarily compete for the acquisition of associative strength, perhaps by differentially gaining S’s “attention,” and that the outcome
of this competition depends upon the relative correlations of A and X with reinforcement. Although this view allows for the ordering of the signal value of X that was observed in all three groups of the last experiment, there is an alternative that must be considered. Mindful of the results thus far, it will again be useful to refer to Fig. 1. Note that in Conditions II and III, X had the same validity; thus, it is reasonable to attribute the differences obtained between these two conditions to the different validities of A relative to the invariant X validity. Condition I differs from the other two conditions, however, not only in that the validity of X is here less than that of A, but also in that the absolute validity of X is lower than in Conditions II and III. Perhaps it is only the latter fact that reduced the signal value of X in this condition. One way to evaluate this possibility is to institute a comparison condition that is similar to Condition I in the sense that reinforcements occur in the absence of the AX compound, but in which such reinforcements are not signaled by A. In this condition, the validity of X would be reduced as it is in Condition I, but the validity of A would be equally reduced. The A cue would not then be in any better position to compete with X than in Condition II, in which the validities of the two cues are likewise identical. Figure 3 schematizes this treatment condition (I/II), along with Conditions I and II, in a manner similar to Fig. 1. As already indicated, Condition I/II is similar to Condition II in that the validities of A and X are equal.
FIG. 3. Representation of three training conditions, I and II being identical to those similarly identified in Fig. 1, and I/II bearing relationships to each of the other two as a result of there being some probability (p) of reinforcement in the absence of any cue. [Panels: probability of reinforcement under AX, A, and not (A or X); correlation of A and of X with reinforcement.]
If the number of reinforcements in the absence of the compound, and hence in the absence of X, is made the same in Conditions
I/II and I, then Condition I/II is also similar to Condition I in that the validities of X are equal in the two conditions. If in the previous study it was of importance, in diminishing the signal value of X in Treatment I, that A was of greater validity than X, then the signal value of X should also be less in Treatment I than in Treatment I/II. An experiment conducted in collaboration with Miss Carol Dweck was designed to evaluate the three treatments schematized in Fig. 3. The necessity of administering “unsignaled” reinforcements under the experimenter’s control in Condition I/II made it desirable to employ a classical conditioning situation in preference to an instrumental learning situation. In this study a Conditioned Emotional Response (CER) procedure was used. Seventy-two rats initially received eight 2-hour sessions of bar-press training on a VI schedule. The Ss were then divided into three groups of 24 and given six 2-hour sessions of CER training in the same chamber, but with access to the bar and food cup prohibited by a Plexiglas panel. In each CER training session, all Ss received eight 0.5-second 1-mA shocks, each preceded by a 30-second AX compound CS consisting of the same light and tone as described in the previous experiment. For half of the Ss in each condition, the light was designated as the X element, and for the remainder the tone was so designated. The treatments differed in that Group I received an additional eight shocks daily, each preceded by a 30-second A element and interspersed in an irregular fashion among the compound trials; Group I/II received the same eight additional shocks at the same times in the session as Group I, but the shocks were unsignaled; and Group II received no such shocks in the absence of the AX compound.
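The logic of the I/II comparison can be illustrated by expressing each cue's validity as the difference between the probability of shock in its presence and the probability of shock in its absence. The Python sketch below rests on assumptions that are not in the text: each 2-hour CER session is divided into 240 thirty-second bins, each unsignaled shock is assigned to one no-cue bin, and the difference score (here called delta-P) stands in for "validity." All names are hypothetical.

```python
from fractions import Fraction

# Assumptions for illustration only.
BINS = 240          # a 2-hour session divided into 30-second bins
AX_TRIALS = 8       # AX compound trials, each followed by shock
EXTRA_SHOCKS = 8    # additional shocks given to Groups I and I/II


def delta_p(cue_bins, cue_shocks, other_shocks):
    """P(shock | cue present) - P(shock | cue absent) for one cue."""
    p_cue = Fraction(cue_shocks, cue_bins)
    p_no_cue = Fraction(other_shocks, BINS - cue_bins)
    return p_cue - p_no_cue


signaled_by_a = AX_TRIALS + EXTRA_SHOCKS  # Group I: A precedes all 16 shocks

conditions = {
    # name: (delta-P for A, delta-P for X)
    "I": (delta_p(signaled_by_a, signaled_by_a, 0),
          delta_p(AX_TRIALS, AX_TRIALS, EXTRA_SHOCKS)),
    "I/II": (delta_p(AX_TRIALS, AX_TRIALS, EXTRA_SHOCKS),
             delta_p(AX_TRIALS, AX_TRIALS, EXTRA_SHOCKS)),
    "II": (delta_p(AX_TRIALS, AX_TRIALS, 0),
           delta_p(AX_TRIALS, AX_TRIALS, 0)),
}

for name, (dp_a, dp_x) in conditions.items():
    print(f"Condition {name}: delta-P(A) = {dp_a}, delta-P(X) = {dp_x}")
```

With these assumed bins, X has the same reduced score in Conditions I and I/II, but only in Condition I does A retain a higher score than X, which is the contrast the experiment was designed to isolate.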
Following CER training, all Ss received one session in which access to the bar and reward was again permitted, and finally a test session in which the AX compound and the X element were each presented twice while S was allowed to bar-press. During the test session, neither the compound nor the X trials were reinforced. Figure 4 summarizes the test session effectiveness of both the AX compound and the X element for each of the three groups. The data are presented in the form of “percentage suppression,” which is the percentage by which s’s bar-pressing rate, estimated from the presignal periods, was reduced during the presentation of the signal [(pre-CS rate - CS rate)/pre-CS rate]. Thus, zero suppression indicates the same number of bar-presses during the 30-second CS period as during the preceding 30second A U X period, while 100% suppression indicates a complete cessation of bar-pressing during the CS period. As in the previous study, X was less effective in Group I than in Group 11.Of major interest here, however, is the fact that X was also less effective
in Group I than in Group I/II. The differences in mean suppression between Group I and Group I/II, and between Group I and Group II, were both statistically reliable with p < .01 (t = 2.88 and t = 3.80, respectively). Although Group I/II evidenced somewhat less suppression to X than did Group II, this difference did not approach statistical reliability. These data suggest that, under the experimental conditions employed, the major determinant of the decrement in signal value associated with Condition I as compared with Condition II is not the difference in absolute validities of the X element, but rather that in Condition I the X element was arranged to be less valid than A. When both A and X were reduced in validity in Condition I/II, there was little decrement in the effectiveness of X. When X, but not A, was reduced in validity in Condition I, there was a marked reduction in the effectiveness of X.
FIG. 4. Mean percentage suppression to the AX compound and the X element alone for three groups following CER treatments corresponding to those described in Fig. 3.
Before leaving these data, two potential concerns should be addressed, one involving the analysis of the findings, the other involving the relationship of the findings to other available data. First, the data were presented in terms of a suppression score, comparing responding in the presence and absence of the CS. Such scoring is conventional in the CER situation, and hence has been followed here. However, this method of analysis must be considered suspect in the present experiment since in Group I/II shocks were administered in the
absence of any CS. Perhaps the pre-CS pressing rate of Group I/II was sufficiently different from that of the other two groups so as to make a comparison of percentage suppression among the groups hazardous to interpret. In fact, the groups did not differ notably in pre-CS response rates, and simple comparisons of the mean number of responses during the X stimulus lead to the same conclusions as does comparison of percentage suppression: Group I responded significantly more (mean = 7.8 responses), i.e., suppressed less, than did either Group I/II (5.1 responses) or Group II (4.7 responses), while the latter two groups again did not differ significantly. Second, the comparison of Groups I/II and II did not offer much evidence for the kind of effect reported by Rescorla (e.g., 1968) in which the signal value of a cue was shown to be reduced by the presentation of “free” shocks within the experimental environment. It should simply be recognized that, in comparison with the procedures followed by Rescorla (e.g., 1968), the present experiment did not entail a very powerful manipulation to produce such effects: X was always followed by reinforcement, and the density of reinforcement in the absence of X was comparatively small. This design characteristic, of course, allowed the major point of the study: The deficit in the signal value of X in Group I as compared to Group II must be understood primarily in terms of the relatively higher validity of the competing A cue rather than in terms of a simple reduction in the validity of X. Two further, unpublished, evaluations of the treatments thus far described have been conducted by Mrs. Maria Saavedra in the context of eyelid conditioning with the rabbit. The data generally replicated the above results, although there was somewhat greater evidence for a decremental effect of reducing CS validity via unsignaled USs.
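As a minimal sketch of the percentage-suppression score used for the CER data above, the following Python function computes the measure from a pre-CS response rate and a CS response rate. The function name and the example counts are hypothetical illustrations and are not data from the experiment.

```python
def percentage_suppression(pre_cs_rate: float, cs_rate: float) -> float:
    """Percentage by which responding during the CS falls below the pre-CS baseline.

    Implements 100 * (pre-CS rate - CS rate) / pre-CS rate, as defined in the text.
    """
    if pre_cs_rate <= 0:
        # The score is undefined when there is no baseline responding.
        raise ValueError("pre-CS rate must be positive to define suppression")
    return 100.0 * (pre_cs_rate - cs_rate) / pre_cs_rate


# Hypothetical bar-press counts in equal-length pre-CS and CS periods.
print(percentage_suppression(12, 12))  # 0.0   (no change in responding)
print(percentage_suppression(12, 3))   # 75.0
print(percentage_suppression(12, 0))   # 100.0 (complete cessation)
```

Because the score is a ratio to the pre-CS rate, groups with very different baselines can yield scores that are hard to compare, which is the concern addressed above by also comparing raw response counts during X.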
In the first experiment, 48 male New Zealand white rabbits were divided into three groups and received treatments corresponding to Conditions I, I/II, and II as described in Fig. 3. Each AX-conditioning trial involved a 1100-msec compound CS consisting of 20/second light flashes generated by a strobe lamp, directed so as to reflect relatively homogeneously from the surfaces of the experimental chamber surrounding the restrained subject, and a 3160-Hz tone, 10 dB above the intertrial noise level. Each compound CS overlapped and terminated with a 100-msec, 4.5-mA shock US delivered through two stainless steel sutures, one above and the other below the orbit of the eye. All Ss received 32 such trials on Day 1 of training and 80 such trials on Day 2, with a mean interval of 4 minutes between compound trials. Group II Ss received no other stimulation during training. Group I Ss received an additional 32 trials on Day 1 and 80 trials on Day 2, irregularly interspersed among the AX trials, in which the light CS (A) was also presented alone and
reinforced. Group I/II received the same additional reinforcements as Group I but, in this case, unsignaled. Immediately following training, each S received 16 test trials with the tone CS (X) presented alone for the first time. In testing, the mean intertrial interval was 4 minutes, and all trials were reinforced. During training and testing, a conditioned response was identified as an anticipatory eyelid closure occurring during the first 1000 msec of the CS and was graphically recorded as a result of the turning of a micropotentiometer affixed to S’s head and communicating with the eyelid via surgical thread.
FIG. 5. Median percentage conditioned eyeblink responses to the AX compound and to the A and X elements alone for three groups following classical conditioning treatments corresponding to those described in Fig. 3.
Figure 5 summarizes the test trial responding of the three groups to the X element. Also included for comparison are the percentages of conditioned responding to the AX compound and to the A element, when appropriate, during the immediately preceding 64-minute segment of training. As in the prior studies, there was less responding to X in Group I than in Group II and, as in the Dweck experiment, less responding to X in Group I than in Group I/II. The observed difference between Groups I and II was statistically reliable (p < .05) according to a Mann-Whitney U test. However, although the level of responding of Group I/II was observed to fall between that of Groups I and II, it was not significantly different from that of either of the latter groups. In order to further evaluate the treatments of interest within the eyelid conditioning situation, Saavedra conducted a second experiment
identical to that just described with the exceptions that (1) an additional day of training, identical to Day 2, was administered prior to testing, and (2) a group corresponding to Condition III was also included. In the latter group, A was presented alone on the same occasions as in Group I, but was then nonreinforced. Thus, the experiment offered a comparison of all four treatment conditions described in Figs. 1 and 3. As in the previous study, there were 16 rabbits in each group.
FIG. 6. Median percentage conditioned eyeblink responses to the AX compound and to the A and X elements alone for four groups following classical conditioning treatments corresponding to those described in Figs. 1 and 3.
Figure 6 summarizes the test trial responding to the X element in each of the four groups. For comparison, percentage conditioned responding to the AX compound and to the A element, when appropriate, over the last 64-minute block of training, are also presented in a manner identical to Fig. 5. As in each of the previous experiments in which similar comparisons were allowed, responding to X was least in Group I, greater in Group I/II, still greater in Group II, and greatest in Group III. The Jonckheere (1954) test of the null hypothesis, against the ordered alternative that I < I/II < II < III, yielded a probability of less than .002. The combined results from the four experiments produce considerable confidence that the signal value of a cue treated such as X is not dependent simply upon the number of reinforcements and nonreinforcements experienced in its presence, or simply upon the correlation between the
presence and absence of X, and the presence and absence of reinforcement. In all four treatments, X had an identical number of reinforcements in its presence, yet the degree of responding to X alone was clearly different in the different treatments. In Treatment I as compared to Treatment I/II, and in Treatment II as compared to Treatment III, the validities of X were equal, yet, again, the degree of responding to X alone was markedly different. It appears that when X is experienced only in compound with A the signal value acquired by X is also dependent upon the relative validity of A in a manner consistent with stimulus-selection interpretations.
B. CONFIRMING DATA
A second series of studies from our laboratory has been reported in detail elsewhere (Wagner, Logan, Haberlandt, & Price, 1968). The studies are important in the present context in that they reaffirm the empirical generalization based on the preceding investigations that the signal value of a potential discriminative cue may depend upon the relative validity of other concomitant stimuli. They also add interesting data concerning the modification of the signal value of a cue when its validity is held constant but the validity of the concomitant stimuli is changed.
FIG. 7. Representation of two experimental treatments in which an isolable stimulus X is presented only in compound with two values, A1 and A2, of a second cue dimension. The probabilities of reinforcement are arranged such that the occurrence of A1 is a better predictor of reinforcement than is X in the Correlated condition, but not in the Uncorrelated condition.
This work involved three variations of the same experimental design. In each experiment, two stimulus compounds (A1X and A2X) were formed from a constant visual element (X) and either of two auditory elements (A1 or A2). The two compounds were presented equally often, with half of the total presentations scheduled to be followed by reinforcement, and reinforcement was withheld in the absence of the compounds.
The X component thus always had the same number of reinforcements and nonreinforcements in its presence and the same correlation with reinforcement (validity), its presence being associated with a 50% reinforcement schedule and its absence with no reinforcement. Two treatments within each experiment differed with respect to the validity of the auditory components of the compounds. In the Correlated condition, the compound containing A1 always announced reinforcement, whereas the compound containing A2 was always nonreinforced. In the Uncorrelated condition, the compounds containing A1 and A2 were each equally often associated with reinforcement and nonreinforcement. Figure 7 describes the probability of reinforcement under the several stimulus conditions in the two treatments. As may be seen, in the Uncorrelated treatment the occurrence of A1 or A2 provided no better basis than the occurrence of X for predicting reinforcement, whereas in the Correlated treatment the occurrence of A1 was a more (in fact, perfectly) valid predictor of reinforcement. As in the previous studies, the question was whether, upon testing, the observed signal value of X alone would be less in the Correlated group for which the more valid competing cue had been available. Experiment 1 employed rats in a discrete-trial bar-pressing situation, the compounds signaling the availability of food reward. Illumination of two ceiling lamps served as the X cue, and moderately intense tones of 1000 and 2500 Hz served as the A cues. For all subjects, a compound was scheduled to occur approximately once per minute, with A1X and A2X each presented on 50% of the trials according to an irregular sequence. A compound was terminated by a bar-press, or in the absence of a response, after 5 seconds.
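The reinforcement probabilities that define the Correlated and Uncorrelated treatments of Fig. 7 can be tabulated in a short sketch. The dictionary layout and the function name are assumptions made for illustration; the probabilities themselves are those stated in the text, and only compound trials are considered (reinforcement never occurs in their absence).

```python
from fractions import Fraction

# Probability of reinforcement on each compound; the two compounds are
# presented equally often, as stated in the text.
SCHEDULES = {
    "Correlated":   {("A1", "X"): Fraction(1),    ("A2", "X"): Fraction(0)},
    "Uncorrelated": {("A1", "X"): Fraction(1, 2), ("A2", "X"): Fraction(1, 2)},
}


def p_reinforcement_given_cue(schedule, cue):
    """P(reinforcement | cue present), averaging over equally frequent compounds."""
    probs = [p for compound, p in schedule.items() if cue in compound]
    return sum(probs, Fraction(0)) / len(probs)


for name, schedule in SCHEDULES.items():
    row = {cue: str(p_reinforcement_given_cue(schedule, cue))
           for cue in ("X", "A1", "A2")}
    print(name, row)
# Correlated {'X': '1/2', 'A1': '1', 'A2': '0'}
# Uncorrelated {'X': '1/2', 'A1': '1/2', 'A2': '1/2'}
```

The tabulation makes the design point explicit: X is reinforced on half of its presentations under both treatments, so any difference in responding to X alone cannot be attributed to X's own reinforcement schedule.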
Data were collected from eight subjects assigned initially to the Correlated training condition in which a bar-press was rewarded in the presence of A1X but not A2X, and from eight subjects assigned initially to the Uncorrelated condition in which reinforcement was obtainable in the presence of both A1X and A2X according to a random 50% schedule. After 40 minutes of the fifth or sixth 2-hour training session, the regular sequence of trials was interrupted every six trials for test presentations of either A1, A2, or X alone in balanced orders until eight trials with each were run. As in the case of the compounds, the element was terminated at the end of 5 seconds, or by a response, but all test trials were nonreinforced. Following the testing described, half of the subjects in each group were continued on the same program they experienced during Stage I, while half were switched to the opposite program. After eight 1.5-hour daily training sessions in Stage II, a second test sequence was administered similar to that after Stage I.
Figure 8 depicts the mean percentage responses on X-alone trials during Stage I and Stage II test sessions for only those subjects in Experiment 1 that received both Correlated and Uncorrelated training in the two orders. Also presented for comparison are the similar percentages on A1- and A2-alone trials, as well as on compound training trials within the same sessions for the two groups. During Stage I, Correlated training resulted in appreciably less X-responding than did Uncorrelated training. Not only was the size of this difference substantial, but it was strikingly stable across subjects. There was no overlap in the distribution of number of X responses in the two total groups of eight subjects receiving the different treatments, an observation that is associated with a chance probability of less than .005 (Fisher's exact test).
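One way to see why a complete absence of overlap between two groups of eight is improbable by chance is to count assignments. The sketch below assumes no tied scores and that, under the null hypothesis, every assignment of the 16 scores to the two groups is equally likely; it is an illustrative calculation with a hypothetical function name, and not necessarily the computation used in the original report.

```python
from math import comb


def no_overlap_probability(n_a: int, n_b: int, two_tailed: bool = True) -> float:
    """Chance that every score in one group lies below every score in the other.

    Exactly one of the comb(n_a + n_b, n_a) equally likely assignments puts
    all of group A below group B; doubling covers either direction.
    """
    one_direction = 1 / comb(n_a + n_b, n_a)
    return 2 * one_direction if two_tailed else one_direction


p = no_overlap_probability(8, 8)
print(p < 0.005)  # True
```

Under these assumptions the probability falls well below the .005 bound quoted above.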
FIG. 8. Mean percentage bar-press responses to elements and compounds in two separate test sessions for two groups of rats receiving first Uncorrelated and then Correlated training, or first Correlated and then Uncorrelated training as described in Fig. 7. After Wagner et al. (1968) Experiment 1.
During Stage II, those subjects in the two groups that were continued under their original training schedule (and whose data are not included in Fig. 8) continued to respond to X in a manner similar to that in Stage I. In contrast, the subjects in the two shifted groups evidenced a change in X-responding in accordance with the schedule change. As may be seen in Fig. 8, those subjects shifted from Correlated to Uncorrelated training greatly increased their number of responses to X, while subjects shifted
from Uncorrelated to Correlated training sizeably decreased their number of responses to X. Of the eight shifted subjects for whom X scores were available following both Correlated and Uncorrelated training, all eight gave fewer X responses following Correlated training than following Uncorrelated training (p = .008, sign test) regardless of the order of experience with the two schedules. Experiment 2 made use of a CER procedure in which the compounds signaled the occurrence of unavoidable electric shock. One-per-second flashes, generated by an overhead strobe lamp, served as the X cue and moderately intense 4000- and 400-Hz tones served as the A cues.
FIG. 9. Mean percentage suppression to elements and compounds in two separate test sessions for two groups of rats receiving first Uncorrelated and then Correlated, or first Correlated and then Uncorrelated, CER training as described in Fig. 7. After Wagner et al. (1968) Experiment 2.
Rats were first trained to bar-press on a VI 1-minute food reinforcement schedule and then subjected to CER training during daily 1.5-hour bar-pressing sessions in which the food schedule remained in effect. At first four, then eight, conditioning trials were included in each of the CER training sessions. A CER training trial consisted of a 3-minute presentation of an AX compound which on 50% of the occasions terminated with a .5-second, 1-mA foot shock. In each block of four trials, two A1X and two A2X compounds were irregularly presented in the various possible orders:
Under Correlated training the two A1X trials were followed by shock and the two A2X trials were not, whereas under Uncorrelated training one A1X and one A2X trial were followed by shock. Each of the four subjects was sequentially subjected to both Correlated and Uncorrelated training, with two subjects receiving Correlated experience first, and two Uncorrelated experience first. The length of any phase varied between 12 and 44 days in the several subjects. During the last two training days in each phase, three test trials were included in which X was presented alone for 3-minute periods and nonreinforced, as well as one similar trial with each of A1 and A2. Figure 9 depicts in a manner similar to the earlier CER results the mean percentage suppression on X-alone trials during each of the two test sessions for those subjects that received Correlated and Uncorrelated training in each of the two orders. Also presented are the similar mean percentages of suppression on A1- and A2-alone trials, as well as on compound trials within the same sessions for the two groups. In Stage I, it may be seen that Uncorrelated training was followed by nearly complete suppression to X alone. In comparison, following Correlated training, there was no suppression observed to X alone. An interesting feature of Stage-I behavior is that under Uncorrelated training when the visual (X) and auditory (A) cues were equally valid, the auditory cues alone produced no suppression, and in view of the sizeable suppression to X, apparently contributed little, if at all, to the suppression observed on compound trials. The X cue employed would, in this event, generally be described as considerably more “salient” than the available A cues. It still may be observed, however, that Correlated training during this stage, with A1X reinforced and A2X nonreinforced, eliminated any tendency for X alone to produce suppression.
Stage-II training, in which the treatments were reversed for all subjects, was not continued for a sufficient duration either to equalize A1 and A2 suppression in the Uncorrelated condition, or to produce as marked a discrimination in the Correlated condition as had been obtained during Stage I. Nonetheless, both subjects shifted from Uncorrelated to Correlated training showed a concomitant decrease in suppression to X, while both subjects shifted from Correlated to Uncorrelated training showed an increase in suppression to X. Considering the paired observations from all four subjects, the greater mean suppression to X following Uncorrelated as compared to Correlated training was highly significant (t = 60.65, df = 3, p < .001). Experiment 3 involved classical conditioning of eyelid closure in the rabbit under general conditions similar to those of the Saavedra experiments described earlier. The X cue consisted of 20/second light flashes while A1 and A2 consisted of a train of 12/second clicks and a 2400-Hz
tone. A conditioning trial involved a 600-msec presentation of an AX compound CS, which on 50% of the occasions overlapped and terminated with a 100-msec US. Sixty-four conditioning trials, half with A1X and half with A2X in an irregular order, were administered in each session with a mean intertrial interval of 60 seconds. Under Uncorrelated training, 50% of both the A1X and A2X trials were reinforced, while under Correlated training all A1X but no A2X trials were reinforced.
FIG. 10. Mean percentage conditioned eyeblink responses to elements and compounds in two separate test sessions for groups of rabbits receiving first Uncorrelated and then Correlated, or first Correlated and then Uncorrelated conditioning, as described in Fig. 7. After Wagner et al. (1968) Experiment 3.
Eight subjects were sequentially subjected to both Correlated and Uncorrelated training, the first stage consisting of 19 daily sessions and the second of 18 such sessions. Four subjects began with Correlated and four subjects with Uncorrelated training. During the last session, under each condition, eight nonreinforced test trials were included with each of X, A1, and A2 alone in a balanced order. Figure 10 presents, in a manner similar to Figs. 8 and 9, the mean percentage CRs on X-alone trials during Stage-I and Stage-II test sessions under the two treatment conditions, as well as the similar percentages on A1- and A2-alone test trials and on compound training trials within the same session. As may be seen in Fig. 10, there was less responding to X in Stage I following Correlated than following Uncorrelated training. There was no
overlap in the distributions of number of X responses under the two treatments and the lower mean responding associated with Correlated training was statistically significant (t = 3.719, df = 6, p < .01). Similar to Experiment 2, A1- and A2-responding was less than X-responding in the Uncorrelated condition in Stage I. Yet, the absolute level of X-responding was diminished in the Correlated condition, in which the auditory cues allowed a better prediction of US occurrence than did X. By the end of Stage II it may be seen that the subjects shifted from Correlated to Uncorrelated training were responding at a high level to both compounds. Concomitant with this shift was an increase in responding to X but a decrease to A1. Although the subjects shifted from Uncorrelated to Correlated training had not, by the end of the 18 postshift sessions, attained either a strong discrimination between the compounds, or an appreciable elevation in A1-responding, there was, nevertheless, a relatively small but consistent decrease in number of CRs to X alone. Ignoring the order of testing, each of the eight subjects gave a larger number of CRs to X following Uncorrelated than following Correlated training (p = .008, sign test). The results of the above studies clearly indicate that a partially reinforced cue is much less likely to be an effective stimulus in isolation when it has been experienced in compounds containing elements more highly correlated with reinforcement than when it has been experienced in similar compounds that do not contain more valid elements. They also indicate that the effect is not restricted to a simple failure of relatively invalid cues to acquire signal value. A previously responded-to cue will also lose its signal value when its own correlation with reinforcement is unchanged, but other cues are made more valid.
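The sign-test probability quoted for the eight-of-eight outcomes in Experiments 1 and 3 can be reproduced with a short binomial calculation. The sketch below assumes a two-tailed test with a chance probability of 1/2 per subject; that assumption is an inference from the reported value, and the function name is hypothetical.

```python
from math import comb


def sign_test_p(n_consistent: int, n_total: int) -> float:
    """Two-tailed sign-test probability of a split at least this extreme.

    Under the null hypothesis each subject is equally likely to change in
    either direction, so counts follow a Binomial(n_total, 1/2) distribution.
    """
    k = max(n_consistent, n_total - n_consistent)
    one_tail = sum(comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * one_tail)


print(round(sign_test_p(8, 8), 3))  # 0.008
```

With all eight subjects changing in the same direction, the two-tailed value is 2/256, which rounds to the reported .008.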
The potency of the reported effects may be judged from the fact that, in all three studies, every subject that was tested following both training conditions responded less to X when the A cues were more highly correlated with reinforcement than when X and the A cues were equally valid.
C. AN EMPIRICAL EXTENSION
The preceding studies indicate that the signal value of a cue varies with the validity of those stimuli with which the cue occurs in compound during training. Granted this fact, it is likely that, under certain conditions, the signal value of a cue will also vary with the validity of available training stimuli other than those with which the cue occurs in compound. The plausibility of this more general possibility can presently be appreciated in terms of primitive attentional notions, although it is also implied by an alternative interpretation to be discussed later. It will again be useful to consider a training example. Suppose S is trained with more than a single compound, e.g., AB and AX, each having
a unique element (B and X) but also a common element (A). On the basis of the work thus far, it would be expected that the validity of B would influence S’s learning about A, since a portion of the A presentations are in compound with B. It is possible that what is learned about A on the AB occasions may then, in turn, influence what S will learn about X on the AX occasions. By adopting an attentional language, it might be proposed that the validity of B will influence the degree to which S will attend to A, which may then influence the degree to which S will attend to A rather than X on the AX occasions. Thus, what signal value is acquired by a cue such as X may depend upon the validity of another cue, such as B, with which it never occurs in compound.
FIG. 11. Representation of two experimental treatments in which an isolable stimulus X is presented only in compound with a second stimulus A, which is also presented in compound with two values, B1 and B2, of a third cue dimension. The Correlated and Uncorrelated treatments do not differ with respect to the correlations of A or X with reinforcement, but ignoring the AX occasions, are identical to the treatments described in Fig. 7, which could be expected to influence the signal value of A.
An investigation conducted with the collaboration of Mr. Gerd Lehmann provides an instructive demonstration of the point under consideration. The experimental treatments in this study are summarized in Fig. 11 and again involved eyelid conditioning in the rabbit, employing the same general procedures as used by Saavedra and by Wagner et al. (1968).
As indicated, during training all Ss received experience with three separate compound CSs, AX, AB1, and AB2, each signalling an electric-shock US with some probability. For all Ss, the AX compound occurred an equal number of times and was always reinforced. Likewise, for all Ss half of the AB trials involved AB1 and half AB2, and half of the total AB trials were reinforced. Reinforcement never occurred in the absence of an AX or AB CS. The only difference between the two groups of Ss was in the way in which reinforcement was distributed between the AB1
and AB2 trials: For Correlated Ss all of the AB1 but none of the AB2 trials were reinforced, whereas for the Uncorrelated Ss 50% of the AB1 and 50% of the AB2 trials were reinforced. As a result of this design, A as well as X was paired with the same number of reinforcements and had an equivalent correlation with reinforcement in the two groups. The groups differed, however, in the validity of the B cues. If the AX occasions are ignored, the two treatments were identical to the Correlated and Uncorrelated treatments in the Wagner et al. experiments reviewed in Section III,B, such that it was to be expected that A would compete less favorably with the B cues in the Correlated than in the Uncorrelated condition. If, as a consequence, A were also rendered less able to compete with the X cue on the AX occasions (as by being generally less attended to), X would be expected to acquire a greater signal value in the Correlated than in the Uncorrelated treatment. The specific experimental procedures involved 2 days of training, with 144 total trials on Day 1 and 648 total trials on Day 2, and a mean intertrial interval of 30 seconds. The experiment was run in two replications which differed only in the distribution of compound trials in the first session. In Replication 1, all Day-1 trials were with the AX compound, whereas in Replication 2, each block of 72 trials on Day 1 included 24 trials with each of AX, AB1, and AB2. In both replications, each Day-2 block of 72 trials included 24 AX, 24 AB1, and 24 AB2 trials according to an irregular sequence. There were 32 subjects in Replication 1 and 24 in Replication 2, half of each receiving the Correlated, and half the Uncorrelated treatment. The X element during training was a 3160-Hz tone, while the A element was provided by the vibration of a commercial hand massager strapped in contact with the rabbit’s chest.
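The claim that A and X have the same reinforcement history in the two groups, while the B cues do not, can be checked by counting trials within one 72-trial block of the kind described for Day 2. The following sketch uses the trial counts and reinforcement probabilities given in the text; the data layout and the function name are assumptions made for illustration.

```python
from fractions import Fraction

# Trials per 72-trial block: 24 each of AX, AB1, and AB2 (from the text).
TRIALS = {("A", "X"): 24, ("A", "B1"): 24, ("A", "B2"): 24}

# Probability of reinforcement on each compound under the two treatments.
REINFORCED = {
    "Correlated": {
        ("A", "X"): Fraction(1), ("A", "B1"): Fraction(1), ("A", "B2"): Fraction(0),
    },
    "Uncorrelated": {
        ("A", "X"): Fraction(1), ("A", "B1"): Fraction(1, 2), ("A", "B2"): Fraction(1, 2),
    },
}


def cue_summary(treatment, cue):
    """Return (expected reinforced trials, total trials) with `cue` present in one block."""
    total = sum(n for compound, n in TRIALS.items() if cue in compound)
    reinforced = sum(n * REINFORCED[treatment][compound]
                     for compound, n in TRIALS.items() if cue in compound)
    return reinforced, total


for treatment in REINFORCED:
    for cue in ("A", "X", "B1", "B2"):
        reinforced, total = cue_summary(treatment, cue)
        print(f"{treatment:12s} {cue:2s} {reinforced}/{total}")
```

In this tabulation A is reinforced on 48 of 72 trials and X on 24 of 24 trials under both treatments; only B1 and B2 differ (24/24 and 0/24 under Correlated training, 12/24 each under Uncorrelated training).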
For half the Ss in each experimental group, B1 consisted of 20/second flashes of a strobe lamp and B2 consisted of the uninterrupted illumination of an incandescent lamp of approximately equal intensity. For the remaining Ss in each group, the nature of B1 and B2 was reversed. Immediately following the completion of training, all Ss received a series of test trials which included 12 reinforced trials with the AX compound and 12 reinforced trials with X alone. Figures 12 and 13 summarize for Replications 1 and 2, respectively, the percentage conditioned responses to the AX compound and to the X element alone during the test sequences. Also included for comparison are the percentage CRs to the AB1 and the AB2 compounds during the immediately preceding terminal block of 24 training trials with each. It may be noted that in both replications the Correlated and Uncorrelated groups attained similar high levels of responding to the AX compound.
FIG. 12. Mean percentage conditioned eyeblink responses to the AX, AB1, and AB2 compounds, and to the X element alone in Replication 1 of an investigation employing classical conditioning treatments corresponding to those described in Fig. 11.
It also may be noted that the Correlated groups acquired a sizeable discrimination between AB1 and AB2, whereas the Uncorrelated groups responded similarly to these two compounds. The replications differed only in that the responding of the Uncorrelated group was consistently higher to AB1 and AB2 in Replication 2 than in Replication 1 in which training to these compounds was delayed until Day 2. Of primary interest is the observation that, in both replications, X alone was more frequently responded to in the Correlated than in the Uncorrelated group. This difference was statistically reliable in both Replications 1 and 2 (t = 2.30, df = 30, p < .05; t = 2.91, df = 22, p < .01, respectively). Under the conditions of this study, the signal value of a cue could be shown to vary with the validity of training stimuli with which the cue never occurred in compound. A question that remains involves the generality of the phenomenon. One might judge that the conditions that allowed this effect were rather special, apparently depending upon yet a third cue which consistently occurred in compound with the separate cues. However, in nearly any learning situation, it is possible to identify certain “incidental” or “irrelevant” cues which are common to any of the stimulus occasions. These cues are presumably responsible, for example, for instances of cross-modal generalization in which the presentation of a cue which itself shares no apparent descriptive characteristics
with a training stimulus still occasions the trained response. Perhaps these incidental cues, since they occupy the same functional place held by the A cue in the Lehmann experiment, rather generally ensure that the signal value of a cue varies with the validity of other available cues. In fact, one would expect there to be less likelihood of observing such effects when there is not an experimentally controlled and isolable common cue, as there was in the Lehmann experiment. This expectation is obvious from the point of view of minimizing experimental variability and maximizing the probability that the common cues will be salient and effective. However, notice also that if it is proposed that separate elementary stimuli can be viewed as different “compounds” sharing a common cue, as a result of the ubiquitous occurrence of “incidental cues,” a test presentation of one of the elements alone is impossible unless the incidental cues in question are experimentally dispensable. Thus, in the absence of this characteristic, detection of the effect under discussion would require some indicant of the signal value of a cue other than the degree of responding to that cue alone. An investigation by Honig (1969) was conceptually very similar to the Lehmann study, with the major exceptions that there was no experimentally manipulable common cue and that the signal value of the target cue was evaluated via a generalization test. It is, consequently, an informative companion to the Lehmann study.
FIG. 13. Mean percentage conditioned eyeblink responses to the AX, AB1, and AB2 compounds, and the X element alone in Replication 2 of an investigation employing classical conditioning treatments corresponding to those described in Fig. 11.
Honig employed pigeons in a key-pecking situation, and the signaled event was the availability of food reward on a VI schedule. Basically, two groups of pigeons were trained with three different stimuli illuminating the key on different trials. The key was either (1) white with vertical dark lines, (2) green and unlined, or (3) blue and unlined. The three stimuli can be identified, respectively, with the AX, AB1, and AB2 compounds in the Lehmann study, insofar as the reward schedule was always in effect for both groups when the lined, white key was presented, but was in effect according to either a “Correlated” or “Uncorrelated” schedule when the green and blue keys were presented. That is, for one group reward was always available in the presence of the green key, but never in the presence of the blue key, while for the other group reward was available on 50% of both the green and blue occasions. To complete the identification with the features of the Lehmann study, it is proposed that the vertical lines corresponded to the X cue, green and blue hues to the B1 and B2 cues, and that those incidental features of an illuminated key, common to all three stimulus occasions, occupied the place of the A cue. Following training, Honig tested both groups with a white, lined key, but with the angular orientation of the line systematically varied. As the orientation of the line was changed from the vertical training value, the “Correlated” group evidenced a greater decrement in responding than did the “Uncorrelated” group. Apparently the line cue (and hence its angularity) was a more important determinant of responding in the “Correlated” group than in the “Uncorrelated” group, as a result of the differential training validity of the two hues with which the lines never occurred in compound. The correspondence between the Lehmann and Honig findings extends somewhat further.
In order to afford a more direct comparison of the two experiments, Lehmann also included in his test series a group of AX and X trials in which the frequency of the tone was changed from its training value of 3160 Hz to either 1700 or 5600 Hz. In both the case of the compound and the element trials, the generalization gradient produced by such variation was steeper in the Correlated than in the Uncorrelated groups (Wagner, 1968). It may occur rather generally that one relatively valid cue in an experimental situation can help to deny signal value to those cues that are common to any stimulus occasion, and can thereby exert a pervasive influence on the signal value that can be acquired by other independent cues in that same situation.
IV. Theoretical Alternatives

Granted a controlled history of pairing and correlation with reinforcement, a cue will, or will not, come to be reacted to as a signal, depending
on the validity of other training cues. The data that have been presented indicate that available cues somehow compete for the acquisition of signal value, such that the presence of a relatively more valid cue may deprive a cue of acquiring or maintaining signal value. This phenomenon was demonstrated over an appreciable range of simple experimental environments including both classical and instrumental training. In the absence of plausible arguments for the involvement of uncontrolled variation in the conditions of stimulus exposure or reinforcement (e.g., Spence, 1940; Wagner, 1969a), such evidence of stimulus selection clearly points to an inadequacy of the simple conditioning-extinction approach of continuity theory (Spence, 1936; Hull, 1943, 1950). The conventional alternative to simple conditioning-extinction theory is an attentional interpretation (e.g., Lashley, 1942; Sutherland, 1964; Mackintosh, 1965a), and such an alternative has been kept in the foreground during the preceding discussions. Reasoning from this basis has a familiar and intuitive quality and allows a convenient rationalization of stimulus-selection effects. Yet, it would be a mistake to conclude that the occurrence of stimulus selection requires the adoption of an attention-like construct, or even that current theories embodying such a construct are adequate to account for the present data. In fact, it should be recognized that prevailing attentional methods are ill-suited to the special conditions of classical conditioning, so that they may be judged to be either inapplicable to this situation from which the bulk of the present data was drawn, or as requiring profound auxiliary assumptions. Furthermore, it can be shown that a relatively modest amendment to the basic tenets of conditioning-extinction theory will allow for the kind of stimulus-selection effects that have been observed.

A. ATTENTIONAL THEORY

Attentional theory, particularly as voiced by Sutherland, seems to be in a comfortable position with respect to the present findings, since it has been proposed (e.g., Sutherland, 1964) that the likelihood of attending to a cue will vary directly with the validity of that cue, and inversely with the validity of other concomitant cues. Thus, what is learned with respect to a cue should depend not only on the number of reinforcements and nonreinforcements in its presence (as stressed by conditioning-extinction theory) but also upon the correlation of that and other available cues with reinforcement. Sutherland’s proposal, however, simply invites a rephrasing of the critical theoretical questions. The relative validity of a cue is an attribute that the experimenter can assess over the course of training or over the course of some segment of training. The task of any theory of associative
learning is to specify adequately the trial-by-trial or moment-by-moment experiences that lead the subject eventually to behave as though it had also been busily computing and comparing correlation coefficients. The subject must, somehow, acquire an appreciation of the empirical validities, and an adequate theory must indicate how this cumulatively takes place. It would be relatively unsatisfactory, for example, to assume that the signal value of a cue remains static until S has had some extended opportunity to assess the probability of reinforcement in the presence and absence of the several cues. The critical question for attentional theory then becomes: What are the trial-by-trial or moment-by-moment experiences that cause S to “attend to” a cue in a manner that is consistent with the cue’s relative correlation with reinforcement? As Trabasso and Bower (1968) have pointed out, all of the trial-by-trial attentional models that have been developed have a similar form. The stochastic models proposed by Zeaman and House (1963) and by Lovejoy (1965, 1966), incorporating a mediating, attending response, as advocated by Sutherland (1964), provide excellent realizations of such theory, but the hypothesis-testing models of Levine (1966) and Restle (1962) are also special cases of this form of theory. These models were developed for selective learning and concept identification tasks and, appropriate to such situations, each functions according to the notion that S’s probability of reinforcement depends upon the cues to which it is attending. In an instrumental situation it can be assumed that if S attends to a cue dimension and keys its response to the members of that dimension, it will experience a schedule of rewards that is limited by the empirical validity of the cues involved.
Thus, for example, in a two-choice selective learning situation, if S keys its response to a perfectly valid cue dimension in which one member is always associated with reward and the other with nonreward, it can receive reward on each trial, whereas if it keys its response to a less-valid dimension it will experience some smaller percentage of rewarded trials. Now if it is assumed (e.g., Zeaman & House, 1963; Lovejoy, 1965, 1966) that each reward increases, and each nonreward decreases, the subsequent likelihood of the attending response made on that trial, the strengths of the various attending responses will be modified in a trial-by-trial fashion so as ultimately to reflect the validity of the associated cues. In classical conditioning, the experimenter controls the application of the rewarding or aversive events presented without regard to S’s behavior. Thus, S has traditionally been viewed in such a situation as having no influence on its reward schedule. To consistently follow this line of reasoning, one would also have to assume that S has no influence
on its reward schedule as a result of “attending to” different cues. In an eyelid-conditioning situation, the presentation of the US is an aversive event and its absence can be described as relatively rewarding. However, regardless of the cues to which S might be assumed to attend, it would receive exactly the same schedule of presence and absence of the US. To this extent, Ss in this situation cannot be considered to be preferentially rewarded for attending to the most valid cues. In order to apply existing attentional models to a classical conditioning situation, so as to account for the influence of cue validity, it is necessary to assume that there is still some utility in attending to the relevant cues. Since the schedule of US exposures is itself invariant, it must be proposed that there is some utility in receiving a warning as to when these exposures will occur, so that attending to those cues most highly correlated with US occurrence would still be most likely to be rewarded. Perhaps, for example, S can make a preparatory response which reduces (however modestly) the noxiousness of the aversive US, or is relatively rewarded by refraining from anticipatory behavior in the absence of the US. Then, as in any instrumental situation, S’s reward schedule would be limited by the validity of the cues to which this behavior was keyed, i.e., to which S attended. There is precedence (e.g., Perkins, 1955; Wagner, Thomas, & Norton, 1967) for assuming, within classical conditioning, the operation of utilitarian preparatory behaviors, of which the experimenter-selected “CR” may or may not be a part. There is also evidence (e.g., Seligman, 1968; Weiss, 1968) that given the same schedule of noxious stimulation, different physiological consequences of importance to S will occur, depending upon the validity of the available signals.
Thus, it is hardly a radical assumption to propose that S is differentially rewarded for attending to valid, as compared to relatively invalid, cues in classical conditioning. Perhaps it would even be useful to conceptualize the source of differential reward as more general than overt preparatory behaviors, as, for example, involving the “confirmation of an expectancy.” Nonetheless, the attribution of a pervasive instrumental process to classical conditioning is a major theoretical assumption with systematic implications considerably beyond the issues of primary concern to attentional theory. However useful the assumption may be in the present context, it must necessarily be evaluated in terms of its full theoretical consequences.
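The contrast drawn in this section between the instrumental and the classical case can be illustrated numerically. What follows is a minimal sketch and not any of the published models: it assumes only two competing attending responses, a linear-operator update with a hypothetical rate `theta`, and it tracks the expected (mean-field) change in the probability `p` of attending to the more valid cue. The function name and every parameter value are assumptions introduced for illustration.

```python
def expected_attention(p0, reward_if_valid, reward_if_other, theta=0.1, trials=200):
    """Mean-field course of p = P(attend to the more valid cue).

    Assumed rule (hypothetical): a reward strengthens, and a nonreward
    weakens, whichever attending response was made on that trial.
    """
    p = p0
    for _ in range(trials):
        # Expected change in p given that S attended to the valid cue.
        gain_valid = (reward_if_valid * theta * (1 - p)
                      - (1 - reward_if_valid) * theta * p)
        # Expected change in p given that S attended to the other cue:
        # strengthening that response lowers p, weakening it raises p.
        gain_other = (-reward_if_other * theta * p
                      + (1 - reward_if_other) * theta * (1 - p))
        p += p * gain_valid + (1 - p) * gain_other
    return p

# Instrumental case: attending to the valid cue yields reward on every
# trial, attending to the other cue on only half of the trials.
instrumental = expected_attention(0.5, reward_if_valid=1.0, reward_if_other=0.5)

# Classical case: the US schedule is the same whatever S attends to.
classical = expected_attention(0.5, reward_if_valid=0.5, reward_if_other=0.5)

print(round(instrumental, 3), round(classical, 3))
```

Under these assumptions the attending probability climbs toward the valid cue only when the reward schedule depends on what S attends to; with an attention-independent schedule it does not move from its starting value, which is the difficulty the auxiliary "utility" assumption is meant to remove.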
B. MODIFIED CONTINUITY THEORY

There are, of course, other potential solutions to the theoretical problems posed by the present data. It may be possible to develop a trial-by-trial attentional model for classical conditioning that does not involve an instrumental reward process to account for the influence of cue
validity. It is also possible, however, without benefit of an attentional construct, to modify the conditioning-extinction tenets of continuity theory to allow for the facts of stimulus selection. At least it appears that one such approach (Wagner, 1969b) holds some promise of being capable of accounting in a trial-by-trial fashion for the influence of cue validity. The specific modification in theory that has been posed (Wagner, 1969b) was suggested by certain observations of Kamin (1968, 1969). In commenting on variations in the increment in signal value accruing to a component CS as a result of instances in which a compound containing that component had been paired with a US, Kamin noted: “. . . perhaps for an association to occur, it is necessary that the US . . . is unpredicted . . .” (1969); “. . . whenever the animal does not ‘confidently expect’ the US, delivery of the US on such a trial is to some degree ‘surprising’ and the result is some increment in the association between the US and whatever CS is present during the trial.” (1968). The left-hand portion of Fig. 14 is drawn from the values (in habs) given in Table I of Hull’s Principles of Behavior (1943), expressing the
[Fig. 14 panels. Left abscissa: X signal value, prior to AX+. Right abscissa: AX signal value, prior to AX+.]
FIG. 14. Representation of two assumptions concerning the size of the increments in signal value accruing to a cue (X) as a consequence of a reinforced trial in which that cue is in compound with another cue (A). To the left is a common assumption, in which the size of the increments varies with the signal value of X. To the right is a modified assumption, in which the size of the increments varies with the signal value of AX.
assumed increments in habit strength (ΔsHr) resulting from a reinforced trial, depending on the level of habit strength (sHr) existing prior to the reinforcement. For purposes of this example, Hull assumed that the increment per reinforcement was equal to one-tenth of the potential habit strength as yet unformed, but neither the specific value of this parameter nor the scale units are important here.
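The rule used for this example can be written out directly. The sketch below assumes a maximum habit strength of 100 units (an assumption made here for illustration; as noted, neither the scale nor the parameter value matters) together with the one-tenth fraction mentioned above. The function name is hypothetical.

```python
def hullian_increments(n_trials, fraction=0.1, maximum=100.0):
    """Return (strength before trial, increment) for successive reinforcements.

    Each increment is a fixed fraction of the habit strength still unformed,
    so the increments shrink as strength accumulates.
    """
    strength = 0.0
    rows = []
    for _ in range(n_trials):
        increment = fraction * (maximum - strength)
        rows.append((strength, increment))
        strength += increment
    return rows

for before, inc in hullian_increments(5):
    print(f"prior strength {before:6.2f}  increment {inc:5.2f}")
```

Plotting the increment against the prior strength gives the kind of descending function described for the left-hand panel of Fig. 14.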
The ordinate and abscissa in Fig. 14 have been labeled so as to be consistent with the present terminology, and the function has been taken (as it can be) to be applicable to the condition in which stimulus X is reinforced in compound with stimulus A. This function still expresses the common Hullian assumption that the increments in associative strength per reinforcement are not constant, but decrease with the accumulation of associative strength. This notion may be as appropriately phrased in the present context by saying that increments in the signal value of an X cue resulting from an AX reinforcement are not constant, but are related to the degree that reinforcement is already “predictable” or, conversely, is “surprising.” To this extent, the prior quotations from Kamin reflect a common Hullian proposition. However, it is crucial to note that the Hullian proposition as indicated to the left in Fig. 14 is that the increment in the signal value of X consequent to an AX reinforcement is a function of the signal value of X, i.e., a function of the degree to which reinforcement is already “predictable” solely from the occurrence of X in the AX compound. The major modification that Kamin may be interpreted to have suggested (1968, 1969) is that the increment in the signal value of X resulting from an AX reinforcement is a function of the signal value of AX, i.e., a function of the degree to which reinforcement is “predictable” on the basis of the entire configuration of cues among which X is included. This modification has been represented in the graph to the right in Fig. 14. It is also the case that the shift in terminology from “habit strength” to “signal value” and “degree of predictability” involves a departure from strict Hullian theory (e.g., Hull, 1943; Spence, 1956), but in a way that is common to other conditioning-extinction approaches (e.g., Estes & Burke, 1953).
That is, “signal value” is meant to represent the total associative learning with respect to a cue, including the learning that occurs as a result of nonreinforced exposures to the cue. It is, therefore, perhaps more equivalent to the Hullian sEr than sHr, at least insofar as it also reflects the decremental, “inhibitory” influence of nonreinforcement (e.g., Spence, 1956). A general statement of this position is of the following form: When a configuration of stimuli is followed by reinforcement there will be an increment in the signal value of a component (the component will be more reacted to as a signal for reinforcement), the amount of which will be a direct function of the degree to which the combination of components is not already maximally behaved toward as a signal for reinforcement. Kamin (1968, 1969) has discussed certain advantages of such an assumption in dealing with his work on “blocking” and in dealing with the Pavlovian phenomenon of “overshadowing.” It is also capable of
accounting for some of the effects of relative cue validity that have been reported in this paper. If the increment in the signal value of X resulting from the pairing of AX with reinforcement is dependent, as described, on the existent signal value of the AX configuration, it is clear that to maximize the cumulative value of X it is beneficial to minimize the contribution of A to AX’s signal value prior to the designated pairing. Thus, in the series of experiments reported in Section III,A, it would simply be argued that each reinforced A trial, as randomly interspersed among the AX trials (Treatment I), served to increase AX’s signal value on the following reinforced compound trials, and thereby to decrease the X increments relative to those in the comparison treatment (II) without additional A trials. In contrast, it would be argued that each nonreinforced A trial, as randomly interspersed among the AX trials (Treatment III), served to decrease AX’s signal value on the following reinforced compound trials, and thereby to increase the X increments relative to those in the comparison treatment. A potential advantage of this theory is that it suggests that with a given set of cue validities, as computed over an entire experimental treatment, the sequencing of S’s experience should have definite influences on the signal value of the various cues. Thus, if a given number of reinforced A-alone trials were to be administered in conjunctive training with reinforced AX trials, these trials would be most effective in minimizing the signal value accruing to X if they were all administered prior to the AX trials, so as to maximally increase the signal value of the AX configuration when presented. Alternatively, administering all such reinforced A-alone trials after the AX trials should not influence the signal value of the AX configuration during its training and hence should not reduce X’s signal value.
Kamin has compared these extreme alternatives, with the expected results (1968), although it remains to be determined how such alternatives compare with other conceivable arrangements, including the presently employed randomized sequencing of A and AX trials. If a given number of nonreinforced A-alone trials were to be administered along with reinforced AX trials, it would not be predicted that administering all of the A-alone trials prior to the AX trials would be maximally effective in increasing the cumulative signal value of X unless A began training with a high signal value. The influential effect of nonreinforced A trials would generally depend upon decreasing prior to AX reinforced trials the signal value of A that had been acquired on previous AX trials. Predictions, however, concerning how best to distribute such nonreinforced A trials would require a quantitative model for the conceptual schema under discussion.
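The sequencing argument for reinforced A-alone trials can be made concrete with one possible quantitative rendering. This is a sketch under stated assumptions, not the quantitative model the text calls for: the signal value of a compound is taken to be the simple sum of its components, increments are a fixed fraction (`rate`) of the distance from a maximum of 1.0, and the trial counts are arbitrary. The function name and the labels "common" and "modified" are introduced here to stand for the two panels of Fig. 14.

```python
def train(trials, rule, rate=0.1, maximum=1.0):
    """Apply reinforced trials in order and return the final signal values.

    trials: list of tuples naming the cues present on each reinforced trial.
    rule:   "common"   -> a cue's increment depends on its own value;
            "modified" -> a cue's increment depends on the summed value of
                          all cues present (assumed additive aggregate).
    """
    value = {"A": 0.0, "X": 0.0}
    for cues in trials:
        aggregate = sum(value[c] for c in cues)
        for c in cues:
            basis = value[c] if rule == "common" else aggregate
            # Clipped at zero: a fully predicted reinforcement adds nothing.
            value[c] += rate * max(0.0, maximum - basis)
    return value

a_alone = [("A",)] * 40      # reinforced A-alone trials
ax = [("A", "X")] * 40       # reinforced AX compound trials

for rule in ("common", "modified"):
    a_first = train(a_alone + ax, rule)["X"]   # all A+ trials before AX+
    a_last = train(ax + a_alone, rule)["X"]    # all A+ trials after AX+
    print(rule, round(a_first, 3), round(a_last, 3))
```

With these assumptions the common rule leaves X's final value the same in both orders, whereas the modified rule leaves X with almost no signal value when the A-alone trials come first and leaves it unaffected when they come last.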
This interpretation also requires some specification of the decrement in associative strength that should be expected to accrue to a cue as a result of a nonreinforced compound exposure. Consistency argues for an assumption symmetrical to the acquisition assumption. For example: When a configuration of stimuli is not followed by reinforcement there will be a decrement in the signal value of a component (the component will be less reacted to as a signal for reinforcement), the amount of which will be a direct function of the degree to which the combination of components is behaved toward as a signal for reinforcement. According to this schema, learning with respect to a cue depends basically upon the conditioning and extinction experiences in its presence. The important modification in conventional theory is the assumption that the magnitude of the resulting increments and decrements depends upon the prevailing signal value of the total configuration of stimuli in which the cue is imbedded. The greater the degree to which the aggregate of cues is reacted to as signaling reinforcement, the smaller the increment resulting from a reinforcement and the greater the decrement resulting from a nonreinforcement. It has been indicated how certain manipulations that change the validity of stimuli experienced in compound with a cue should change the prevailing signal value of the compounds and, hence, according to this approach, also change the expected size of the increments in signal value accruing to the cue from trial to trial. However, it is known that reducing the validity of a cue by giving reinforcement in the absence of that cue, and in the absence of any other experimenter-manipulated cues with which it occurs in compound, also reduces the signal value of the cue. This effect was seen in comparing Groups I/II and II of the Saavedra experiments and has been demonstrated more dramatically by Rescorla (1968). How is this effect to be accounted for?
Unless this dependence of the signal value of a cue on the absolute correlation between that cue and reinforcement can be met with a conditioning-extinction interpretation, there is little apparent advantage in acknowledging that under some conditions the theory could handle variations attributable to the relative correlations of several cues. The way in which this question can be approached is similar to the discussion in Section III,C, of the relationship between the Lehmann study and the Honig (1969) investigation. That is, one can appeal to the influence of uncontrolled “incidental” cues or, as they may as well be termed in this context, “situational” cues. When S is placed in an experimental environment and administered reinforcements, many incidental aspects of the environment may acquire signal value because of their occasional contiguous relationship with reinforcement. When a CS is presented, it is necessarily experienced “in compound” with these
situational cues. Now, it need only be noted that when reinforcements are administered in the absence of the CS, in addition to those administered in the presence of the CS, so as to depreciate the validity of the CS, the signal value of the situational cues should be increased. Then, following the same reasoning as in the case in which the several cues in a compound are experimentally isolable, the greater the signal value of the situational cues the greater should be the aggregate signal value of the CS plus situational cues, and the lower should be the increments in signal value to the CS as a result of reinforcement. This is to say that, in general, S is sensitive to the correlation between a cue and reinforcement (e.g., Rescorla, 1966, 1967, 1968) not as a consequence of some complex experiential contrast between the probabilities of reinforcement in the presence and absence of the cue, but rather as a consequence of the resulting trial-by-trial signal value of the aggregate of cues with which that cue is presented. If the remaining cues, which may be experimentally isolable in some instances or situational in others, have appreciable signal value, a reinforcement will have little incremental effect. On the other hand, if the remaining cues have appreciable signal value, a nonreinforcement will have considerable decremental effect. Thus, granted equal schedules of reinforcement and nonreinforcement in the presence of a cue, in comparison with a so-called “random control procedure” (e.g., Rescorla, 1967), a positive correlation between a cue and reinforcement should ensure relatively large increments and small decrements in signal value, whereas a negative correlation between a cue and reinforcement should ensure relatively small increments and large decrements.
It is notable that Rescorla (1969) has acknowledged a similar reading of Kamin (1968, 1969) as that presented here and elsewhere (Wagner, 1969b) and has drawn essentially the same implications for a general modification of conditioning-extinction theory to account for the effects of cue validity. There may be differences in the Wagner and the Rescorla proposals as, for example, in Rescorla’s relative stress on the conditioning of “inhibition” when nonreinforcement occurs in the context of cues which in aggregate are reacted to as signaling reinforcement. However, at the present stage of development of the theory, the differences are relatively inconsequential in view of the major communalities involved. In spite, however, of the apparent usefulness of this theoretical approach, its proper evaluation in some instances must await specific quantitative assumptions. For example, in the absence of quantitative assumptions, one cannot deduce the effects of the Correlated versus Uncorrelated treatments described in Section III,C. An apparently crucial factor in producing the obtained results would be that on those AX trials that were reinforced the prevailing signal value
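The role assigned to situational cues can be illustrated with a small simulation. It is a sketch under assumptions that go beyond the text: the aggregate signal value is taken to be the sum of the CS and a single lumped "situation" cue, increments and decrements are linear in that aggregate with a hypothetical `rate`, signal values are allowed to fall below zero, and the trial schedule is a fixed repeating cycle chosen for reproducibility. The function and cue names are hypothetical.

```python
def cs_value(us_without_cs, cycles=400, rate=0.05):
    """Final signal value of a CS trained alongside situational cues.

    Each cycle holds one reinforced and one nonreinforced CS trial (so the
    schedule in the presence of the CS is fixed) plus two periods with the
    situational cues alone; us_without_cs says whether each is reinforced.
    """
    value = {"cs": 0.0, "situation": 0.0}

    def trial(cues, reinforced):
        aggregate = sum(value[c] for c in cues)
        if reinforced:
            change = rate * max(0.0, 1.0 - aggregate)  # small if well predicted
        else:
            change = -rate * max(0.0, aggregate)       # large if well predicted
        for c in cues:
            value[c] += change

    for _ in range(cycles):
        trial(("cs", "situation"), True)
        trial(("situation",), us_without_cs[0])
        trial(("cs", "situation"), False)
        trial(("situation",), us_without_cs[1])
    return value["cs"]

positive = cs_value((False, False))       # US never occurs without the CS
random_control = cs_value((True, False))  # US as likely without the CS as with it
negative = cs_value((True, True))         # US more likely without the CS
print(round(positive, 2), round(random_control, 2), round(negative, 2))
```

Although the CS is reinforced on exactly half of its presentations in every condition, its final value in this sketch is ordered positive > random control > negative, solely because reinforcement given in its absence raises the signal value of the situational cues with which it is always compounded.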
of the compound would have been greater in the Correlated treatment, in which the A cue involved (A1) was always reinforced, than in the Uncorrelated treatment, in which the two A cues involved on different trials (A1 and A2) were otherwise nonreinforced. Thus, one would expect greater increments in the signal value of the X cue to result from the reinforced compound trials in the Uncorrelated as compared to the Correlated treatment. However, considering those AX trials that were nonreinforced, the prevailing signal value of the compound should have been less in the Correlated treatment, in which the A cue involved (A2) was always nonreinforced, than in the Uncorrelated treatment, in which the two A cues involved on different trials (A1 and A2) were otherwise reinforced. According to the nonreinforcement proposition that has been suggested, one would therefore expect greater decrements in the signal value of the X cue to result from the nonreinforced compound trials in the Uncorrelated as compared to the Correlated treatment. In order to account for the overall higher level of responding to X alone in the Uncorrelated condition, it would have to be argued that the greater increments enjoyed by that group on reinforced trials more than offset the greater decrements occurring on nonreinforced trials. Such an assumption could only be made on an ad hoc basis at this time, as could the necessary assumptions that would allow for the shift in performance that occurred when training was changed from Uncorrelated to Correlated, or from Correlated to Uncorrelated. The Correlated and Uncorrelated treatments were relatively powerful in demonstrating the occurrence of stimulus selection, but in the absence of quantitative specificity more analytically simple studies are presently required to evaluate the potential usefulness of this theoretical approach.
Still, it should be added, granted that the Correlated and Uncorrelated treatments have the effect that they do upon a common cue in the compounds involved, the data from the Lehmann study, as described in Section III,C, are predictable. In that study, it may be recalled, all Ss experienced AX, AB1, and AB2 compounds, with AX always reinforced and the latter two compounds on either Correlated or Uncorrelated schedules. To the degree that Correlated, as compared to Uncorrelated, training reduced the signal value of A, it should have reduced the aggregate signal value of the AX compound during training, which in turn should have allowed greater increments to accrue to X as a result of the AX reinforcements.
V. An Experimental Evaluation of Modified Continuity Theory

The theory that has been proposed as an alternative to attentional theory appears to account, in a relatively parsimonious fashion, for a
number of facts concerning stimulus selection. A survey of the relevant data, however, indicates a lack of critical information concerning the adequacy, or the necessity, of the nonreinforcement assumption. That is, the bulk of the studies reported here, in addition to the investigations of Kamin (1968, 1969), bear primarily on the assumption that the increments in associative strength consequent to reinforcement depend upon the strength of the compound involved. While there are data (e.g., Rescorla & LoLordo, 1965) that are encouraging to some symmetrical assumption in the case of nonreinforcement, there have been no relevant studies with an analytic simplicity comparable to the studies reviewed in Section III,A. A previously unpublished study, conducted with the assistance of Maria Saavedra and Gerd Lehmann, was designed to evaluate the nonreinforcement assumption. The study is of special interest since, in this instance, attentional theory appears to lead to predictions very different from those of the present theory. Thus, the investigation may also be seen as providing an initial test of the two interpretations, either one of which might be presumed capable of accounting, with various degrees of efficiency, for the data thus far reviewed. The format of the investigation was very simple: Train Ss to respond to some cue X; then combine X with a second cue which has been made to have either relatively high or relatively low associative strength and nonreinforce the compound; finally, test for the responding to X to determine the degree of decrement resulting from the nonreinforced compound trials. According to the nonreinforcement assumption of modified continuity theory, a greater decrement should be observed in that condition in which the nonreinforced compound has the greater associative strength, i.e., in that condition in which X is nonreinforced in compound with the relatively strong cue as compared to the relatively weak cue.
The study involved eyelid conditioning with the rabbit, and generally employed parameters similar to those in previous investigations, including a 1000-msec CS-US interval, a 100-msec, 4.5-mA, electric-shock US, delivered across the orbit of the eye, and a variable intertrial interval averaging 1 minute. Thirty-six Ss were first conditioned to three separate stimulus elements, which will be referred to as A, B, and X. In each block of 68 acquisition trials there were 32 A, 4 B, and 32 X trials, irregularly ordered, in which the respective cues were presented alone and reinforced. Two such blocks were administered on Day 1 and five on Day 2 of training. The A and B cues, by virtue of the different numbers of reinforcements in their presence, were designed to have different associative strengths, i.e., A was designed to have a relatively high signal value and B a
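The prediction of the nonreinforcement assumption for this three-step format can be sketched numerically. The code below is an illustration under assumptions not given in the text: the compound's signal value is the sum of its components, the changes are linear with a hypothetical `rate`, situational cues are ignored, and the trial counts are arbitrary values chosen only to make one partner cue strong and the other weak. The function name is hypothetical.

```python
def x_after_compound_extinction(partner_trials, x_trials=60,
                                extinction_trials=5, rate=0.05):
    """Signal value of X before and after nonreinforced compound trials.

    X and the partner cue are first reinforced alone, each for its own number
    of trials; the two are then presented together without reinforcement.
    """
    x = partner = 0.0
    for _ in range(x_trials):
        x += rate * (1.0 - x)
    for _ in range(partner_trials):
        partner += rate * (1.0 - partner)
    x_before = x
    for _ in range(extinction_trials):
        aggregate = x + partner
        # Assumed symmetrical rule: the decrement to each component grows
        # with the signal value of the whole compound.
        decrement = rate * max(0.0, aggregate)
        x -= decrement
        partner -= decrement
    return x_before, x

x_start, x_with_strong = x_after_compound_extinction(partner_trials=60)  # strong cue
_, x_with_weak = x_after_compound_extinction(partner_trials=8)           # weak cue
print(round(x_start, 3), round(x_with_strong, 3), round(x_with_weak, 3))
```

In this sketch X loses more signal value when it is nonreinforced alongside the strong partner than alongside the weak one, even though X itself received identical training and an identical number of nonreinforced exposures in both cases.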
relatively low signal value, by the end of acquisition. For half of the Ss, A was the 20/second flashing of a strobe lamp, and B the vibration of a commercial hand massager, as used in the earlier Lehman study. For the remaining Ss, the nature of the cues designated as A and B was reversed. Following the notation in the previous studies, X was the target cue, and for all Ss it was a 3160-Hz tone, 10 dB above the intertrial noise level. It should be appreciated that during acquisition all Ss received not only the same number of reinforcements in the presence of X, but
[Figure 15 graph omitted. Panel labels: Acquisition, all components; Extinction; Reacquisition; Selected compound. Abscissa labels: blocks of 32 A, 4 B, and 32 X trials (blocks 1-7); blocks of 8 trials (blocks 1-4, twice).]
FIG. 15. Mean percentage conditioned eyeblink responses during three training phases involving acquisition to each of three separate component CSs, extinction with one of two compounds formed from the acquisition components, and reacquisition to the component common to the two extinction compounds.
also the same number of reinforcements in the absence of X, i.e., in the presence of A and B. Thus, X had a common validity during acquisition for all Ss. Immediately following acquisition, S was assigned to one of two treatment conditions and administered 32 extinction trials in which X was presented and nonreinforced. For 18 of the Ss X was presented during extinction in compound with Ss’ A cue, while for the remaining 18 Ss it was presented in compound with the B cue. Assignment of Ss was such that which specific cue had been designated as A and B was counterbalanced, and the acquisition performance to each of the A, B, and X cues had similar means and variances in the two conditions. On the 32 trials immediately following the extinction phase, X was
again presented alone to all Ss and was reinforced. Comparison of Ss’ responding during this reacquisition phase with the level of responding at the end of original acquisition allowed a determination of the decremental effects suffered as a result of the intervening extinction, with either of the two compounds containing X.
Figure 15 presents the mean percentages of conditioned-eyelid responding to the several CSs during the three phases of the experiment. The acquisition functions that summarize the responding of all 36 Ss to the three components prior to differential treatment indicate that there was appreciable acquisition to the X cue and, importantly, different amounts of acquisition to the A and B cues: Ss reached a much higher level of responding to the more frequently occurring A cue than to the less frequently occurring B cue. It is thus possible to conclude that the different numbers of training trials had produced different degrees of associative strength to these two cues prior to the extinction phase. That there was somewhat more responding to A than to X, although both were equally reinforced, is inconsequential, attributable apparently to the different specific stimuli involved. Further evidence that A and B attained different associative strengths may be seen in the extinction functions of Fig. 15. The group that received X in compound with the A cue responded more during extinction than the group that received X in compound with the presumably weaker B cue.
The data of major interest, however, are depicted in the reacquisition functions of Fig. 15, which summarize the subsequent responding to X alone in each of the two treatment groups. As is apparent, there was less responding to X following the AX extinction than following the BX extinction condition.
The group in which the 32 nonreinforced exposures to X involved a relatively strong compound containing the A cue experienced a greater decrement in responding to X than did the group in which the same nonreinforced exposures to X involved a relatively weak compound containing the B cue. The statistical reliability of these findings was evaluated by determining for each S the difference in percentage responding to X during the final 32 acquisition trials and the 32 reacquisition trials. A Mann-Whitney U test of the difference between the two groups in the size of the decrements involved yielded a probability value of less than .002.
That these findings cannot be accounted for in terms of conventional conditioning-extinction theory, but are anticipated by the present modified version of such theory, should be obvious. It was further suggested, however, that the investigation might also reflect on the adequacy of attentional theory. It seems, in fact, that attentional theory would predict results just the opposite of those obtained, i.e., that there would
be a greater decrement in responding to X as a result of extinction with the BX compound than with the AX compound. Attentional theory proposes that to profit from a training experience with X, S must attend to X. This phenomenon should obtain whether the training experience is with reinforcement or with nonreinforcement. Thus, the degree to which responding to X should have extinguished during the treatment phase, as S had the opportunity to learn that reinforcement did not follow X, should have varied with the degree of attention to X. Therefore, it need only be asked: In which condition should S have been more expected to attend to X during the compound extinction trials? The answer appears to be that S should have been more likely to attend to X during extinction with BX than during extinction with AX. The differential frequencies of reinforcement in the presence and absence of A, as compared to the presence and absence of B, should have made all Ss more likely to attend to A than to B by the end of acquisition if any difference were to be expected. Now, since the likelihood of attending to one cue on a trial is assumed to be inversely related to the likelihood of attending to the other cues with which it is in compound (e.g., Sutherland, 1964; Mackintosh, 1965a), X should have been more attended to when in compound with B than when in compound with A. Thus, the fact that Ss in the AX rather than the BX condition appeared to learn better that reinforcement did not follow X, i.e., to extinguish more their responding to X, seems to directly contradict the expectation from attentional theory.
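The group comparison reported in this section, a Mann-Whitney U test on each S's decrement in responding to X, can be illustrated with a minimal sketch. The function name and the decrement scores below are hypothetical and are not data from the study; the sketch computes only the U statistics, not the associated probability value.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U_a, U_b) for two independent samples.

    U_a counts the pairs in which the score from group_a exceeds the score
    from group_b; tied pairs count one half.
    """
    u_a = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b


# Hypothetical decrement scores: percentage responding to X in the final 32
# acquisition trials minus percentage responding in the 32 reacquisition trials.
ax_decrements = [40.0, 35.0, 50.0, 45.0, 30.0]
bx_decrements = [10.0, 20.0, 15.0, 25.0, 32.0]

u_ax, u_bx = mann_whitney_u(ax_decrements, bx_decrements)
print(u_ax, u_bx)
```

The smaller of the two U values would then be referred to the tabled or computed null distribution for the two sample sizes (18 and 18 in the study described above).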
VI. Concluding Comments
When concomitantly faced with a number of features of the environment, each of which offers information concerning the occasions for reinforcement or nonreinforcement, S does not learn about each as though it were the only cue available. With various experimental arrangements, it is possible to demonstrate the operation of some acquisition-limiting process such that it appears that the more S learns about one cue the less it learns about other concomitant cues. A reasonable theoretical response to such observation is to propose a limited attentional capacity, so that attending to one cue can be assumed to reduce the likelihood of attending to, and hence learning about, other potential cues. However intuitively appealing this proposal, it needs to be more widely recognized that it brings with it certain difficult systematic problems, as indicated in Section IV, and that there are other possibilities, the exploration of which may have a salutary effect upon the state of our theory. For example, the case has been made in detail elsewhere (Wagner, 1969a)
that in many situations in which S has appreciable control over its experiences, such as in a selective learning or concept identification task, the acquisition-limiting process may lack all theoretical subtlety. That is, as S learns with respect to one cue dimension, it may simply no longer present itself with a schedule of stimulus exposures and reinforcements efficient to learning about the other available dimensions. The alternative acquisition-limiting rule that has been entertained in the present chapter represents a modest, if far-reaching, modification in conditioning-extinction theory: To the degree that an aggregate of cues is already maximally behaved toward as signaling reinforcement, further reinforcement of the compound will not increase this tendency for any of the component cues; to the degree that an aggregate of cues is already maximally behaved toward as signaling nonreinforcement, further nonreinforcement of the compound will not increase this tendency for any of the component cues. A number of advantages of this proposal have been discussed, in addition to the fact that it allows for an interpretation of various stimulus-selection findings. It was indicated, for example, how it can account for the sensitivity of Ss’ behavior to the correlation between a cue and reinforcement. The reasoning involved in this instance, i.e., the assumption that the effects of reinforcement or nonreinforcement of a cue depend largely upon the signal value of the contextual stimuli with which the cue is compounded, has, in fact, implications for many research areas. Contrast phenomena, for example, should be the rule, as reinforcement experiences in the presence of one experimental cue should modify the signal value of the contextual stimuli and thereby modify the effects of the reinforcement and nonreinforcement of a second cue in the same context. Not the least advantage of the theory is its testability.
It is possible in many redundant cue situations to assess S’s reaction to the aggregate of cues prior to reinforcement, in order to predict the consequences of reinforcement, whereas it might be impossible to specify which cue S was attending to, in order to similarly predict the consequences of reinforcement from attentional theory. Thus, where the approach is inadequate should be especially easy to reveal. No doubt, inadequacies in this specific theory will quickly become apparent, although the approach was seen to fare quite well, especially in comparison with attentional theory, in the initial experimental evaluation described in Section V. Most important, perhaps, at this time, is an appreciation that the phenomenon of stimulus selection, as here demonstrated, rather than simply helping to close an old issue, may provide an opening to more adequate systematic treatments, whether of an attentional or nonattentional variety.
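The acquisition-limiting rule summarized above is stated verbally. As a rough illustration only, it can be sketched with an assumed linear-operator form in which every cue present on a trial changes by the same amount, proportional to the gap between the trial outcome and the summed strength of the presented aggregate. The functional form, the learning rate of 0.02, the asymptote of 1.0, and the function names are all assumptions introduced here, not values or equations taken from the chapter. The trial counts follow the design of the experiment in Section V.

```python
def train(strengths, cues, reinforced, rate=0.02, asymptote=1.0):
    """Apply one trial under the assumed rule.

    The change for each presented cue depends on the summed strength of the
    whole aggregate, not on the strength of that cue alone.
    """
    target = asymptote if reinforced else 0.0
    total = sum(strengths[cue] for cue in cues)
    delta = rate * (target - total)
    for cue in cues:
        strengths[cue] += delta


def run_experiment(extinction_partner):
    """Return X's strength before and after compound extinction."""
    strengths = {"A": 0.0, "B": 0.0, "X": 0.0}
    # Acquisition: 7 blocks of 32 A, 4 B, and 32 X reinforced single-cue trials.
    # Trial order is irrelevant in this sketch because each cue is trained alone.
    for _block in range(7):
        for cue, n_trials in (("A", 32), ("B", 4), ("X", 32)):
            for _trial in range(n_trials):
                train(strengths, [cue], reinforced=True)
    x_before = strengths["X"]
    # Extinction: 32 nonreinforced trials of X in compound with A or with B.
    for _trial in range(32):
        train(strengths, [extinction_partner, "X"], reinforced=False)
    return x_before, strengths["X"]


for partner in ("A", "B"):
    before, after = run_experiment(partner)
    print(f"{partner}X extinction: strength of X {before:.3f} -> {after:.3f}")
```

Under these assumptions X loses more strength when extinguished in compound with the strongly trained A than with the weakly trained B, which is the direction of the result reported in Section V. Nothing in the sketch constrains a cue's strength from falling below zero, and that detail should not be read as a claim of the theory.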
REFERENCES
Bruner, J., Matter, J., & Papanek, M. L. Breadth of learning as a function of drive level and mechanization. Psychological Review, 1955, 62, 1-10.
Estes, W. K., & Burke, C. J. A theory of stimulus variability in learning. Psychological Review, 1953, 60, 276-286.
Goodrich, K. P., Ross, L. E., & Wagner, A. R. An examination of selected aspects of the continuity and noncontinuity positions in discrimination learning. Psychological Review, 1961, 11, 105-117.
Honig, W. K. Attention factors governing the slope of the generalization gradient. In R. M. Gilbert & N. S. Sutherland (Eds.), Animal discrimination learning. New York: Academic Press, 1969.
Hughes, C. L., & North, A. J. Effect of introducing a partial correlation between a critical cue and a previously irrelevant cue. Journal of Comparative and Physiological Psychology, 1959, 52, 126-128.
Hull, C. L. Principles of behavior. New York: Appleton-Century-Crofts, 1943.
Hull, C. L. Simple qualitative discrimination learning. Psychological Review, 1950, 57, 303-313.
Jonckheere, A. R. A distribution-free k-sample test against ordered alternatives. Biometrika, 1954, 41, 133-145.
Kamin, L. J. Attention-like processes in classical conditioning. In M. R. Jones (Ed.), Miami Symposium in the prediction of behavior: Aversive stimulation. Miami: University of Miami Press, 1968.
Kamin, L. J. Predictability, surprise, attention and conditioning. In B. Campbell & R. Church (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts, 1969.
Lashley, K. S. An examination of the “continuity theory” as applied to discriminative learning. Journal of General Psychology, 1942, 26, 241-265.
Lawrence, D. H. Acquired distinctiveness of cues: II. Selective association in a constant stimulus situation. Journal of Experimental Psychology, 1950, 40, 175-187.
Levine, M. Hypothesis behavior by humans during discrimination learning. Journal of Experimental Psychology, 1966, 71, 331-338.
Lovejoy, E. P. An attention theory of discrimination learning. Journal of Mathematical Psychology, 1965, 2, 342-362.
Lovejoy, E. P. An analysis of the overlearning reversal effect. Psychological Review, 1966, 73, 87-103.
Lovejoy, E. P., & Russell, D. G. Suppression of learning about a hard cue by the presence of an easy cue. Psychonomic Science, 1967, 8, 365-366.
Mackintosh, N. J. Selective attention in animal discrimination learning. Psychological Bulletin, 1965, 64, 124-150. (a)
Mackintosh, N. J. Incidental cue learning in rats. Quarterly Journal of Experimental Psychology, 1965, 17, 292-300. (b)
Perkins, C. C., Jr. The stimulus conditions which follow learned responses. Psychological Review, 1955, 62, 341-348.
Rescorla, R. A. Predictability and number of pairings in Pavlovian fear conditioning. Psychonomic Science, 1966, 4, 383-384.
Rescorla, R. A. Pavlovian conditioning and its proper control procedures. Psychological Review, 1967, 74, 71-80.
Rescorla, R. A. Probability of shock in the presence and absence of the CS in fear conditioning. Journal of Comparative and Physiological Psychology, 1968, 66, 1-5.
Rescorla, R. A. Conditioned inhibition of fear. In W. Honig & N. Mackintosh (Eds.), Fundamental issues in associative learning. Halifax, N.S.: Dalhousie University Press, 1969.
Rescorla, R. A., & LoLordo, V. M. Inhibition of avoidance behavior. Journal of Comparative and Physiological Psychology, 1965, 59, 406-412.
Restle, F. The selection of strategies in cue learning. Psychological Review, 1962, 69, 11-19.
Seligman, M. E. P. Chronic fear produced by unpredictable electric shock. Journal of Comparative and Physiological Psychology, 1968, 66, 402-411.
Spence, K. W. The nature of discrimination learning in animals. Psychological Review, 1936, 43, 427-449.
Spence, K. W. Continuous versus non-continuous interpretations of discrimination learning. Psychological Review, 1940, 47, 271-288.
Spence, K. W. Behavior theory and conditioning. New Haven, Conn.: Yale University Press, 1956.
Sutherland, N. S. Visual discrimination in animals. British Medical Bulletin, 1964, 20, 54-59.
Sutherland, N. S. Partial reinforcement and breadth of learning. Quarterly Journal of Experimental Psychology, 1966, 18, 289-301.
Sutherland, N. S., & Andelman, L. Learning with one and two cues. Psychonomic Science, 1967, 7, 107-108.
Sutherland, N. S., & Holgate, V. Two cue discrimination learning in rats. Journal of Comparative and Physiological Psychology, 1966, 61, 198-207.
Tolman, E. C., & Brunswik, E. The organism and the causal texture of the environment. Psychological Review, 1935, 42, 43-77.
Trabasso, T., & Bower, G. H. Attention in learning. New York: Wiley, 1968.
Wagner, A. R. Incidental stimuli and discrimination learning. In G. Gilbert & N. S. Sutherland (Eds.), Discrimination learning. New York: Academic Press, 1969. (a)
Wagner, A. R. Stimulus validity and stimulus selection. In W. Honig & N. Mackintosh (Eds.), Fundamental issues in associative learning. Halifax, N.S.: Dalhousie University Press, 1969. (b)
Wagner, A. R., Logan, F. A., Haberlandt, K., & Price, T. Stimulus selection in animal discrimination learning. Journal of Experimental Psychology, 1968, 76, 171-180.
Wagner, A. R., Thomas, E., & Norton, T. Conditioning with electrical stimulation of motor cortex: Evidence of a possible source of motivation. Journal of Comparative and Physiological Psychology, 1967, 64, 191-199.
Weiss, J. M. Effects of predictable and unpredictable shock in development of gastrointestinal lesions in rats. Proceedings of the 76th Annual Convention, American Psychological Association, 1968, 263-264.
Zeaman, D., & House, B. The role of attention in retardates discrimination learning. In N. P. Ellis (Ed.), Handbook of mental deficiency. New York: McGraw-Hill, 1963.
ABSTRACTION AND THE PROCESS OF RECOGNITION¹
Michael I. Posner
UNIVERSITY OF OREGON, EUGENE, OREGON
I. Introduction
   A. Levels of Processing
   B. Abstraction
   C. Generation
   D. Recognition
   E. Chapter Contents
II. Stimulus Examination
   A. Visual Matching
   B. Role of Familiarity
   C. Units of Processing
   D. Serial and Parallel Processes
III. Past Experience
   A. Analog Operations
   B. Schema Formation
IV. Visual Representation in Memory
   A. Changes over Time
   B. Manipulating Attention
   C. Rehearsal
   D. Generation
   E. Rehearsal and Long-Term Memory
V. Separating the Visual and Name Codes of Prior Stimulation
   A. Multiletter Arrays
   B. Manipulating the Name Code
   C. Manipulating the Visual Code
   D. Searching Visual and Name Codes
   E. Summary
VI. Summary and Conclusions
References
¹ Research described in this chapter was supported by NSF Grants GB 3939 and GB 5960 and by the Advanced Research Projects Agency of the Department of Defense monitored by the Air Force Office of Scientific Research under Contract No. F 44620-67-C-0099. It was written while the author was on a National Science Foundation Senior Postdoctoral Fellowship at the Applied Psychology Research Unit of the Medical Research Council, Cambridge, England. The work described involves collaboration with a number of colleagues and students at the University of Oregon. I am particularly grateful to Dr. Steven Keele for his help at every stage of the research.
I. Introduction
The full richness of stimulus experience is not available in normal memory. Our retention of previous events is not as vivid or complete as the original perception. One reason for this is that selective attention leads to the storage of some aspects of a scene rather than others. Even stimuli that are processed may lose specificity as more general classifications are achieved. As stimuli undergo successive stages of encoding, each stage produces a record which can be read for some period of time later as a memory. Most students of perception consider the ability to select and to abstract stimuli as an achievement which underlies complex cognitive development (Bruner, 1957; Flavell, 1963). The absence of such abstract codings is a deficit frequently found to accompany brain damage (Goldstein, 1948). On the other hand, the inability to recapture the details of stimulus experience has usually been considered as a failure of memory. In attempting to understand the relationship between perception and memory, it is necessary to emphasize the utility of the abstractive quality of memory for normal cognition, as has been done recently in a remarkable case history of a man with a nearly complete inability to forget (Luria, 1967).
This chapter is concerned with the successive stages of processing that are involved in the encoding of simple stimuli and with the record that each stage produces. These issues lie within the areas of perception and memory, respectively. Because the recognition of stimuli is impossible without stored information, it will also be necessary to consider learning of trace systems applicable to the classification of patterns never before seen. The topics discussed in the chapter as a whole are those that might be involved in a task such as recognizing a handwritten “A”.
A. LEVELS OF PROCESSING
At one stage, the visual pattern “A” is coded as a set of lines forming a unified but unfamiliar figure which is not different from an infinite number of line combinations of similar complexity that are not letters. It appears possible to isolate this early level of perception by appropriate experiments (Hochberg, 1968; Posner & Mitchell, 1967). As encoding proceeds, past experience is brought into contact with the new input. This checking against stored information requires a measurable period of time and can thus be analyzed by experiments (Sternberg, 1967a). The trace system (abstract idea) representing past visual experience with the letter “A” is in turn connected to the name of the letter (A). There are also superordinate classifications, such as letter or vowel, to which both the name A and the visual pattern “A” logically belong. Figure 1 schematizes these levels of analysis. It must not be assumed that these
levels are steps in a serial chain or that each successive code obliterates the last. Rather, the analysis of how Ss pass from one code to another and what remains of previous codes are the empirical questions with which this chapter is concerned.
[Figure 1 diagram omitted. Box labels: Stimulus, Abstract idea, Name, Rule.]
FIG. 1. A general outline of levels of processing reviewed in this chapter. The two processes of abstraction and generation are viewed as connecting the different coding levels.
B. ABSTRACTION
The process of moving from the top to the bottom of Fig. 1 may be called abstraction. In psychological research, the term abstraction has been used in two different ways. One sense of abstraction involves the selection of certain portions or aspects of an experience. A second sense refers to the classification of a stimulus into a wider or more inclusive superordinate category. The first sense of abstraction has primarily been applied to the study of visual stimulation (Humphrey, 1951). For example, Külpe (Humphrey, 1951) studied the abstraction of the attribute size from complex materials which also varied in color, form, and number. This sense of abstraction is related to the idea of an abstract representation or composite photograph which includes the common elements and eliminates the differences among separate visual experiences (Woodworth, 1938). The second sense of abstraction has been used primarily with the investigation of object names. For example, Pollack (1963) studied classification of the names dog, goat, and so on, into the superordinate category animal. This sense of abstraction does not involve selection of any physical aspect of the stimulus, but rather a relationship between a particular stimulus name and another broader category name. Ribot (1899) called attention to the abstraction that connects visual patterns with their names. He conceived of abstraction as a continuous process which begins with specific visual patterns or scenes and continues
to complex semantic categories such as “justice” or “liberty.” He saw the name of an object as an intermediate form of abstraction lying between more specific visual experiences and more general abstract words. In recent years, studies of word recognition and short-term memory have discussed the relationship between the visual and acoustic levels of processing. It has been shown in many situations (Conrad, 1964; Hochberg, 1968) that a visual letter is translated into acoustic or articulatory form in the process of representing it in memory. While it now seems unlikely that this is an obligatory transformation (Neisser, 1967; Posner, 1967), it certainly appears to be a frequent mode of processing. The codes used at each successive level of Fig. 1 stand for an increasing variety of individual instances at lower levels. In this sense, both selection and classification serve as means of abstraction. More generally, abstraction can be thought of as a process involving information reduction (Posner, 1964a) which produces encodings of increasingly greater generality.
C. GENERATION
It is also possible to proceed from a more abstract level of information to a more specific one. This process will be called generation. One can provide an abstract word and ask for an enumeration of specific instances that are subordinate to it. Experiments of this type were undertaken at Würzburg and continued in the work of Otto Selz (Mandler & Mandler, 1964). Recent studies have taken advantage of such generation to account for the ability of Ss to recall a large number of individual words when they are instances of well-learned categories (Cohen, 1963). Just as the process of abstraction can relate a visual pattern to its name, so it may be possible to produce a visual memory code from a letter name. Experiments relating to this question are introduced in Section IV.
D. RECOGNITION
The method of studying levels of processing used in this chapter involves the recognition of identity. The recognition task may be as simple as indicating whether or not two simultaneous visual stimuli are physically identical or as complex as deciding whether or not two letters are both consonants. The recognition of identity has considerable intrinsic interest. Locke considered it one of the basic cognitive operations (Reeves, 1965), a view that still has advocates (Miller, Galanter, & Pribram, 1960; Stevens, 1966). Moreover, a recognition procedure has the practical advantage that, regardless of the complexity of the cognitive operations involved in the decisions, the output requirement can be a simple binary choice. In most of the studies reported here, response
speed will be used as a basis for inferring the stages involved in accomplishing the task.
E. CHAPTER CONTENTS
The experiments presented in this chapter will deal with the processes of abstraction and generation, using the recognition of letter stimuli. The sections correspond roughly to the levels of analysis shown in Fig. 1. Section II deals with matching simultaneous visual patterns. The approach is to examine levels of abstraction at which it would be logically possible for recognition to be free of the effects of past experience concerning the stimulus names. In Section III, experiments are presented that seek to explore the development of a trace system relating new visual input to past visual experience. As a result of naming visual letters, Ss are able to develop a record of both the visual input and the name. Section IV explores the visual memory code of a single letter. Consideration is given both to visual codes that result from visual stimulation and to those that are generated from the letter name. Section V presents studies that manipulate separately the visual and the name components of letter arrays. In the final section, an effort is made to summarize the findings by considering the general utility of the framework that these experiments provide. The chapter is based primarily upon published studies so that emphasis can be given to speculations concerning integration of the experimental results.
II. Stimulus Examination
Neisser and Beller (1965) use the term stimulus examination to refer to a task in which Ss look for a target that has a single physical form, for example, looking for an “A.” The requirement for Ss to use stored information concerning the target letter indicates that the task involves memory for the target in addition to stimulus examination. From a theoretical view, there is an even more primitive task. This occurs when a pair of target items are exposed simultaneously and the job of S is merely to indicate whether or not they are identical, basing his judgment upon their physical form. Bruner (1957) suggested that this task differs from other perceptual tasks in that it may be free from familiarity effects. At least it is possible to perform an experiment in which neither target item has been seen before. There are several questions of crucial interest in defining the level of processing involved in this task. First, does the matching proceed prior to naming? Second, is the match affected by the familiarity of the forms and, if so, in what way? Finally, what are the structural details of such matches? What are the elements or units that are being matched? Are
these units handled serially or in parallel? The remainder of this section is devoted to studies that bear upon these questions.
A. VISUAL MATCHING
Subjects can establish the identity of pairs of stimuli even when the patterns do not lend themselves to verbal labels. If two nonsense patterns are exposed simultaneously, it is possible for Ss to say rapidly whether or not they are identical. In a psychophysical test, Ss are able to make hundreds of accurate comparative judgments along a single sensory dimension (Woodworth, 1938), but can identify relatively few stimuli on an absolute basis (Attneave, 1959). Thus, many matching tasks can be based on other than a verbal code. The basis of the match is less obvious when highly familiar letter stimuli are used.
1. Physical and Name Matches
Posner and Mitchell (1967) described several experiments in which Ss were required to decide as rapidly as possible whether two simultaneous visual letters were the same or different. The response was indicated by pressing one of two keys and latencies were recorded. Experiments were conducted in which “same” was defined either as being physically identical (e.g., AA), or as having only the same name (e.g., Aa). Responses to physically identical pairs were about 70-100 msec faster than to pairs having only the same name. This was true both when comparisons were made between separate experiments using the two levels of instruction and when the two types of pairs were examined within a single name-level experiment. Moreover, a “different” response to a stimulus pair such as AB was about 70-100 msec faster in an experiment using physical identity instruction than in one requiring matches based on the name. Thus, the same stimulus-response combination (AB-different) gave rather different response times (RTs) depending upon the instructions used to define “same.” These studies showed clearly that Ss can match stimuli based upon physical identity faster than those based upon the letter name.
2. Independence of Matches
Two lines of converging evidence suggest that the physical match is not influenced by the name of the letter. When Ss were instructed to respond on the basis of physical identity, “different” responses to pairs such as Aa or Bb, which had the same name, were not longer than “different” responses to pairs that did not have the same name. Despite a lifetime of calling “A” and “a” by the same name, there was no interference of that overlearned habit in physical matching.
The second line of evidence comes from an experiment by Chase and Posner (1965). In one condition of this experiment, Ss received a display consisting of a single visual letter (target) surrounded by a circle containing one to four additional letters (array). They were to respond “yes” as rapidly as possible when the target was contained in the array. Sometimes the letters used in array and target were visually confusable (e.g., OQGD, and so on) and at other times they were visually distinct but had similar names (e.g., BCDEP, and so on). The data showed clearly that visual similarity had a marked effect on matching speed but that auditory similarity did not. The effect of visual similarity was greatly reduced when either the array or target was in memory. Matches in which both array and target are present seem to depend upon visual factors much more than when either is in memory. This would be the case if the target letter were being matched visually to each array letter, rather than being read first and then matched to the array letters.
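The definitions of “same” used in these matching tasks can be made concrete with a short sketch. It captures only the logical criterion at each level of instruction, not the response-time differences that are the data of interest. The function name, the level labels, and the treatment of the rule level as "both vowels or both consonants" are illustrative assumptions.

```python
VOWELS = set("AEIOU")


def is_same(first, second, level):
    """Return True if two letters count as 'same' under the given instruction.

    level: 'physical' (identical form), 'name' (same letter name), or
    'rule' (both vowels or both consonants; an assumed reading of the
    category-level task).
    """
    if level == "physical":
        return first == second
    if level == "name":
        return first.upper() == second.upper()
    if level == "rule":
        return (first.upper() in VOWELS) == (second.upper() in VOWELS)
    raise ValueError(f"unknown level: {level}")


for pair in ("AA", "Aa", "AB", "AE"):
    answers = {lvl: is_same(pair[0], pair[1], lvl) for lvl in ("physical", "name", "rule")}
    print(pair, answers)
```

The pair Aa illustrates the point made in the text: it calls for a “different” response under physical-identity instructions and a “same” response under name instructions, whereas AB calls for “different” under both.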
B. ROLE OF FAMILIARITY
A continuing problem in psychology concerns the effect of familiarity upon the process of perception. There can be no doubt that familiarity affects the ability of Ss to name words or letters, since the ability to name is itself a product of learning. Indeed, any time Ss are required to search their memory in order to match an incoming stimulus with stored information, it is logically necessary for past experience to be involved to at least some degree. If the perceptual matching task reviewed in the last section goes on at a level prior to naming, it could be at a stage of perception that is not dependent on past experience. Hochberg (1968) investigated this point in a series of studies. He found that matching upright letters was no faster than matching upside down letters as long as the letters were exposed side by side. When there was a distance between the pair of letters, upright letters were superior. Since in his studies matching physically identical letters (AA) was faster than name matching (Aa) only when the letters were adjacent and not when they were split, it seems fair to argue that under his conditions Ss were using visual information for the adjacent matches and name information when the field was split. From this evidence, he concluded that familiarity did not affect perceptual matches as long as it was unnecessary to place information in memory. Only when the material had to be identified, or when storage of the stimulus was necessary, was familiarity effective. In an even more drastic operation, Posner and Mitchell (1967) compared the physical matches for letter stimuli and Gibson figures. The Ss were college students with years of exposure to letters and with 3 days of practice in the letter-matching task. Nevertheless, during the first hour of exposure to Gibson figures, Ss gave physical identity reaction
times (RTs) which were as fast as for letters. Once again familiarity had no effect upon the speed of physical matches. It would be tempting to conclude from these data that familiarity never affects this early stage of processing and that the physical match is a situation that always occurs prior to the familiarity operation. Such a conclusion, however, seems unwarranted in light of the results obtained in two recent master’s theses. Eichelman (1968) compared the time required to match pairs of letter strings consisting of 1, 2, 4, or 6 elements which either were nonsense or formed meaningful words. The strings consisted of uppercase letters typed horizontally with normal spacing and presented one string on top of the other. In the word condition, both stimuli formed familiar English words, even if they were not identical. The nonsense strings were obtained by scrambling the same letters that appeared in the words. The two strings always had exactly the same number of letters. If the strings were not identical, either 1, 2, 4, or 6 letters could be different. The strings subtended a visual angle of less than 2° and were exposed for 1 second. The Ss were instructed to respond “same” if the two strings were physically identical and “different” if they were not. All responses were to be made as rapidly as possible. The results of the study are shown in Fig. 2. There is a highly significant difference between the two curves which increases with the number of letters. The clear effect of familiarity on multiletter strings seems to conflict with the results for a single letter. Moreover, Hochberg (1968) reported that he obtained no difference in matching speeds between nonsense and meaningful strings. Hochberg’s results, however, come primarily from tasks in which the strings were exposed vertically and thus do not look familiar even when they formed words.
Since Mewhort (1966) has shown that even small changes in spacing reduce effects of familiarity in tachistoscopic recognition tasks, it appears that Hochberg’s procedure is not a critical test of this question. Another possible explanation for the Eichelman results could be that Ss read the words and matched at the name level rather than at the physical level. Since he did not have strings that varied in physical form (e.g., one uppercase and one lowercase), it is impossible to eliminate this explanation completely.² However, Eichelman analyzed the time required to respond “different” as a function of the number of letters that differed between the two strings. RT was a decreasing linear function of the number of different letters and these functions were nearly identical in form for nonsense and meaningful materials. This finding, together with a lack of any relationship in the data between RT and word familiarity, makes it seem doubtful that the match was being made on the basis of word names.
² Eichelman has now run this condition with four-letter strings. Physically identical matches both for words and nonsense strings were significantly faster than name matches. This indicates that physical matches in this task were not based upon reading the words and confirms the effect of familiarity upon multiletter arrays.
Cox (1967) also studied a visual matching task. In his case, the stimuli were large complex nonsense patterns which were exposed on slides. Cox measured the time for Ss to respond “same” or “different” to a pair. Prior to his RT task, he pretrained five of his groups either with the stimulus pairs they were to see or with other pairs. Relevant pretraining
FIG. 2. RT for responding that two strings are physically identical, plotted against string length (1 to 6 letters). The stimuli are words or nonsense strings made from the same letters. (After Eichelman, 1968.)
improved performance significantly over no pretraining and over irrelevant pretraining controls. The pretraining was equally effective whether or not it involved the use of verbal labels as part of the training. Cox’s study shows once again that familiarization can be effective in visual matches. Two points should be stressed from this study. First, specific training on names was not more effective than mere visual exposure. This finding suggests that matching was not being enhanced by names and agrees with arguments presented previously (Section II,A,1). Second, unlike the small and relatively simple letter stimuli used by Hochberg and Posner, Cox’s stimuli were large, nonfoveal, and complex. His times were quite long. This second point suggests that these stimuli, like
Eichelman’s letter strings, consisted of a number of more elementary units. So far we have considered only matching tasks that involve a RT measure. A number of investigators have studied the role of familiarity upon recognition in tasks that used thresholds as the dependent variable. Such studies are not always clearly related to the level of stimulus examination. Many of them require S to name or identify the stimulus (e.g., Gibson, Bishop, Schiff, & Smith, 1964), a task that clearly requires the use of past experience in the naming process. One relevant technique, however, is to study “same-different” judgment thresholds. Robinson, Brown, & Hayes (1964) compared the effects of familiarity on threshold for matching and for naming simultaneous foveal letters. The stimuli were letter pairs presented in the familiar orientation or rotated. The orientation of the letters had no effect upon the threshold for determining whether they were the same or different, but did affect the energy required to name them. This study shows essentially the same results as studies using RT for single letter pairs. Another technique used to study stimulus examination involves informing S of what he is to see and then using his report about the number and clarity of the letters that he does see as a dependent variable (Haber, 1965; Hershenson, 1969). This technique has recently been employed by Hershenson (1969) to investigate the effect of varying the familiarity of sequences of seven letters. Hershenson varied familiarity by increasing the approximation of his letter strings to normal English. He found familiarity had large effects when Ss had to name the letters. It had a reduced but significant effect upon their reports about the clarity of the letters when they knew what they were to see. It is impossible to tell whether such reports rest entirely on the input, or whether they also involve contact between the input and past experience.
Hershenson did attempt to introduce a converging operation to indicate that the reports were influenced primarily by visual factors. He found that the perceptual reports indicated that letters near the fixation point were seen most clearly, while identification was best for letters on the left. It should be noted that the results of Hershenson’s study are in agreement with those obtained by Eichelman for multiletter strings using the matching task. While comparing procedures involves many difficult problems, there seems to be considerable consistency in what is found. For single letters familiarity has no effect on simultaneous matching. This has been found using a threshold procedure (Robinson et al., 1964) and the RT method (Hochberg, 1968; Posner & Mitchell, 1967). For multiple letter strings, familiarity does affect matching. This is shown by the RT procedure (Eichelman, 1968) and by the method of perceptual report (Hershenson,
1969). Cox’s study using nonsense forms seems to agree quite well with the letter studies in this regard. Of course, when Ss are actually asked to identify the stimulus there is no doubt that familiarity matters both for single letters (Hochberg, 1968; Robinson et al., 1964) and for letter strings (Hershenson, 1969; Gibson et al., 1964).
C. UNITS OF PROCESSING
Familiarity seems to affect matching of letter strings and complex figures but not of single letters or simple forms. If a letter is thought of as a processing unit, it would be possible to argue that familiarity helps the integration of units, but does not change perception of individual units. Of course, the question of a unit of processing is one that is very complex. Neisser (1967) suggests that the units involved in processing visual material can be anything from a small section or segment of a letter to many words, depending upon the task. Some perceptual theorists (e.g., Hebb, 1949) have proposed that line slopes and vertices serve as units within simple line figures. Data from neurophysiological research (Hubel & Wiesel, 1965), disappearance under conditions of fixed retinal position (Hebb, 1963), and the analysis of grouping (Beck, 1966) have all suggested that simple line slants may serve as units which are combined in the analysis of more complex forms. Eichelman (1968) attempted to determine, in a matching task, if a single letter could be thought of as a bundle of slope units. He ran experiments in which Ss received pairs of stimuli selected either from four capital letters (ABCE) or from four line slants (horizontal, vertical, and left and right oblique). A small population of items was used in order to reduce the problem of discriminability. The results of his experiments showed that regardless of whether the stimulus pair was simultaneous or successive and of whether stimulus populations were run in blocks or mixed randomly, the letter stimuli were matched at least as rapidly as the single line slopes. These data seem to indicate that a letter cannot be thought of as an integrated bundle of line slopes for the purpose of simultaneous matching. Moreover, the data from matching strings of letters show that the letter serves as a reasonable unit for multiletter matching tasks, particularly with nonsense material.
This is indicated by the high degree of linearity for matching nonsense strings shown in Fig. 2. One might argue that letters are dealt with initially as individual slants and are integrated only with practice. However, the fact that Gibson forms showed performance equal to the letters suggests that compact figures of the complexity of letters are not divided into units in the matching task, even when they are unfamiliar.
D. SERIAL AND PARALLEL PROCESSES
An alternative to considering the letter as a unit in perceptual matches is to think of the subelements being matched in parallel. In this case, the line elements would still be separate features of the letter but the various elements would be matched at the same time. In the letter matching situation, it is impossible to distinguish between these two hypotheses since there are complex and unspecified correlations between different line slopes which vary with the population of letters in the list. Until recently, there was little reason to suppose that discriminable attributes of a figure could be dealt with in parallel in a visual matching task. Studies relevant to this proposition have been performed by Egeth (1966), Lindsay and Lindsay (1966), and Nickerson (1967). These studies have compared physical identity matches when only one attribute of the stimulus complex was relevant (e.g., form) with matches when more than one attribute was relevant (e.g., form and color). The attributes used were elements of the overall pattern in the sense that they represented quite discriminable aspects which could easily be separated from the overall configuration (e.g., color, size, and so on). Egeth (1966) showed that the time for “same” matches tended to increase with the number of relevant attributes. It was also shown that “different” RTs were a decreasing function of the number of aspects of the stimuli that were different (Nickerson, 1967). Taken together, these findings led to the view that separate aspects of a figure were processed serially. Lindsay and Lindsay (1966) studied matching of some figures that occurred relatively rarely and others that occurred more often. Their findings suggested that relatively unfamiliar patterns were matched serially, attribute by attribute, but when a figure appeared frequently it could be matched as a whole.
Thus, they suggested that serial processing was correct, but that the unit of processing changed with experience. Hawkins (1967, 1969) has performed an extensive series of investigations using simultaneous matching of stimuli that could vary in form, size, and color. He eliminated a problem that had arisen in Egeth’s experiments by never having irrelevant attributes. He compared matching of stimuli with only one attribute with matches in which there were two attributes. Moreover, he was careful to eliminate correlations between the attributes so that, in the case of a two-attribute match, uncertainty concerning the state of one attribute was not reduced by information gained about the other attribute. His results are shown in Table I. The left side compares RTs for matching color alone, size alone, and color plus size. The data are shown individually for four Ss. It is clear that the time to respond “same” to the combined attributes
is no longer than for the greater of the two individual attributes. Similar data for form and size are shown in Table I (right side). Here there is a small increase in the time required to handle the attributes together, but much less than would be expected by a serial model. Control conditions were run in order to show that the addition of a second attribute did not decrease the discriminability of the individual attributes. Taken by itself, this evidence shows clear support either for considering color plus size as a unit or for parallel processing of the two separate attributes.

TABLE I
SAME RTs AND ERROR RATES FOR SINGLE- AND DOUBLE-ATTRIBUTE SIMULTANEOUS VISUAL MATCHINGᵃ

                         Attribute                                   Attribute
S                Color   Size   Color and size     S                Form   Size   Form and size
1                 360    467         464           1                 516    592         606
2                 381    515         514           2                 507    532         542
3                 444    562         567           3                 490    488         516
4                 373    479         478           4                 347    400         402
Mean              390    506         506           Mean              465    503         518
Percent error     1.0    4.4         2.0           Percent error     2.8    3.9         2.8

ᵃ Left columns are for color and size while right columns are for form and size. Values are given in milliseconds. (After Hawkins, 1967.)
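The contrast between the serial and parallel accounts of these “same” RTs can be sketched numerically. The following is a minimal illustration, not Hawkins’s analysis: it takes the mean RTs from Table I, treats the parallel prediction as the slower of the two single-attribute times, and uses the simple sum of the two times as a crude upper bound for a serial model. The function names and the additive bound are assumptions of this sketch; a realistic serial model would predict less than the sum, since encoding and response time are shared.

```python
# Mean "same" RTs in msec, taken from Table I (after Hawkins, 1967).
# "first" is color (or form); "second" is size; "combined" is both together.
TABLE_I_MEANS = {
    "color_size": {"first": 390, "second": 506, "combined": 506},
    "form_size": {"first": 465, "second": 503, "combined": 518},
}


def parallel_prediction(rt_a, rt_b):
    """Parallel matching: the slower attribute sets the combined time."""
    return max(rt_a, rt_b)


def serial_upper_bound(rt_a, rt_b):
    """Crude serial bound (assumption of this sketch): the two
    single-attribute times simply add, so shared encoding and response
    time are counted twice."""
    return rt_a + rt_b


def compare(means):
    """Return observed and predicted combined RTs for each attribute pair."""
    rows = {}
    for pair, m in means.items():
        rows[pair] = {
            "observed": m["combined"],
            "parallel": parallel_prediction(m["first"], m["second"]),
            "serial_bound": serial_upper_bound(m["first"], m["second"]),
        }
    return rows


if __name__ == "__main__":
    for pair, row in compare(TABLE_I_MEANS).items():
        print(pair, row)
```

On these means the observed combined RT equals the parallel prediction for color plus size and exceeds it by 15 msec for form plus size, which is the pattern described in the text.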
Hawkins also found, in agreement with Egeth and Nickerson, that the number of attributes that differ between the two patterns affects the RT. Hawkins used this fact to indicate that the matches were not performed as a unit since, if they were, the response “different” would depend only on the most difficult attribute. Thus, the separate attributes operate together for a “same” judgment but cannot be considered as a unit for a “different” judgment. If attributes can be matched in parallel, the rate of matching cannot be used to define a unit for all tasks. A letter may give matching RTs no longer than individual slopes, but the slopes may still serve as a subelement of the letter for purposes other than matching. In terms of matching speed, a letter does appear to serve as a unit, but this need not conflict with the physiological and behavioral evidence, which has been cited, for regarding line slopes as more fundamental constituents in the formation of the letter. Moreover, the parallel processing standpoint makes it less paradoxical that relatively unfamiliar figures (Gibson forms) can
be matched as fast as much more familiar figures (letters). The elements of these stimuli may, in fact, be identical. Finally, it will be argued in later sections of this chapter that parallel processing is a frequent characteristic of matching tasks at many levels.
III. Past Experience
When a letter has to be named, past experience is surely involved. Sternberg (1967a) suggests that there are two separable aspects to the question of contact between present visual stimulation and stored information. The first concerns operations, such as smoothing or normalization, which may be performed upon the input character to prepare it for contact with stored information. The second concerns the type of representation that serves to store past visual experience with the input character. In order to separate these two components, Sternberg presented Ss with a digit superimposed upon a random noise field. The Ss’ task was to determine if the character was one of a small set of positive instances given earlier. It was observed that the presence of noise affected both the slope and intercept of the function relating RT to the size of the positive set. Sternberg made two inferences from this result. First, prior to the match, Ss abstracted the input character from some of the noise provided by the background (intercept effect). Second, the match involved the visual characteristics of the input character rather than merely its name. This second point was inferred from the effect that the noise had upon the slope. The slope effects were rather small and present only early in training. Nonetheless, Sternberg’s elegant experiment does provide a rationale for separating two components of recognition. In this section, these two aspects of recognition are taken as a starting point for the development of experimental techniques to study each in isolation. In Section III,A, operations relevant to varying input characteristics are reviewed. In order to study operations that might be involved in preparing the input for contact with stored information, rate of matching is observed as a function of the similarity between two simultaneous input characters.
In Section III,B, experiments are presented that attempt to understand the functional characteristics of the representational system that stores information used to identify new visual input. These experiments involve identification of patterns never before seen.
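The logic of separating an intercept effect from a slope effect, as in the Sternberg experiment described above, can be sketched as a straight-line fit of RT against the size of the positive set. The sketch below is illustrative only: the RT values are invented for the example and are not Sternberg’s data, and the function name `fit_line` is an assumption. A change in intercept with noise would correspond to the preparatory (abstraction) stage, and a change in slope to the comparison stage.

```python
def fit_line(set_sizes, rts):
    """Ordinary least-squares fit of RT = intercept + slope * set_size."""
    n = len(set_sizes)
    mean_x = sum(set_sizes) / n
    mean_y = sum(rts) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical values in msec, invented for illustration only.
SET_SIZES = [1, 2, 4]
RT_CLEAR = [430.0, 470.0, 550.0]  # constructed as 390 + 40 * set size
RT_NOISY = [485.0, 530.0, 620.0]  # constructed as 440 + 45 * set size

if __name__ == "__main__":
    a_clear, b_clear = fit_line(SET_SIZES, RT_CLEAR)
    a_noisy, b_noisy = fit_line(SET_SIZES, RT_NOISY)
    # Intercept difference: effect attributed to preparing the input.
    print("intercept effect:", a_noisy - a_clear)
    # Slope difference: effect attributed to the comparison itself.
    print("slope effect:", b_noisy - b_clear)
```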
A. ANALOG OPERATIONS
In Section II, it was argued that Ss could match two identical letters prior to obtaining their names. The same seems to apply to two letters that are not identical but highly similar, such as Cc. The name of the letter C is no more familiar than is the name of the letter A. Accordingly,
it would not be expected that RTs for Cc would be faster than for Aa unless the match were based on something other than the name. Experiments (Posner & Mitchell, 1967) show that Ss instructed to respond “same” to two letters having the same name respond much faster to a highly similar pair (e.g., Cc) than to one that is not very similar (e.g., Aa). Moreover, the time to respond to a pair like Cc is reliably longer than for its physically identical controls (e.g., CC and cc). This might be explained by supposing that a similar pair is matched sometimes as if it were identical and sometimes as if it had only the same name. However, the distribution of RTs appears to rule this out. Since pairs like Cc seem to be matched slower than physically identical pairs and faster than those having only the name in common, they have been called analog matches (Posner & Mitchell, 1967). The idea is that analog matching depends upon operations like size variation or rotation, which can be performed within the visual system and need not require contact with past experience. With letters it is not possible to vary similarity systematically. However, by substituting nonsense material for letters, one can show that the speed of visual matching is a function of the degree of similarity between members of the stimulus pair. Posner and Mitchell (1967) illustrated this with a study using Gibson forms. Gibson forms are simple figures which have the general characteristics of letters, but are not familiar (Gibson, 1965). The Ss were taught to call pairs of Gibson forms by the same name. Three stimulus pairs were similar, differing in continuity, size, or rotation, respectively, while the fourth pair was unrelated. In the case of differences in size and rotation and the unrelated pair, the “same” RTs were significantly longer than their physically identical controls. In general, the “same” RTs seemed to increase as the similarity within the pair decreased.
This suggests that the “same” RT for a pair of figures to which S has learned to give a common name is a function of their degree of physical similarity. In order to obtain a quantitative relationship between similarity and RT, Posner (1964a) taught Ss to call pairs of nonsense dot patterns by the same name. The patterns varied systematically in similarity. They were created by applying a statistical rule to one member of the pair in order to create distortions of varying degrees from the original (Posner, 1964b).³ After Ss had learned the names to a rigid criterion, pairs of patterns were presented simultaneously in a “same-different” RT task. The degree of similarity was linearly related to RT in the range from 0 to 4 bits/dot, and then leveled off. This relationship is indicated in Fig. 3.
³ The prototypes consisted of eight randomly placed dots in a 64-cell matrix. The distortions ranged from exact similarity (0 bits/dot) to unrelated (6 bits/dot). The distortion rules resembled a random walk. The uncertainty calculations are based upon the probability of a dot moving to each cell and are related to the logarithm of the average distance moved by a dot over a sample of distortions (see Posner, 1964b; Posner et al., 1967).
Why are patterns that are similar matched more rapidly? One reason might be that the names are learned better during the learning
FIG. 3. RT for responding that two dot patterns have the “same” name as a function of the similarity (level of distortion, 0 to 6 bits/dot) of the two patterns, plotted for Exp. I and Exp. II. The test follows a learning procedure in which Ss are taught to associate a common name with pairs of patterns. (After Posner, 1964a.)
task. This is plausible in the dot pattern study, but must be viewed in light of the data obtained with Cc in the experiments cited above. It would be hard to argue that forms C and c are better learned than A and a. Yet the same relationship between RT and similarity is present as in the dot patterns. It seems reasonable, therefore, to suppose that the similar patterns are matched by an analog process like that discussed for letters. This process is performed on the present input rather than upon stored material. Presumably, the criterion for accepting two
patterns as “same” depends upon the general similarity between members of pairs assigned a common response during learning. Next consider the patterns that are completely distorted (6 bits/dot). These patterns are no more similar to each other than patterns forming pairs for which “different” would be the correct response. How can they be matched? One possibility is that through the learning process they come to look alike. This was checked by having Ss make psychophysical ratings of the similarity of patterns which either were or were not given the same name during the prior learning task. There was no difference between the ratings, indicating that patterns given the same name did not come to appear more similar. Thus, if the patterns are not matched on the basis of similar appearance, they must be identified first and the identifications matched. It is not clear at exactly what level the identification is made. However, since it must rest on material that has been stored in memory, it will be considered as equivalent to a “name” match although it might be based upon some association other than the name (Sternberg, 1967a). It seems necessary, therefore, to separate two processes that are used to match nonidentical patterns. The analog processes involve operations on the input information only, while the name matches depend upon stored information. It remains to be determined how these two processes operate. One possibility is that Ss always attempt to match by an analog process first and only then turn to identification (serial model). Another possibility is that both proceed simultaneously (parallel model). Many sub-versions of each model are possible. The parallel model is rather attractive because it helps to explain why Fig. 3 levels off at an intermediate degree of similarity. Suppose Ss attempt both analog and name matches.
As the patterns become less similar the analog matches increase in RT, but they have as an upper limit the time necessary for Ss to perform a name match. It is not clear why the function should appear to go down, but the downturn is not significant. The criterion used earlier to support a parallel model of simultaneous matching was that two attributes could be matched as fast as a single attribute. In Section II,D, it was shown that this could occur in some simultaneous matching tasks (Hawkins, 1967). Applying this logic to physical and name matches, Posner and Mitchell (1967) have shown that elimination of the name matches has relatively little effect on the physical identity matches. The effect that was present might be explained on the basis of a general increase in RTs when the overall task is increased in difficulty (Gottsdanker, Broadbent, & Van Sant, 1963). Either a serial or parallel model could accommodate this result. Recently, however, an experiment was designed to test the reverse. The time to make a name match was measured, using lists that either contained or did not contain
physical identity pairs. The results shown in Table II indicate that the average time to respond “same” at the name level was only 12 msec faster for a list that had no physical matches than for one in which there was an equal number of physical and name matches. The 12-msec difference was not significant. Moreover, the “different” trials which were identical in the two lists were actually somewhat longer in the pure name list. Error rates for the “different” responses were 8% for the pure list and 8.4% for the mixed list.

TABLE II
MEAN RTs FOR SAME AND DIFFERENT RESPONSESᵃ

              Pure list                        Mixed list
                                        Same
S         Same    Different     Physical    Name     Different
1         619       652           564        615        622
2         626       716           536        663        665
3         612       658           585        687        650
4         575       625           502        557        600
5         626       607           536        591        563
6         624       672           548        662        656
7         499       502           425        521        490
8         690       692           570        674        672
Mean      609       640           533        621        615

ᵃ Pure lists contain only name identity trials. Mixed lists contain half physical identity and half name identity “sames.” Each RT is based on the mean of at least 20 trials. Values are given in milliseconds.
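The comparison drawn from Table II can be checked by recomputing the column means from the individual values. The following short sketch does only that arithmetic; the dictionary keys and function name are labels introduced here for illustration and are not part of the original report.

```python
# Mean RTs in msec for each S, copied from Table II, one list per column.
TABLE_II = {
    "pure_same": [619, 626, 612, 575, 626, 624, 499, 690],
    "pure_different": [652, 716, 658, 625, 607, 672, 502, 692],
    "mixed_physical": [564, 536, 585, 502, 536, 548, 425, 570],
    "mixed_name": [615, 663, 687, 557, 591, 662, 521, 674],
    "mixed_different": [622, 665, 650, 600, 563, 656, 490, 672],
}


def column_means(table):
    """Average each column over Ss."""
    return {name: sum(values) / len(values) for name, values in table.items()}


means = column_means(TABLE_II)

# Cost to the name match of mixing physical identity pairs into the list.
name_cost_of_mixing = means["mixed_name"] - means["pure_same"]

# Number of Ss whose name matches were slower in the mixed list.
slower_in_mixed = sum(
    mixed > pure
    for pure, mixed in zip(TABLE_II["pure_same"], TABLE_II["mixed_name"])
)

if __name__ == "__main__":
    print(means)
    print(name_cost_of_mixing, slower_in_mixed)
```

The recomputed difference is about 12 msec, and the individual Ss split evenly in direction, which fits the report that the difference was not significant.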
In the study reported above, the mixed list contained only physically identical and name pairs. In an earlier study (Posner & Mitchell, 1967), lists were used with and without analog matches (e.g., Cc). The results also showed no effect of the presence of analog matches on the name RTs. These studies indicate that physical and analog matches are not a serial stage through which Ss must pass prior to name matching. It seems reasonable to conclude that the two kinds of matching are essentially parallel processes with their own time constants. This conclusion will have important applications in Sections IV and V. We have now considered two processes of recognition which go beyond the matching of identical patterns. The analog matches involve two physically present forms which are equated entirely on the basis of their visual similarity. The name matches require that a form be identified on the basis of past experience. The latter task presumably involves
matching the input form with stored information acquired from past experience with letters. The stored visual information is in turn associated with a name. The next section considers studies of the way in which such stored information is obtained and represented in memory.
B. SCHEMA FORMATION
People are exposed to dozens of different visual letter patterns that have the same name. When they encounter any of them, or a new version never before experienced, they can usually provide the name. This human capacity poses two related problems. First, how can man store so many different experiences in a way that will provide economic use of what must be a finite memory capacity (Oldfield, 1954)? Second, how do people use this stored information in the process of pattern recognition (Uhr, 1966)? Attempts to answer questions of this general type have a long history in philosophy (Price, 1953) and psychology (Reeves, 1965). John Locke speculated on the capacity of the mind to abstract from separate experiences a composite representation which stood for the individual instances. This doctrine of abstract ideas has frequently been criticized. For example, Bishop Berkeley criticized this notion based on his inability to imagine a triangle that was neither equilateral, isosceles, nor scalene, but all of these and none at once. Berkeley’s argument challenged the position that there is an image that serves to represent the abstract idea. Nevertheless, the idea of abstract representation in some form has persisted. The neurologist, Henry Head, was among the first to recognize the importance of a representational system for storing information that took account of individual perception but was not identical with it. Head’s notion was based primarily upon evidence from mechanisms of postural adjustment and was not subject to the criticism Berkeley had raised. Nevertheless, it seems that Head’s notion had its origin in the philosophical doctrine of abstract ideas (Riese, 1965).
Head named his abstractive construct a “schema” and Sir Frederic Bartlett introduced the idea of schema formation into psychology in his classic book, Remembering (1932). Many writers have attempted to work within the general framework that Bartlett developed. Prominent among them was Oldfield (1954), who clarified the idea of schema formation and provided an account of how its use would provide an economical system for storage of information. Oldfield proposed that the schema represented the commonalities among successive presentations of stimuli and that retention involved storage of these commonalities plus the departures that were characteristic of the individual instances. Thus, those aspects of experience in
common among separate percepts need not be stored on each occasion. Despite the importance of this notion, Oldfield himself remarked upon the difficulty of converting it into laboratory operations. Only in the last several years have there been any concerted efforts to do so.
1. Evidence for Schema Formation
Most studies of schema formation have used some form of random visual patterns. The basic pattern can be called a prototype. A number of transformations of the prototype are constructed, either by applying systematic operations (e.g., rotation, reversal) or by some form of random walk of the points that deforms the original. In the latter case, the prototype represents the central tendency of a set of distortions with which S may actually be presented. This basic method has employed nonsense polygons (Attneave, 1957), metric figures (Fitts, Weinstein, Rappoport, Anderson, & Leonard, 1956), and dot patterns (Posner, Goldsmith, & Welton, 1967). A number of studies have shown that Ss can learn to discriminate patterns that are distortions of one central tendency from sets that are distortions of another. For example, Evans and his colleagues have shown that Ss are able to learn to separate patterns of one central tendency from those of another and can do this even without receiving knowledge of results (Evans, 1967; Edmonds, Mueller, & Evans, 1966).
In order to obtain a quantitative analysis of the learning of sets of distortions, Posner et al. (1967) used nonsense and meaningful dot patterns varying in level of distortion from an original prototype pattern. As the level of distortion increased, the similarity of the patterns to the prototype and to each other decreased. Figure 4 shows two nonsense prototypes (upper row), two 5-bits/dot (middle row) and two 7.7-bits/dot (lower row) distortions of prototype A. The 7.7 distortions are more dissimilar to the prototype and to each other than are the Level-5 distortions.
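The construction of a prototype and its distortions can be made concrete with a minimal sketch. It assumes a 30 x 30 grid for the 900-cell matrix and moves each dot by a bounded random amount; the parameter `max_shift` is a hypothetical stand-in for the level of distortion and is not the bits/dot rule of Posner et al. (1967), which is not reproduced here.

```python
import random

GRID = 30    # assumed layout: 30 x 30 = 900 cells
N_DOTS = 9   # nine dots per pattern


def make_prototype(rng):
    """Place nine dots in distinct, randomly chosen cells of the grid."""
    cells = rng.sample(range(GRID * GRID), N_DOTS)
    return [(c // GRID, c % GRID) for c in cells]


def distort(prototype, max_shift, rng):
    """Shift every dot by at most max_shift cells on each axis.

    max_shift is a hypothetical distortion parameter, not bits/dot.
    Dots are clamped so that they stay inside the grid.
    """
    distorted = []
    for row, col in prototype:
        new_row = min(GRID - 1, max(0, row + rng.randint(-max_shift, max_shift)))
        new_col = min(GRID - 1, max(0, col + rng.randint(-max_shift, max_shift)))
        distorted.append((new_row, new_col))
    return distorted


def mean_dot_distance(a, b):
    """Mean Euclidean distance between corresponding dots of two patterns."""
    total = sum(((ar - br) ** 2 + (ac - bc) ** 2) ** 0.5
                for (ar, ac), (br, bc) in zip(a, b))
    return total / len(a)


rng = random.Random(0)
prototype = make_prototype(rng)
tight = [distort(prototype, 1, rng) for _ in range(4)]   # low distortion
loose = [distort(prototype, 5, rng) for _ in range(4)]   # high distortion
```

In this sketch a larger `max_shift` allows patterns to lie farther from the prototype and from each other, which is the qualitative property the text attributes to higher levels of distortion.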
In order to obtain some idea of the perceived distance between a prototype and its distortions, psychophysical scales were obtained relating the level of distortion to magnitude estimates of perceived distance. The solid line in Fig. 5 indicates the relationship between level of distortion and perceived distance of the distorted pattern from the prototype. The scale shown on the abscissa of Fig. 5 includes the range from complete identity (0 bits/dot) to complete lack of relation (9.6 bits/dot). The 7.7-bit distortions have a mean distance of nearly 75 units. A good estimate of the distance between any two distortions of the same prototype can be obtained simply by using the perceived distance of the more distorted member of that pair from the prototype (Posner, 1966). Thus a 7.7-bit distortion will be about 75 units distant both from the origin and from the mean of other 7.7-bit distortions of the same prototype.
[Footnote] The prototypes consisted of nine dots randomly located in a 900-cell matrix.
[Figure 4: dot-pattern panels labeled Prototype A and Prototype B]
FIG. 4. Two random-dot pattern prototypes (upper row), together with Level 5 bits/dot (middle row) and Level 7.7 bits/dot (lower row) distortions. These patterns are formed by use of distortion rules (Posner et al., 1967) and are similar to those used in studies described in the text.
These patterns can be used to compare the rate of learning of tight concepts, in which the patterns are highly similar (low levels of distortion from the prototype), with the learning of loose concepts, in which the patterns are quite different in appearance (high levels of distortion). In one such experiment (Posner et al., 1967), Ss learned to classify three distortions of each of four different prototypes. Each distortion of a prototype was associated with the same overt response in a paired-associate learning task. The dotted line in Fig. 5 gives the number of errors to criterion during learning as a function of the level of distortion of the patterns that S saw. When the patterns were at low levels of distortion, so that they looked similar to each other, learning was fast, but it became increasingly difficult as the level of distortion increased.
It is of interest to contrast the linear relationship between level of distortion and perceived distance with the positively accelerated learning
function (both shown in Fig. 5). Small differences in similarity can be noted and enter into similarity ratings, but they have little effect when Ss are trying to learn the classification.
The ability of human subjects to learn to identify patterns that are instances of different central tendencies does not itself provide evidence that the learning of such classifications involves the abstraction of a schema. Somewhat more direct evidence in favor of the unique role of the schema or central tendency has been shown in studies by Attneave (1957) and Hinsey (1963). They demonstrated that pretraining on the schema (prototype) of a set of patterns could facilitate later paired-associate learning of those patterns. Hinsey (1963) further showed that pretraining on the prototype pattern is superior to pretraining on one of the peripheral patterns. Thus, in these studies there appears to be something unique about prototype or schema patterns which facilitates learning the entire set.
FIG. 5. Median estimates of perceived distance from the prototype (solid line and left ordinate), and rate of classification learning (broken line and right ordinate) as a function of the level of distortion (bits/dot). (After Posner et al., 1967.)
One series of studies (Posner & Keele, 1968, 1969) has sought to investigate this question rather directly. The stimuli were nonsense patterns consisting of nine dots of the same type as those shown in Fig. 4. Subjects learned to associate four different distortions of each prototype with a single key-press. This was done by a standard paired-association technique (Posner & Keele, 1968). The Ss were then transferred to a list of patterns which consisted of the following: prototypes they had never seen before, old distortions which they had just finished learning, and control patterns which were within the learned category. Some of the control patterns (Level 5) were selected so that their distances from the stored patterns were approximately equal to the distance of the prototype from the stored patterns. If one considers the stored patterns as the circumference of a circle, the prototype would be in the center of that circle. Since the patterns differ from one another in many ways, the space is actually multidimensional. It is possible, therefore, to find patterns that have the same mean distance from the four stored patterns, as does the prototype, but which are not themselves the prototype. These patterns serve as controls. Thus, in terms of similarity to each of the four stored instances taken one at a time, the control patterns are equal to the prototype. The difference between the control patterns and the prototype is that the prototype tends to share the particular features that are common to the set of individual instances. Thus, it represents commonalities among the memorized patterns which comprise, in Oldfield’s terms, the “schema.”
TABLE III
PERCENT ERRORS AND RT TO CLASSIFY TRANSFER PATTERNS OF VARYING TYPES AFTER LEARNING TO CLASSIFY A SET OF DISTORTIONS
                 Memorized patterns   Schema   New level 5   New level 7.7   New patterns
Percent error    13                   14.9     26.9          38.3
RT (seconds)     2.01                 2.28     2.53          2.87            3.21
The results of the transfer task obtained in our study are shown in Table III. Two features of these data are particularly striking. Of primary importance is the fact that the prototype patterns are correctly classified significantly more often than any of the control patterns, even those that have been selected to represent the same distance relationships. Thus, whatever process underlies the classification of the prototype patterns seems to be unique to the prototype and is not characteristic of every pattern
within the learned categories. This indicates that the process of classifying patterns does not rely solely upon the distance of the new pattern from a particular stored exemplar. Instead, it depends upon the distance of the new pattern from the category of stored information that represents all the exemplars. In this study, the prototypes are classified, on the whole, about as well as the patterns that S actually has memorized. This last point has not been true in all studies using these materials and seems to depend heavily upon the learning process and upon the particular set of patterns sampled.
The data show that the prototype or schema pattern has a higher probability of correct classification than other new patterns within the learned concept. While this suggestion is consistent with the idea of stimulus generalization, it is more explicit. It singles out the prototype of the pattern as unique.
There are at least two ways to explain the superior classification of the prototype. One possibility is quite consistent with the theoretical notion of schema formation. It suggests that the abstraction process that underlies classification of the schema occurs during learning. The second process is more consistent with the storage of individual traces. It proposes that the schema is recognized through the mediation of the individual stored patterns.
One way of distinguishing between these two theoretical positions is to observe what happens to the memorized patterns and to the schema classification over time. Experiments of this type have been run (Posner & Keele, 1969; Strange, Keeney, Kessel, & Jenkins, 1968). These studies were carried out exactly as described above, except that some groups were returned to the experiment after a 1-week delay between learning and pattern recognition. The results of these studies indicated that after 1 week's delay the schema pattern was recognized at least as well as the particular stored patterns that the S had learned.
This was true in all the studies. Moreover, while the stored patterns underwent a significant loss over the week's delay, correct classification of the schema showed no loss and, in some cases, there was a slight gain. Since this experiment involved both classification of the schema and also remembering which switch is associated with a particular category, it is remarkable that schema classification showed no decline in accuracy over the interval. In order to explain why a delay increases classification errors for the memorized patterns but not the schema, it could be argued that Ss classify the schema based upon information from the whole series of stored exemplars. Even if each stored exemplar were noisier as a result of decay during the interval, the overall judgment based on the set could
still be reliable. However, analysis of the RTs for classification of the schema does not support the view that such classification is more complex. It seems reasonable that the extraction of information concerning the central tendency takes place during learning and that schema classification is not mediated by individual patterns. Additional evidence on this point is provided in Section III,B,3.
Another approach to the study of schema formation has used systematic transformation rules (Gibson, 1965; Pick, 1965). These studies have a different methodology from those described above. The general requirement is that Ss be able to determine whether a pattern is a prototype or a transformation of it (rotation, size change, and so on). Pick (1965) found that transfer of the same transformations with different prototypes was superior to transfer of the prototype with new transformations. Two things should be borne in mind about her results. First, prototype transfer was usually positive when the classification task required memory, but not when a simultaneous condition was used. This is similar to the distinction made earlier between analog matches based on simultaneous visual information and identifications that require access to memory. If a match does not require S to identify the pattern there is little reason to expect that familiarity with the prototype will matter. Second, the use of systematic distortion rules (transformations) gives Ss a cue that is not present in the random distortions introduced in the studies cited above. The importance of learning to utilize different rules of transformation is emphasized in the Pick study, but cannot be used to explain the results obtained with random distortions.
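The two accounts of prototype classification contrasted in this section, mediation by individual stored patterns versus direct comparison with an abstracted central tendency, can be set side by side in a small sketch. This is an illustration under stated assumptions and not a model proposed in the studies cited: patterns are lists of (x, y) dot positions, dots are assumed to correspond across the patterns of a category, and all function names are hypothetical.

```python
import math


def pattern_distance(a, b):
    """Mean Euclidean distance between corresponding dots of two patterns."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def central_tendency(patterns):
    """Dot-by-dot mean position over the stored patterns of one category."""
    n = len(patterns)
    return [
        (sum(p[i][0] for p in patterns) / n, sum(p[i][1] for p in patterns) / n)
        for i in range(len(patterns[0]))
    ]


def classify_by_exemplar(pattern, categories):
    """Pick the category that contains the single nearest stored pattern."""
    return min(
        categories,
        key=lambda name: min(pattern_distance(pattern, e) for e in categories[name]),
    )


def classify_by_schema(pattern, categories):
    """Pick the category whose central tendency is nearest to the pattern."""
    return min(
        categories,
        key=lambda name: pattern_distance(pattern, central_tendency(categories[name])),
    )


# One-dot toy patterns chosen so that the two rules disagree.
categories = {
    "A": [[(0.0, 0.0)], [(10.0, 0.0)]],   # central tendency at (5, 0)
    "B": [[(5.0, 3.0)], [(5.0, 5.0)]],    # central tendency at (5, 4)
}
probe = [(5.0, 1.0)]
print(classify_by_exemplar(probe, categories), classify_by_schema(probe, categories))
```

The toy probe lies close to the central tendency of category A but far from both of its stored instances, so the two rules assign it to different categories. Control patterns matched to the prototype in mean distance from the stored instances play an analogous role in the transfer experiments: they separate what is due to distance from stored exemplars from what is due to the central tendency.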
2. Role of Variability
The preceding section attempted to show that information concerning the schema pattern is abstracted during the process of learning. However, it is clear that information concerning individual patterns must be stored as well. Otherwise, it would be impossible to explain why old distortions which had been memorized are classified better than new 7.7-bits/dot distortions, although both are the same distance from the prototype. Therefore, it would be improper to characterize the processes going on during learning of sets of distortions as merely the abstraction of the prototype or schema.
What besides the schema is involved in pattern recognition? Attneave (1957) suggested that Ss learn the relative variability or distribution of the exemplars as well as the schema. Recently, Dukes and Bevan (1967)
compared a group that was given four repetitions of a single facial pose with one that saw four different poses of the same face. They found that the repetition condition fostered recognition of the learned pose, but the high variability group did better in recognizing new poses. This study suggests that variability aids in pattern recognition. However, since there was no control of the distance of the new poses from the learned pose, this could have resulted from an increased probability of being shown a pose similar to one of those previously learned.
The variability of instances during the learning task may have two quite different effects. First, it may vary the efficiency with which the common features are abstracted. Second, it may vary the criterion concerning which patterns should be classified as instances of the category. Podell (1958) tried to separate the role of variability in these two processes. She taught Ss to classify stimuli varying around a single prototype. Her conditions included high variety (12 patterns) and low variety (2 patterns), and two sets of instructions. In the active instructions, Ss were told to look for common features so they could classify new patterns. In the unintentional instructions, they were told to rate the pattern on the basis of aesthetic appeal. After learning, Ss were required to write out a definition of the common elements. With the unintentional instructions, low variety led to more recall of common elements than high variety, but the reverse was true with the active set. In addition, low variety led to a good discrimination between old and new patterns, but high variety did not.
In the Dukes and Bevan (1967) and Podell (1958) studies, variety was manipulated by the number of different stimuli presented. There was no control of the degree of similarity among the instances, or between the stored instances and the prototype or the new patterns.
Perhaps this helps to explain conflicts involving the role of variety in subsequent pattern recognition. Posner and Keele (1968) manipulated variety by changing the level of distortion from prototype dot patterns. The prototypes were in the form of a triangle, the letters F and M, and a nonsense pattern. Subjects were taught either a low variety (tight concept) or a high variety (loose concept). After learning, they were transferred to new, severely distorted patterns and required to classify them into one of the four previously learned categories. The new patterns had the same overall distance from the memorized instances regardless of the variety condition. The results of this study showed that the high-variety (loose concept) group did significantly better in transfer than the low-variety (tight concept) group, but it was not clear exactly how the superior performance occurred. It could be argued that the advantage of the loose concept was primarily in the kinds of criteria that Ss set for the admission of a particular pattern into one of the meaningful categories. The use of three familiar prototypes and one random category within the same list could have contributed to this. There was a strong tendency for Ss with the tight concepts to classify patterns about which they were unsure into the nonsense category. This factor may have led Ss learning the tight category to appear less able to make classifications based on their previous learning than in fact they really were.
A recent unpublished study (Keele, Fentress, & Posner, 1968) has suggested that one can separate the question of discriminability from that of criterion. In this study, nonsense dot patterns were used as stimuli. Subjects were exposed to four successive distortions of the same pattern. These distortions could be at a high level of variability (7.7 bits/dot) or at a moderate level of variability (5 bits/dot). After exposure to the four distortions, the S received a single test pattern from the same category as the presented patterns or from a new category. His task was to say whether or not the test pattern was a member of the same class as the four instances just seen. The average similarity between the test pattern and the presented patterns was constant regardless of variability condition. The results showed a significant tendency for higher variability patterns to lead to more “yes” responses regardless of whether the test pattern was a member of the class or was not. Higher variability led to a less strict criterion for considering a test pattern as a member of the class. On the other hand, the overall sensitivity of judgments (d’) was greater for the low-variability condition. The value of d’ was 1.13 for patterns of moderate variability (Level 5) and only .68 for those of high variability (Level 7.7). This difference was also significant. These results suggest that a major function of increased variability is in changing the criterion for acceptance of a new pattern as a member of the category.
In some situations high variability leads to more correct classifications, but the low-variability learning may still be superior in terms of overall discriminability. One interesting possibility is that the d’ measure could be related primarily to the ability of Ss to extract and retain the central tendency, while β (criterion) would be more related to the dispersion of the individual patterns. If this were the case, the experiment outlined above would suggest that low variability aided Ss in abstracting the prototype, but led to a conservative criterion. This agrees with Podell’s data for the unintentional instructions, but not for the active-search set. It does not seem possible to reconcile all of the data available. However, the use of d’ and β parameters to separate the role of variability in abstracting the schema from its role in setting concept boundaries may provide an opportunity to understand more of the functional details of the trace system involved in pattern recognition.
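The separation of sensitivity from criterion discussed in this section can be illustrated with the standard equal-variance Gaussian signal detection formulas. The sketch below is not taken from the studies cited; the response rates are hypothetical values chosen only to show two conditions with the same d' and different criteria.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal distribution function


def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity: separation of the two distributions in z units."""
    return _z(hit_rate) - _z(false_alarm_rate)


def criterion_c(hit_rate, false_alarm_rate):
    """Criterion location; negative values mean a bias toward saying yes."""
    return -0.5 * (_z(hit_rate) + _z(false_alarm_rate))


def beta(hit_rate, false_alarm_rate):
    """Likelihood ratio at the criterion (equal-variance Gaussian model)."""
    zh, zf = _z(hit_rate), _z(false_alarm_rate)
    return math.exp((zf ** 2 - zh ** 2) / 2)


# Hypothetical proportions of "yes" responses to members (hits) and to
# non-members (false alarms) of the category.
lenient = (0.80, 0.40)   # many "yes" responses overall
strict = (0.60, 0.20)    # few "yes" responses overall

for label, (h, f) in (("lenient", lenient), ("strict", strict)):
    print(label, round(d_prime(h, f), 3), round(criterion_c(h, f), 3), round(beta(h, f), 3))
```

With these illustrative rates the two conditions are equally sensitive, yet one accepts test patterns far more readily than the other. This is the kind of dissociation the d' and criterion measures are meant to reveal when the proportion of "yes" responses changes with variability.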
3. Recognition
The data introduced thus far in this section suggest that Ss abstract a representation that is sensitive to the commonalities among the patterns they have classified. They are also able to develop, from experience, a notion of the criterion for acceptance into the category. The abstract idea refers to the level of processing (see Fig. 1) which serves as a description of previous visual experience with a pattern. It may be characterized by a schema (central tendency) as well as by information from the separate experiences. In one sense it is similar to what others have meant by a “pattern recognizer” (Selfridge & Neisser, 1960; Uhr, 1966), since it serves as the basis for establishing the identity of new input. On the other hand, it may also serve as an internal representation, which can be activated centrally as well as by new input (see Generation, Section IV,D). In a functional sense the abstract idea is like the visual gnostic units postulated by Konorski (1967), since this level of processing may serve as a stage both in abstraction and in generation. In the case of a letter the abstract idea would be tied to the name of the letter, and activation of the abstract idea would serve to produce the associated letter name. In the absence of a name, activation of the abstract idea may itself serve as “recognition,” as for example, when we “recognize” a face for which we cannot produce the name.
What happens when the visual stimulus is something that has never been seen before? Studies of pattern recognition bear on this question. Earlier (Section III,B,1), it was pointed out that the time to classify a prototype that S had never seen before was about the same as the time to classify an instance that he had memorized. This finding suggests that the prototype is classified directly, rather than being mediated by a series of individual stored patterns. As the distance of the new pattern from the prototype increases, so does the RT.
Thus, for patterns that have never been experienced, the efficiency of classification is a function of distance from the schema. A recent unpublished study (Frost, 1968) supports the view that the schema pattern, even though it has not been seen before, is classified directly rather than being mediated by memorized patterns. Frost used the same random-dot patterns that have been discussed previously. The experiment was a recognition memory study (Shepard & Teghtsoonian, 1961). When a pattern was first shown the correct classification was “new,” while if repeated the correct response was “old.” Since it is difficult to assimilate the complex random patterns, particularly at a rate of presentation of 2 seconds per pattern, the number of errors in correct recognition was relatively high. Sets of from 6 to 12 patterns were selected from the same prototype and shown consecutively. The prototype pattern was presented only once within its set. Over
all conditions the proportion of correct “old” responses was .67. The proportion of “old” responses to new patterns was .27 and to prototypes .66. The prototypes were identified incorrectly as “old” with the same frequency (and the same confidence) as patterns that had been presented previously in the list. In fact, only if the pattern had been presented in the immediately preceding trial was its probability of eliciting an “old” response greater than for the prototype.
The recognition memory task requires S to say whether or not he has seen this particular pattern, rather than to classify the pattern into a category. In the classification task, it is possible that S’s response to the prototype is mediated by information concerning individual memorized instances which fell within that category. In that case, he would not “recognize” the stimulus in the sense of having a feeling of familiarity. If S could discriminate between a stimulus which was actually familiar and one he knew how to classify, he would have rated the prototype pattern as “new” even though he recognized that it fell within a common category. However, the data indicate that Ss were not able to make such a discrimination and that the prototype was seen as having occurred before.
It should be possible to provide a more detailed analysis of the structural basis of the process relating new input to information abstracted from past experience. A preliminary effort to do this was included as part of the pattern recognition studies discussed earlier in this section (Posner & Keele, 1968, 1969). Subjects in these studies had learned four categories, each consisting of four 7.7-bits/dot distortions of random prototypes. After learning and pattern recognition, they were presented with partial information from one of the four prototypes. The partial information consisted of one, three, five, seven, or all nine dots of a prototype. The dots presented were a random sample of the possible combinations.
It was hoped that this would provide some insight into the cues that Ss used in obtaining the correct classification. The overall results are shown in Fig. 6. There is a strong linear trend between the number of dots exposed and the probability of correct classifications. The linear relation indicates that Ss can use partial cues to obtain some information about the correct category. This suggests that not all of the information involved in classifying the stimulus is configurational. Some of it must be related to the position of individual dots, as well as to their configurations. It was somewhat surprising that Ss could do better than chance (25%) when presented with only a single dot. Presumably, this indicates that the density of dots within certain sections of the slide serves as a useful cue in making the classification.
Sokolov (1963) has presented a probabilistic theory of the process of perception. He suggests that perception can be viewed as a sequential
decision process in which each cue serves to reduce S’s uncertainty about the correct classification. In his model, each dot can be considered to be independent. The data obtained from the partial information study are more-or-less consistent with this view. However, if each dot were truly independent the curve shown in Fig. 6 would be negatively accelerated. Inspection of the graph indicates that the departure from linearity is in the opposite direction from what would be expected if the function were negatively accelerated. This study is probably too crude to give more than the roughest indication of the cues used in classification. A more thorough analysis would probably have to formulate specific hypotheses about the critical features in recognition and select partial cues to represent such features.
FIG. 6. Correct classification of the prototype as a function of amount of partial information (number of dots). This test follows learning and pattern recognition procedures described in the text. (The line is fitted by eye.)
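The claim that independent cues would yield a negatively accelerated function can be checked with a deliberately simple sketch. It is not Sokolov's model. It assumes, hypothetically, that each exposed dot reveals the correct category with some fixed probability `q`, independently of the other dots, and that S guesses among the four categories when no dot is informative.

```python
def p_correct_independent(n_dots, q, n_categories=4):
    """Probability of a correct classification under independent cues.

    q is a hypothetical probability that a single dot is sufficient
    to identify the category; if no dot does so, S is assumed to guess.
    """
    p_no_cue = (1 - q) ** n_dots
    return (1 - p_no_cue) + p_no_cue / n_categories


dot_counts = [1, 3, 5, 7, 9]          # the amounts of partial information used
curve = [p_correct_independent(n, q=0.10) for n in dot_counts]
gains = [later - earlier for earlier, later in zip(curve, curve[1:])]
print([round(p, 3) for p in curve])
print([round(g, 3) for g in gains])
```

Under this assumption each additional pair of dots adds less than the pair before it, so the curve flattens as dots are added. A linear trend, or a departure from linearity in the opposite direction, is therefore not what strict independence of this kind would predict.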
4. Type of Representation
The level of processing that represents the central tendency and variability of past visual experience is called in this chapter an “abstract idea.” The philosopher, H. H. Price (1940), has written that the word idea is one of the most pernicious sources of confusion in the literature of western philosophy, for it has meant, among other things, either a concept or a mental image. The term “idea” is used in this chapter in a neutral sense. What is not clear from the data is the form of representation of this information. The description that S stores about the dot patterns is a description of his visual experience, and it can be used to recognize visual information. However, this does not mean that the information is stored in terms of a visual image, that S could see or visualize the sets of dots that represent the central tendency of the pattern, or that the formation of the abstract representation was free from verbalization. Extensive questioning of Ss who ran in these experiments did not reveal anything specific about these issues. Many Ss reported using verbal rules that were related to the patterns. The rules tended to emphasize position of dots, center of gravity, overall orientation of the figure, familiar subgroups and association with objects. The rules were highly idiosyncratic and some Ss verbalized no rules at all.
Ribot (1899) faced almost the same question posed here. He argued that “general ideas,” in his terms, did not require verbal processes for their formation. The basis of his evidence was that animals, idiots, and deaf mutes were able to abstract invariances from their visual environment, as when a dog responds to his “master” in very different contexts. He thus argued that this level of abstraction was simpler than the levels required by language. Recently, Konorski (1967) presented the view that complex pattern recognition units are present within the visual system and can function in isolation from speech units.
Perhaps studies of abstraction under speeded conditions can help determine the importance of the verbal processes that accompany learning. It does not appear likely, however, that experiments with the dot patterns will tell us much about the form of representation in memory. In Section II it was argued that performance experiments can isolate levels of processing in letter recognition which are prior to the naming operations. The letter-matching techniques discussed in Section II seem to provide more information concerning the functional system that stores visual information immediately after presentation. In the next two sections, we return to this technique in order to separate the visual and name levels of representation in memory. These experiments bear, at least indirectly, upon the form of stored representation that may be involved in schema formation.
IV. Visual Representation in Memory
Section II introduced a method for studying letter matching at the physical and at the name level. The results led to the view that physical matches were not influenced by the letter name. In this section, experiments are introduced which involve presentation of two letters successively. The difference in RT between physical (e.g., AA) and name (e.g., Aa) matches is used to infer that a visual memory code of the stored letter is used in making the match.
A. CHANGES OVER TIME
In several experiments (Boies, 1969; Posner & Keele, 1967; Posner, Boies, Eichelman, & Taylor, 1969), a single visual letter was presented and followed from 0 to 2 seconds later by a second letter. The two letters could be physically identical, have only the same name, or be different. The Ss were instructed to respond “same” if the two letters had the same name, or, if otherwise, “different.” In the first two studies, S had to move his eyes from the first letter to the second letter. The time he spent viewing the first letter was up to him, although the instructions encouraged a brief glimpse. In the third experiment, no eye movement was required and the time of exposure was constant at .5 second. The first two studies used a memory drum and normal reading illumination, while later studies used inline displays which produced a bright field. In two of the studies, the time intervals were blocked so that S always knew the delay he would receive on a given trial, while in the other study the interval was randomized. In still other studies, the interval between the letters was filled with a random black-and-white pattern or with a field of luminance equal to the original exposure.
These various manipulations often affected the absolute RTs. For example, the use of an eye movement or interpolated pattern field increased RT, but had little influence on the function relating the difference between physical and name matches to the interval. Figure 7 shows this function for three different studies which represent a wide range of conditions but give highly similar results. The three studies summarized in Fig. 7 were the only ones that covered the range between 0 and 1.5 or 2 seconds with at least three intervals. Studies using only two intervals or a less extensive range gave similar results, but the quantitative agreement was not always as complete as in these three studies. For example, the left side of Fig. 8 presents results from a study using intervals of 0, .5, and 1 second and the differences between physical and name matches are somewhat smaller than those shown in Fig. 7. Analyses of “different” RTs show them to be somewhat similar but longer than the name “same” responses. These are outlined fully in Posner et al. (1969).
Immediately after presentation of the letter, the match appears to be based on a fully adequate visual code. A physical match is about 90 msec faster than a name match. This result is similar to that obtained with a simultaneous letter pair. This difference declines to nearly 0 after an interval of 2 seconds. Unlike previous studies of visual memory for letters (Sperling, 1960, 1963; Keele & Chase, 1967), in these experiments Ss have already extracted the letter name at the start of the retention interval. The presence of a visual code must be inferred from the efficiency of a physical match.
[Figure 7 plot omitted. Horizontal axis: Interval (seconds), 0 to 2.]
FIG. 7. Difference in RT between name and physical identity “same” responses as a function of ISI between two successive letters. The solid and open circles involve studies using a memory drum, normal room illumination, and an eye movement between letters. The solid triangles represent a study using an inline display, .5-second exposure of the first letter, and appearance of the second letter in the same spatial position. (After Posner & Keele, 1967; Posner et al., 1969.)
Unfortunately, it is not possible to interpret the absence of a difference between physical and name matches as meaning that Ss have no visual information. It indicates only that the visual information is not aiding the match. This could be because it is lost, because it is less accessible, or because it has become too noisy for an efficient match. While the difference between physical and name matches shows roughly the same time course as the loss of visual information from tachistoscopic exposures (Sperling, 1960, 1963; Keele & Chase, 1967), the lack of luminance, noise field, and exposure duration effects argues that the visual code studied by the RT method may be quite different from the decaying visual trace.
If this visual code is at a higher level of processing than the stimulus trace, its maintenance might require active attention (Posner, 1967).
B. MANIPULATING ATTENTION
The studies cited above indicated that the presence of a visual noise field during the retention interval did not affect the difference between physical and name matches. Previous work on a different visual memory task produced a similar result. This task involved the ability of Ss to preserve the position of a point on a line (Posner & Konick, 1966). If Ss had to read and record numbers in the interval between observation and recall, performance was little worse than with an unfilled interval. However, if they were required to operate on the numbers by addition or classification, retention was greatly reduced. This finding suggested that the maintenance of a visual code can be closely related to the degree of attention available during the retention interval. An experiment was conducted (Posner et al., 1969) to compare the effects of visual noise and mental processing on the visual code inferred from the RT task. The first letter was present for 1 second. This was followed by a .5-second delay, during which Ss were randomly presented with a blank field, a mask field, or two visual digits. On the trials in which the visual digits appeared, Ss were required to add them and report the sum. The .5-second delay was followed by a second letter, to which Ss responded “same” or “different” as quickly as possible. The results showed that both the mask and the addition task increased the absolute RTs. However, the addition task abolished the difference between physical and name identity, while the noise field did not affect this difference. In a subsequent experiment, Boies (1969) presented a single auditory digit within the .5-second delay interval. Subjects were instructed to add three and report the sum after completing the RT task.
On the first experimental day (second day of practice), Ss showed a pattern of results similar to those obtained with the visual digit. With no interpolated digit, the difference between physical and name matches was about 44 msec. When the digit was interpolated, there was a significant increase in absolute RT, and the difference between the matches dropped to 20 msec. The reduction in difference between physical and name matches approached but did not reach statistical significance (.1 > p > .05). On the second experimental day, the interpolated auditory digit caused no increase in absolute time and no decrease in the difference between physical and name matches. In this study, Ss reported that they stored the digit and actually performed the addition task after their response to the second letter. Visual noise alone does not affect the visual code. A single auditory
digit also may have no effect, at least after practice, while adding visual digits has a clear effect. The critical variable could be the modality to which the interpolated digit is presented, or the overall difficulty of the interpolated task. In any case, it is clear that a processing task, at least when the input is over the visual modality, can selectively reduce the efficiency of a match based on the visual code. More research will be necessary to make certain of the mechanisms involved.
C. REHEARSAL
Are there conditions for which S is able to maintain visual information more effectively than those that have been shown in previous experiments? Because of the results obtained from studying the retention of position along a line (Posner & Konick, 1966), it was surprising to find that the physical match efficiency was lost so quickly in the matching tasks. One possible reason is that S had relatively little incentive to preserve the visual aspect of the letter as distinct from the name. On one-quarter of the trials, the two letters had only the name in common and on one-half they were different. Thus, only on one-quarter of the trials was the visual code a sufficient basis for making the match. These conditions may have encouraged Ss to attend primarily to the acoustic level, that is, to rehearse the letter names rather than attend to their visual form. In fact, that explanation corresponded closely to introspective reports. In an effort to find conditions for efficient maintenance of the physical match, pure and mixed lists were compared. In these conditions, the first letter was always uppercase and the second letter was either always uppercase (pure list) or mixed upper- and lowercase (mixed list).⁶ Physical matches in mixed lists could be compared with physical matches in pure lists.
If the pure list provided more incentive for S to attend to the visual level, the physical match RTs for pure lists ought to be better after a delay interval than physical match RTs for mixed lists. The data from two different experiments are shown in Fig. 8. In both studies, the physical matches in the mixed condition show a rapid increase in RT over time. The name condition is relatively flat over the interval. The physical matches in the pure lists also increase over time, but less than those in the mixed list. The divergence between the two types of physical match is significant. The results of these studies suggest that Ss can maintain the physical match more efficiently if given incentive. Attention to the visual level serves to maintain the efficiency of a physical match either by improving the clarity of the visual code or by keeping it accessible for a later match. It should be noted that the RT for the physical match in the pure list condition is not flat. It tends to turn up slightly in the first experiment, and more sharply so in the second study. The upswing cannot be attributed to temporal uncertainty alone, since the name match RTs are much flatter but have the same temporal uncertainty. There appears to be a genuine difficulty in maintaining the efficiency of the visual code even when the conditions encourage S to do so.
⁶ Our experiments have shown little difference in RT for physical matches consisting of uppercase and lowercase letters (Posner et al., 1969, Experiment I). Moreover, within each study it is possible to use data from delayed physical matches and “different” RTs to determine any systematic bias in favor of upper- or lowercase. In the studies, such differences are small with respect to the effects reported.
[Figure 8 plot omitted. Two panels (EXP I, EXP II); horizontal axis: Interval (seconds).]
FIG. 8. RT as a function of ISI for physical and name matches in mixed lists and physical matches in pure lists. The two experiments were highly similar except that in Experiment II longer ISIs were used and these were run in blocks. Both studies used inline displays and .5-second exposure of the first letter. (After Posner et al., 1969; Boies, 1968.)
This difficulty is pointed up by another experiment (Boies, Posner, & Taylor, 1968). One group was instructed to respond “same” only if the stimuli were physically identical. Another group was given the usual instructions to respond “same” if the letters had the same name. Interstimulus intervals of 0 and 2 seconds were used. It was expected that Ss who could use the physical form as a reliable cue would show a much smaller increase over the delay interval in RT for physical matches. This expectancy was not met. Both groups showed a similar increase in physical match RTs over time, while the name match RTs for the second group were virtually flat. These experiments indicate that a variety of operations can influence the relative efficiency of visual and name matches during the 2 seconds immediately following presentation of a letter stimulus. With the standard mixed-list condition, the relative advantage of matches based upon physical information declines systematically over a 2-second interval. This decline does not seem to be influenced much by the length of exposure of the first stimulus or by the presence of visual noise during the interval. The decline is affected by the relative attention the Ss can give to the two levels of processing. An interpolated visual addition task, which presumably reduces the available capacity for attending to the visual code, appears to have more effect upon the visual level than upon the name level. This would be expected if the visual information were extremely susceptible to interruption. When S’s attention is focused upon the visual information by giving him a pure list, he is better able to maintain the visual information in the accessible store, at least over 1-1½ seconds. The difficulty of maintaining the visual information, even for a period of 2 seconds, suggests that the visual code is highly susceptible to interruption and shows relatively little evidence of consolidation. The differential effects of delay on the various matches argue strongly against any temporal uncertainty explanation of these findings.
Moreover, the significant divergence of physical matches in mixed and pure lists argues against any interpretation based upon the number of different visual forms (event uncertainty), since event uncertainty differences are fully present at time zero. Another possible explanation for these effects is that Ss set themselves to deal with either the physical or name level. The extreme susceptibility of these effects to interruption and the difficulty of maintaining the relative efficiency of a physical match argue against a generalized “set” mechanism. Two reasons for the difficulty in maintaining the efficiency of a physical match suggest themselves. One possibility is that Ss have difficulty in maintaining attention over a 2-second interval, and even a brief disruption of attention serves to reduce the efficiency of the visual code. Another possibility is that whatever system preserves the visual code is resistant to continuous activity. This might be somewhat similar to the tendency of a stopped image to disappear from view after a few seconds (Riggs, Ratliff, Cornsweet, & Cornsweet, 1963). Indeed, if one accepts a view like Konorski’s
(1967), the properties of neural systems involved in perception and imagination ought to be similar. More experiments are necessary to determine exactly what happens to the visual code over time. One speculation is that the visual code is represented in great detail at the moment immediately following visual stimulation. However, over a period of seconds it is assimilated into information about the same letter which makes up the stored abstract idea (Section III,B). As the information is assimilated, it loses the detail and accuracy of the original visual stimulus and instead becomes part of general past experience with stimuli of that type. Attention to the visual information during this critical period may serve to increase the length of the process (Talland, 1965). Perhaps the interval observed in these experiments involves the activity, not of the trace of the particular letter presented on a trial, but of the whole trace system representing the abstract idea of the letter. If this is the case, it should be possible to obtain similar results by presenting the letter name and allowing S to activate the abstract level centrally. This is the topic of the next section.
D. GENERATION
The concept of generation (see Fig. 1) refers to S’s ability to go from a more general code to one of greater specificity. A particularly interesting example is the generation of a visual code from the letter name. The matching task can be used as a means of determining the efficiency with which such a visual code can be generated by S. In order to do this, a comparison is made between auditory and visual presentation of the first letter (Posner et al., 1969). When both letters are visual, Ss can perform matches at either the physical or name level. The extent to which RTs following auditory presentation resemble physical rather than name matches provides an index of the efficiency of generation. In general, the methodology of these studies was similar to that described in the preceding section. In the first experiment (Posner et al., 1969), the first letter could be either visual (always uppercase) or auditory, and was present for about .5 second. This was followed by a delay interval of about .75 second. The second letter was always visual and could be either mixed with respect to case (mixed list) or always uppercase (pure list). Some data from the experiment are shown in Fig. 9. In the visual mixed conditions, the difference between visual and name identity is about 30 msec. This corresponds roughly to values shown in Fig. 7 for delays of .75 second. Moreover, the difference between pure and mixed lists for physical matches agrees with the previous rehearsal experiments (Fig. 8). These two findings confirm the presence of a visual code when the first letter is visual.
The first evidence related to generation is obtained by comparing the “same” RTs in the pure condition following visual and auditory presentation of the first letter. Figure 9 shows that there was a small difference (about 10 msec) in favor of the visual condition, but this is not significant. Moreover, in the mixed conditions, auditory-visual matching is as fast as the physical identity matches. This occurs whether the second letter is upper- or lowercase and despite instructions given to consider the first auditory letter as a capital. Since the visual mixed condition provides evidence in favor of a visual code, it seems proper to suggest that the Ss base their matches upon some kind of visual code in the auditory condition rather than matching at the name level.
[Figure 9 plot omitted. Curve labels: Different; Same: Visual (name), Auditory-Visual, Visual (physical). Horizontal axis: Second list (Pure, Mixed).]
FIG. 9. RTs for matching a visual probe letter .75 second after visual (dotted line) or auditory (solid line) presentation of the first letter, as a function of the type of list. (After Posner et al., 1969.)
It was surprising, however, that in the auditory-visual mixed condition Ss were able to produce the match with equal speed regardless of the case of the second letter. To obtain a better idea of what was happening during the interval, the same experiment was run with 0, .5, and 1 second between the first auditory letter and the second visual letter (Posner
et al., 1969). As before, the pure lists showed no difference between visual and auditory conditions after a delay of 1 second. For the mixed lists at time zero, both auditory matches are longer than the physical identity visual matches. However, after the 1-second interval the auditory conditions are faster than the visual match at the name level, and at least as fast as the visual physical identity match. These experiments indicate that Ss are able to operate upon auditory information to produce highly efficient matches. We have called this process the generation of a visual code. Such generation is accompanied by subjective reports from some Ss that they “expect” or “are looking for,” or, more rarely, “see” some specific visual information in the interval following presentation of the letter name. Perhaps this production of visual information is related to the operation of lower-level visual analyzers, such as those proposed in the pattern recognizer “Pandemonium” (Selfridge & Neisser, 1960). Such models suggest that Ss are able to switch in analyzers which can detect features in visual information, i.e., Ss are able to activate past descriptors which have been built up about the visual information from letters they have experienced. Perhaps this is equivalent to the level of abstract ideas presented in this chapter. From our present data, it is not possible to determine the details of the visual code available as the result of generation. For example, it would be possible for Ss to generate only certain features which would allow them to distinguish between different letters of the alphabet. These data do not indicate whether the generated visual information has a distinct size or particular color, and so on. The failure to find much difference between upper- and lowercase letters following auditory stimulation indicates that the generated code might be general enough to include both cases for some letters (e.g., Ff).
One problem connected with the concept of generation as applied to these studies is the difficulty of obtaining generation under some conditions. If Ss are able to generate, why do they not generate the lowercase when they are stimulated by a visual capital letter? If this were done, one would expect to have no decay function in studies of the type presented in Fig. 7. Whether, under particular conditions, Ss choose to attend to the visual code, or choose to generate it, seems to be something that changes with test conditions. Presumably, the presentation of an auditory stimulus and the necessity for matching it against a new visual input does incline S to activate his visual code. However, when visual information is presented, Ss do not appear to be inclined either to maintain that visual representation in a mixed list or to produce a visual representation of the other case. A recent study by Boies (1969) indicates that evidence for generation can be found with visual stimulation when S has help in maintaining
the letter already received. This study varies the duration of the first visual stimulus rather than the time between the presentation of the two. The time between offset of the first letter and presentation of the second letter is always zero. Boies found that presenting the first letter for .5-1 second gave the usual difference between physical and name identity. However, as the length of the visual presentation is increased beyond 1 second, RT for name identity matches declines markedly while RT for physical identity matches increases slightly. One explanation of this is that in the presence of the visual information about one case, S begins to produce a visual code of the opposite case. These conditions may allow S to free his processing capacity from maintaining the visual representation in order to allocate attention to development of the code of the opposite case.
E. REHEARSAL AND LONG-TERM MEMORY
In this section a series of studies has been presented which indicate that immediately after the presentation of a visual letter Ss have a relatively complete representation of the stimulus that includes visual and name components. For a short period of time, the visual components are sufficiently accessible that Ss can match more quickly when incoming input is physically identical to the stored item than when they have only the name in common. It has been shown that the effects obtained in such experiments vary greatly, depending upon the direction of S’s limited processing capacity attentional system. If his attention is directed to the visual aspects of the stimuli, he shows better ability to maintain the visual representation. If his attention is distracted, the visual representation tends to be more affected than does the name representation. If presentation is auditory, S tends to produce a visual representation by the process of generation. What level of representation is activated both by presentation of a visual letter and by the presentation of the letter name? It seems reasonable to speculate that this level of analysis is similar to the abstract idea discussed in the preceding section. The activation of the abstract idea may be by means of current visual input, or may depend upon learned connections from the name level. In either case, the product is an internal representation which is, in some sense, a description of past experience with related visual forms. The properties of this representation would be those attributed to the abstract idea in Section II,B. Konorski (1967) suggests that the difference between perception and imagination is the occurrence of sensory orientation in the former but not the latter. Activation of the abstract idea by a visual stimulus may also be qualitatively somewhat different from activation from the name.
When a new visual input is presented, its full detail might be represented initially. Any small change in the letter would be expected to lead to an increase in RT over an exact match control. However, as the new input is incorporated into the abstract representation, it would lose specificity; thus many patterns of the same general form might be matched with equal facility. When activation is from the name level, the specificity of representation should be equivalent to an already assimilated input. These speculations receive some support from data. The generated representation never does provide matches as efficient as a physical match that follows immediately after visual stimulation (Posner et al., 1969). It is as efficient, however, as physical matches that follow a visual stimulus after a 1-second delay. The first 1-2 seconds seem to be the crucial period during which the details of the prior visual stimulus are assimilated into the overall abstract representation. This does not mean that they are entirely lost, since the abstract idea carries with it data not only on the schema but also on individual patterns (Section III,B). Regardless of the source of stimulation, activation of the abstract idea is difficult to maintain for longer than a few seconds. These ideas are extremely speculative. However, they do produce some interesting consequences. First, the visual representation of a letter is not necessarily a passively decaying trace, but may be an actively created code which requires attention. This is suggested by the role that attention seems to play in the activation and maintenance of efficient visual matches. Second, visual and name components of a prior visual stimulus coexist within the memory system. If at the time of presentation of a letter the abstract idea of that letter has already been activated, Ss may respond “same” without going to the name level. If the abstract idea is no longer active, they must then go to the name level.
The next section explores the possibility of gaining experimental control over these two components of the memory code.
V. Separating the Visual and Name Codes of Prior Stimulation
Some recent studies have attempted to separate characteristics of newly presented visual stimuli from those that have been memorized. Chase and Posner (1965) compared searching a visual array of four letters for a single stored probe with searching a stored array for a single visually presented probe. They found differences both in search rate and in the effects of visual confusability between the conditions. Sternberg (1967b), in a more complete study, found that searching a memorized list was about equal in speed to searching the visual image of an immediately
prior stimulus list. In his situation, the search through the image was self-terminating, while the search of the list was exhaustive. These studies suggest that there may be some differences in handling lists stored as visual arrays and lists stored as letter names. This section reports some experiments which seek to use the methods developed earlier in this chapter to understand differences between visual and name codes. In this technique a visual array is used. Consider the presentation of a four-letter array for 1 second. The S has time to see and name the four letters. He can, therefore, be considered to have a list of the letter names. Previous data presented in this chapter indicate that he may also have a representation of a visual code of some or all of these letters, at least for a brief period. What happens when a probe letter occurs? It is possible that S refers the probe letter to his visual code, to his name list, or perhaps both. The following experiments use the RT technique in order to separate the visual and name codes.
A. MULTILETTER ARRAYS
The first experiment along this line (Posner & Taylor, 1969) involved the presentation of one-, two-, or four-letter arrays. The letters were always uppercase and were selected from B, H, M, Q, R, and Z. The stimulus arrays were thought to be relatively neutral with respect to visual and acoustic confusability. After presentation of the array, S received a random black-and-white pattern during a delay interval of either 10, 500, or 1500 msec. At the conclusion of that interval a single letter appeared, in the position of one of the letters in the array. The S’s task was to say whether that single letter had the same or a different name than the letter in the array that it had replaced. In other words, it was necessary for the S to store all of the array in order to be correct, but he needed only to interrogate the array letter in the position of the probe to make his response.
The four-letter positions were centered in the visual field and subtended a visual angle of about 2°. If the positions of the four-letter array were numbered from left to right, the single-letter array would always lie in Position 2 and the two-letter array in Positions 2 and 3. The design of the experiment was such that the letters presented at Position 2 were completely balanced over conditions. Thus, a relatively clean comparison could be made between one-, two-, and four-letter arrays with respect to Position 2. The first question asked of the data was: Does the visual information at Position 2 vary as a function of the number of other letters that had
TABLE IV
MEAN RTs FOR PHYSICAL AND NAME “SAME” RESPONSES AT ARRAY POSITION 2 AS A FUNCTION OF THE NUMBER OF LETTERS IN THE ARRAY

                     Array length 1       Array length 2       Array length 4
Interval (msec)      P     N     N-P      P     N     N-P      P     N     N-P
10                   388   448   60       387   441   54       532   587   55
500                  408   446   38       403   440   37       528   540   12
1500                 453   457   4        446   460   14       557   570   13
Percent errorᵃ       2     2.5   -        6.8   8.1   -        8.1   8.9   -

ᵃ Error rates are collapsed over all array positions. P, physical; N, name.
to be stored? The answer to this question is contained in Table IV. The results show that at a 10-msec delay the difference between physical and name identity RTs for Position 2 is about 50 msec (p < .01). This is somewhat less than we normally obtain for single letters at time zero. This value declines with the delay interval (p < .01) but does not vary as a function of the array length. In Table V, the information from other positions in the array is shown. If one looks for evidence of visual information at these positions, the picture is quite different. The third and fourth letters in the four-letter array show little, if any, visual information as measured by the difference between physical and name identity. This has the effect of reducing the overall advantage of physical matches in the four-letter arrays over that found in the one- and two-letter arrays. For all positions, the difference between physical and name identity declines significantly with the number of letters (p < .01). So far, emphasis has been primarily upon the difference between physical and name identity. There is some hint that the number of letters in the array may have differential effects upon the name and visual codes. (See Table IV.) Physical matches tend to be fastest at the shortest delay interval regardless of the number of letters in the array. For name matches, however, increases in array length seem to delay the interval that gives the optimal RT. This could mean that with four letters in the array 1 second is not quite long enough to finish extracting and storing the letter names. If this explanation were correct it would imply that the visual and name codes could be manipulated separately.
TABLE V
MEAN RTs FOR PHYSICAL AND NAME “SAME” RESPONSES AT OTHER ARRAY POSITIONSᵃ

                     Array length 2       Array length 4
Probe position       P     N     N-P      P     N     N-P
1                    -     -     -        438   486   50
3                    496   560   64       521   495   -26
4                    -     -     -        535   550   15

ᵃ Values are given in milliseconds. All data are from the 10-msec delay interval. P, physical; N, name.
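The N-P columns of Table IV can be recomputed directly from the tabled P and N means, which gives a simple check on the differences discussed in the text. The Python sketch below is not part of the original report: the dictionary layout and the function name name_minus_physical are assumptions made for illustration, and the RTs are taken to be in milliseconds, as the text implies.

```python
# Mean RTs transcribed from Table IV, assumed to be in msec.
# Layout (an illustrative assumption): array length -> interval -> (P, N).
TABLE_IV = {
    1: {10: (388, 448), 500: (408, 446), 1500: (453, 457)},
    2: {10: (387, 441), 500: (403, 440), 1500: (446, 460)},
    4: {10: (532, 587), 500: (528, 540), 1500: (557, 570)},
}


def name_minus_physical(table):
    """Return {array length: {interval: N - P}} for every cell of the table."""
    return {
        length: {
            interval: name - physical
            for interval, (physical, name) in rows.items()
        }
        for length, rows in table.items()
    }


differences = name_minus_physical(TABLE_IV)

# Average name-minus-physical difference at the shortest (10-msec) delay.
mean_at_10 = sum(d[10] for d in differences.values()) / len(differences)

for length, by_interval in differences.items():
    print(length, by_interval)
print(round(mean_at_10, 1))
```

The recomputed differences agree with the N-P entries of Table IV, and the average at the 10-msec delay comes to roughly 56 msec, in line with the value of about 50 msec cited for Position 2.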
B. MANIPULATING THE NAME CODE
In a study by Boies (1969) a single-letter array was used in the matching task. An operation was performed that could be expected to increase RTs based on the name code, while making relatively little difference for RTs based on the visual code. Prior to the presentation of each pair of visual letters, S heard a list of eight letters which he was to recall subsequent to making the match. After speaking these letters, E exposed a single visual uppercase letter for .5 second. Following that, there was a delay of either 0 or 2 seconds, during which a random checkerboard pattern was exposed. At the end of the delay, a single upper- or lowercase probe letter was presented. The dependent variable was the time to respond whether or not the two visual letters had the same name. The results are shown in Fig. 10. The dotted lines in the figure represent trials on which there were no letters read prior to the matching task. Notice that the difference between physical and name identity RTs declines over the two intervals from about 75 msec to almost 0. The name match RTs are nearly flat, while the physical identity match RTs show a marked upswing. This confirms the pattern for the individual components of physical and name matches that has been found in most of the other studies in this series. (See Figs. 7 and 8.) The solid lines, however, are quite different. These come from trials in which S’s short-term memory was filled with eight letter names. Physical identity matches are not affected by this operation. The name identity responses, however, do show a marked upswing.
It seems reasonable to describe these results as showing selective interference with matching at the name level. In accordance with previous sections, it is possible to speculate on the process involved. The first letter activates the abstract representation corresponding to its visual form. This in turn produces the associated letter name which is stored in an auditory short-term memory. When the second letter occurs, it also contacts the abstract idea corresponding to its visual characteristics. If this representation is already active, because of the previous letter, S can make the match at this level. There is no reason that a match at this level should be affected by storage of the letter names. However, if the probe letter makes contact with an unactivated abstract idea, the letter name must be located and matched with the previously stored name. Since this match must take place within a crowded short-term store, one would expect interference after a short interval.
[Figure 10 plot not reproduced. Abscissa: interval (seconds), 0 and 2; ordinate ticks from 360 to 480. Curves: physical and name matches, under memory load and under normal conditions.]
FIG. 10. RTs for physical and name matches when Ss had to retain eight letter names given prior to the match (memory load) as against normal conditions (no memory load). (After Boies, 1969.)
One important consequence of this finding is the demonstration that the lack of a difference between physical and name RTs cannot, by itself, be taken to mean that the visual code is entirely lost. In the study just reported, increasing the time for a name match produces evidence for
the presence of a visual code (physical matches are significantly faster than name matches). This suggests that in the normal situation the visual code must be present after 2 seconds, but provides a less efficient basis for the match than the name code. Other techniques will be necessary to determine the full time course of activation of the abstract level. When one interferes with the name level there is little, if any, effect upon matches based on the visual code. It is of even more interest to ask the reverse question. What happens to the efficiency of name matches when one interferes with the visual code?
C. MANIPULATING THE VISUAL CODE
Two experiments were conducted in order to vary the visual and name codes separately (Posner & Taylor, 1969). An array of three letters was presented for either 1 second (Experiment I) or .5 second (Experiment II). After termination of the array there was a delay interval filled with a checkerboard pattern. The delay ended with the presentation of the probe letter. The letters which are of interest in this study (target letters) were always an uppercase G, C, or D in the middle of the array. In Experiment I, these target letters were embedded in either a visually similar context (O and Q) or an acoustically similar context (Z and V). In Experiment II, a neutral context (M and R) was also used. The Ss were not informed of the special status of the target letters, and probes involved both the target and context letters. Each experiment involved 16 Ss working for 4 days, and the two experiments were similar in all other ways except that in Experiment II the proportion of “same” responses was .67 while in Experiment I it was only .50. The trials of interest were the correct “same” RTs to the three target letters when the probe was physically identical and when it had only the same name. The results of both studies are shown in Table VI. It is clear that the RTs for physical identity responses are increased when the array is visually similar. This is shown both by the small difference between physical and name RTs for these arrays and by the increased time for the physical identity responses over their times in acoustic and neutral contexts. These effects are most striking at time zero, when normally a physical match is faster than a name match. It should also be noted that the RTs for name responses are not longer in the visually similar arrays than in other arrays. There is a tendency for the name RTs to increase over time more with the acoustic context. However, this effect was observed only in the first experiment and is by no means clear from the data.
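The subtraction behind this reading of Table VI can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout and the function names `physical_advantage` and `mean_advantage` are hypothetical, the values are the cell means printed in Table VI, and the differences are recomputed from the rounded P and N means, so they may differ by a millisecond from the printed N-P column.

```python
# Mean "same" RTs in msec, transcribed from Table VI.
# Keys: (experiment, context, delay in msec); values: (physical RT, name RT).
# The layout and all names below are illustrative assumptions,
# not the analysis actually used by the authors.
TABLE_VI = {
    ("I", "visual", 0): (561, 576),
    ("I", "visual", 500): (568, 585),
    ("I", "visual", 1000): (570, 581),
    ("I", "acoustic", 0): (524, 562),
    ("I", "acoustic", 500): (536, 583),
    ("I", "acoustic", 1000): (563, 602),
    ("II", "visual", 0): (401, 408),
    ("II", "visual", 500): (418, 423),
    ("II", "acoustic", 0): (373, 406),
    ("II", "acoustic", 500): (408, 419),
    ("II", "neutral", 0): (377, 409),
    ("II", "neutral", 500): (402, 425),
}


def physical_advantage(physical_rt, name_rt):
    """Subtractive measure: name RT minus physical RT (N - P), in msec."""
    return name_rt - physical_rt


def mean_advantage(experiment, context):
    """Average N - P difference over the delays tested in one experiment and context."""
    diffs = [
        physical_advantage(p, n)
        for (exp, ctx, _delay), (p, n) in TABLE_VI.items()
        if exp == experiment and ctx == context
    ]
    if not diffs:
        raise KeyError(f"no data for experiment {experiment!r}, context {context!r}")
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    conditions = [("I", "visual"), ("I", "acoustic"),
                  ("II", "visual"), ("II", "acoustic"), ("II", "neutral")]
    for exp, ctx in conditions:
        print(exp, ctx, round(mean_advantage(exp, ctx), 1))
```

In both experiments the averaged difference comes out smallest for the visually similar context, which is the pattern described in the text.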
The main result of these experiments indicates that when the letters G, C, and D are embedded in visually confusing arrays, Ss are not able to make an efficient physical identity match. This means either that Ss
TABLE VI
MEAN RTs FOR PHYSICAL AND NAME SAME RESPONSES AS A FUNCTION OF ARRAY CONTEXT (a)

                  Visual               Acoustic             Neutral
Delay (msec)      P      N      N-P    P      N      N-P    P      N      N-P
Experiment I
0                 561    576    15     524    562    38     -      -      -
500               568    585    16     536    583    47     -      -      -
1000              570    581    11     563    602    39     -      -      -
Experiment II
0                 401    408    7      373    406    33     377    409    32
500               418    423    5      408    419    11     402    425    23

(a) Values are given in milliseconds. P, physical; N, name.
are not storing the visual code of the target letters as adequately when they are in the similar visual context, or that they are not able to retrieve information as efficiently from the visually confusing array. The latter view seems less likely because we conducted a control experiment in which both array and probe were presented simultaneously. The visually confusing context had no effect on RTs for either physical or name matches in this situation. Thus, it appears that visual confusion acts on the registration or maintenance of the visual code. Since the effects are fully present at time zero (see Table VI), it is not possible to separate the registration of the code from its maintenance. For example, the poorer visual code of the target letter in the visual confusion context could be the result of S's tendency to concentrate more closely on the difficult discrimination between O and Q. These results do show that it is possible to manipulate the adequacy of the visual code without disturbing the time for a name match. It follows from this that the stored codes are separate. Otherwise, anything that disturbed the visual code would also affect the time to obtain the names. How do these findings fit with previous data? The presence of the visual code for only a very limited number of letters seems to argue against an interpretation of this code as a passively decaying trace of the letters
(Sperling, 1960). It has previously been suggested that the lack of effect of luminance, exposure duration, and noise, and the presence of rehearsal also serve as evidence that this visual code is at a higher level of processing. On the other hand, the data might be consistent with the speculation that this visual code represents activation of the abstract level. (See Fig. 1.) In order to determine the letter names, all array items must have activated the abstract level. However, the adequacy of the visual code, even shortly after presentation of the array, seems to depend upon a number of factors. In four-letter arrays, Positions 3 and 4 show no evidence of a visual code efficient enough to aid the match. The same is true of the middle letter in a visually confusing array. If the visual code corresponds to activation of an abstract idea, such activation cannot guarantee that it will provide an efficient match, even for the brief period indicated by the decay function (see Fig. 7). Rather, the maintenance of this activity would have to depend upon the number and similarity of other items in the array. One way to conceptualize this is to suppose that formation and/or maintenance of the visual code requires active attention (processing capacity). When only a single letter is present, enough attention is usually available to extend the life of the code for at least a few seconds. As the number of letters increases, attention must be spread over more items and the rate of decay is increased. Similarly, the encoding of visually confusing items might tend to absorb more of the processing capacity than when the items are distinct. A physically identical probe letter would activate the same abstract representation as the array letter. When the activation of the abstract representation by the first letter ends, the efficiency of a physical match is lost.
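The attentional account just described can be made concrete with a toy calculation. The sketch below is not a model proposed in the chapter: the exponential form, the equal division of capacity among letters, the parameter values, and the function name `physical_match_advantage` are all hypothetical choices made for illustration. It assumes only that a smaller share of attention both weakens the visual code at formation and speeds its decay.

```python
import math


def physical_match_advantage(delay_s, n_letters, confusability=0.0,
                             max_advantage_ms=80.0, base_decay_per_s=1.0):
    """Hypothetical N - P advantage (msec) for one array letter after delay_s seconds.

    Every quantity here is an illustrative assumption:
    - capacity is split equally among the n_letters in the array;
    - confusability in [0, 1) is the fraction of capacity absorbed by
      difficult discriminations among similar letters;
    - max_advantage_ms and base_decay_per_s are arbitrary values, not estimates.
    """
    if n_letters < 1:
        raise ValueError("n_letters must be at least 1")
    if not 0.0 <= confusability < 1.0:
        raise ValueError("confusability must lie in [0, 1)")
    if delay_s < 0:
        raise ValueError("delay_s must be non-negative")
    # Share of processing capacity available to this letter.
    share = (1.0 - confusability) / n_letters
    # Formation: less attention gives a weaker code at time zero.
    initial = max_advantage_ms * share
    # Maintenance: less attention gives faster decay.
    decay_rate = base_decay_per_s / share
    return initial * math.exp(-decay_rate * delay_s)


if __name__ == "__main__":
    for n in (1, 2, 4):
        row = [round(physical_match_advantage(t, n), 1) for t in (0.0, 0.5, 1.0)]
        print(n, "letters:", row)
```

Under these assumptions the advantage of a physical match shrinks with array length and with confusability at every delay, including time zero. The equal split is a deliberate simplification: it cannot reproduce position effects such as those in Table V.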
When the probe letter is not physically identical, it activates an abstract representation which is different from any array letter. In this case, RT would always require retrieval of the name and would, therefore, be unaffected by anything that varied the period of activation of the abstract ideas. This view agrees with the finding that name matches are unaffected by a visually similar context. One finding that raises problems for this explanation is that “different” RT is faster if the probe letter is uppercase than if it is lowercase (Posner & Taylor, 1969). This occurs only with arrays of two or more items; thus it cannot result from differences in the discriminability of upper- and lowercase letters. It is possible that uppercase “different” responses rest primarily on the visual code, while lowercase “different” responses rest on the name code. This could indicate that Ss first decide on the case of the letter before making the match, but such a possibility would be inconsistent with the levels of processing outlined in Fig. 1. Another possibility is that the decision about the letter case goes on in parallel with the
process of matching. If an uppercase letter fails to obtain a physical match, the “different” response can be made directly, while a lowercase letter would have to be subjected to name retrieval as well.
D. SEARCHING VISUAL AND NAME CODES
Previous studies of memorized lists of letter names and lists which might be represented as visual codes (Chase & Posner, 1965; Sternberg, 1967b) have involved a comparison of search rates. In the experiments discussed in this chapter, the probe letter was presented in the position of one of the items in the array. While there was a distinct bias toward the left side of the list, this cannot be viewed as a search process in the usual sense. It is more closely related to the difficulty of locating various spatial positions. The relation between number of items in the array and RT is positively accelerated with this technique (see Table IV), while in almost all search studies it is either linear (Sternberg, 1966) or negatively accelerated (Chase & Posner, 1965). Recently, Taylor and Posner (1968) reported a study designed to compare characteristics of searching the visual and name components of previous visual arrays. Their interest was less in search rate than in the order of search. The arrays always consisted of three uppercase letters centered in the field. The letters were either in alphabetical order (ordered arrays, e.g., BCD), or not in alphabetical order (unordered arrays, e.g., DBF). The array was present for 300 msec. After delays of .7, 1.2, or 3.2 seconds, during which the field was dark, a single probe letter appeared. The probe letter was either at the left or right side of the field and either upper- or lowercase. The S’s task was to respond “same” if the probe had the same name as any of the letters in the previous array. The data in Table VII are RTs from the shortest delay interval only. The data from the longer delays are consistent with these, but less striking. Those trials using an uppercase probe (physical identity) are on the left while those using a lowercase probe (name identity) are on the right.
When the probe is physically identical to an array letter, the RTs are fastest when the matching letter is adjacent to the probe. That is, a left probe is fastest when it matches the left-hand array letter, while a right probe is fastest when it matches a right-hand array letter. This effect occurs regardless of whether the arrays are ordered or not, and appears as a significant interaction between probe and array positions (p < .01). When the probe matches the array letter only in name, a very different picture is observed. For the ordered array, the fastest times are always obtained when the match is to the left array letter (first in the alphabet), regardless of the position of the probe. Thus, only the effect of target position is significant (p < .01). With the unordered array, there are no
significant effects of probe or array position. The S appears to search from left to right when the probe is on the left, but there is no systematic order when the probe is on the right. These results suggest another difference between visual and name codes. The order of search through a stored array is quite different depending upon whether it involves the visual code or the letter names. The tendency for letter names to be searched in alphabetical order is not surprising. However, the finding that RTs based on the visual code increase with physical distance from the probe letter is more difficult to

TABLE VII
MEAN RT AS A FUNCTION OF ARRAY AND PROBE POSITION (a)

                    Physical same RTs        Name same RTs
                    Target position          Target position
Probe position      L      R      Mean       L      R      Mean
Ordered Storage Arrays
L                   569    645    607        625    714    669
R                   643    588    615        608    697    652
Mean                606    616    611        616    705    661
Unordered Storage Arrays
L                   588    676    632        609    674    641
R                   698    633    665        689    681    685
Mean                643    654    649        649    677    663

(a) Left column represents physical same RTs while right column represents name same RTs. Upper blocks (ordered) are for arrays in alphabetical order, while lower blocks are for unordered arrays. Values are given in milliseconds.

explain. On the one hand, this seems to fit with the idea of a visual code that can be searched in any order. On the other hand, if the probe letter excites an already activated abstract idea, there is no reason to predict that RT will depend upon physical distance from the probe. In previous studies, the position of the probe letter could have been confounded with the efficiency of the visual code for that letter since left array letters
might be coded more efficiently. However, in the current study, these are separated. The results appear to mean that the visual representation of the array preserves not only the visual descriptions of the individual letters, but also their spatial arrangement. This finding seems more in accord with a “passive trace” view than an “abstract idea” account. Whether this finding is unique to an experiment which, like this one, uses a dark interpolated field, or whether it is a general feature of multiletter arrays is still unclear.
E. SUMMARY
The data introduced in this section support three propositions indicated by the theory outlined in the previous section. First, the visual and name components of the array are stored separately and may be interrogated and manipulated separately. Second, the visual code is not a passive trace, but can be maintained only for a limited number of letters. Third, the time period for which the abstract representation of the visual code can mediate efficient matches is short. This time is briefer for multiletter arrays than for a single letter, since S's limited attentional capacity is divided in the multiletter task. On the other hand, the data indicate that the coding of a letter array may be more extensive than the visual description (abstract idea) and letter names. It may include the spatial arrangement of the letters as well.
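The contrast drawn in Section V,D between searching the visual code and searching the letter names can be checked directly against the cell means of Table VII. The Python sketch below is an illustration only: the data layout and the two contrast functions, `adjacency_effect` and `target_effect`, are hypothetical names introduced here, and the contrasts are simple differences of cell means, not the significance tests reported in the text.

```python
# Cell means (msec) transcribed from Table VII, keyed by (probe side, target side).
# The layout and the contrast functions are illustrative assumptions.
TABLE_VII = {
    ("physical", "ordered"): {("L", "L"): 569, ("L", "R"): 645,
                              ("R", "L"): 643, ("R", "R"): 588},
    ("name", "ordered"): {("L", "L"): 625, ("L", "R"): 714,
                          ("R", "L"): 608, ("R", "R"): 697},
    ("physical", "unordered"): {("L", "L"): 588, ("L", "R"): 676,
                                ("R", "L"): 698, ("R", "R"): 633},
    ("name", "unordered"): {("L", "L"): 609, ("L", "R"): 674,
                            ("R", "L"): 689, ("R", "R"): 681},
}


def adjacency_effect(cells):
    """Mean RT when probe and target are on opposite sides minus same side."""
    near = (cells[("L", "L")] + cells[("R", "R")]) / 2
    far = (cells[("L", "R")] + cells[("R", "L")]) / 2
    return far - near


def target_effect(cells):
    """Mean RT for right-hand targets minus left-hand targets, ignoring probe side."""
    left = (cells[("L", "L")] + cells[("R", "L")]) / 2
    right = (cells[("L", "R")] + cells[("R", "R")]) / 2
    return right - left


if __name__ == "__main__":
    for (code, array_type), cells in TABLE_VII.items():
        print(code, array_type,
              "adjacency:", adjacency_effect(cells),
              "target:", target_effect(cells))
```

With the tabled means, the adjacency contrast is large for physical matches in both ordered and unordered arrays, whereas for name matches on ordered arrays it vanishes and the target-position contrast dominates.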
VI. Summary and Conclusions
The general structure outlined in Fig. 1 provides a framework for different levels of processing involved in simultaneous and successive matching tasks. Posner and Mitchell (1967) have shown that physical and name identity stages appear as components when Ss are instructed to classify letter pairs as “same” if they are both vowels or both consonants. Thus, the processes discussed in this chapter may be observed as parts of more complex classification tasks. In the vowel-consonant case, Ss appear to derive the letter name before deciding whether the stimulus is a vowel or a consonant. However, this may not always be the case. For example, we may know that a face is female before we know who it is. Even with letters some classifications may be performed without obtaining the letter name.¹ It is clear that the most specific name for a stimulus is not always derived before names that are logically superordinates. The techniques developed in this chapter may be able to throw more light on the psychological organization of such classifications.
¹ Studies currently being completed suggest that in the case of the classification letter-digit, Ss proceed to the category classification without first having identified the letter name.
Even within the domain of physical and name matches the simple structure of Fig. 1 may be deceptive. In the course of this chapter, it has been shown that the levels are not joined in a serial chain by obligatory transformations. Rather, Ss can be matching the visual aspects of simultaneous letter pairs and at the same time referring them to past experience in order to obtain the name (Section III,A). Nor does the construction of a code at one level necessarily obliterate the previous codes. After naming a letter, Ss may still show evidence of having a visual description present. These codes are separately stored, have their own time courses, and can be interrogated independently. (Sections IV and V.) Given the lack of a determinate serial linkage, what is the value of the structural levels provided by Fig. 1? Perhaps the main advantage comes in studying the role of independent variables upon the perceptual process. In Section II, the effect of familiarity was reviewed. It was found that familiarity did affect the matching task at the visual level, but only in the integration of successive units, not in the matching of a single unit. Visual confusability among letters in an array affects the efficiency of matches based upon a visual memory code, but not those that require identification of the letter name. Only a detailed account of the structure of perceptual and memory codes within the context of a particular task can provide a reasonable answer to questions concerning the effect of some variable upon the perceptual process. The failure to provide such a structure leads to the conflicts that have so often attended such questions. The separation of visual memory codes from retention of letter names is an important aspect of current work. Much recent research has focused upon the fate of the visual trace of a letter (Sperling, 1960, 1963; Neisser, 1967).
A major issue has been whether the trace (icon) is a brief code whose fate was either erasure by visual input or replacement by the letter name, or whether it could serve as the basis of a more permanent visual memory. The status of the visual code studied by the RT method is not completely clear. The rapid decay function (see Fig. 7) and the ability to search the code as a spatial array (Section V,D) seem to agree with the view that the physical match is based upon the passive trace of the immediate past stimulus. On the other hand, the lack of effects of luminance, duration and visual noise, the importance of attention, and the ability to rehearse and generate the visual code argue that the basis of a physical match is not the trace of the stimulus, but the activation of an abstract code which serves as an internal representation (abstract idea) of that class of prior visual experience. Perhaps some resolution of this conflict is obtained by the view that matching based upon the trace is replaced by matching based upon the abstract code as the prior stimulus is absorbed into the schema of previous experiences.
Recent studies (Bahrick & Bouchee, 1968; Dallett, Wilcox, & D’Andrea, 1968) argue that there is a visual component to long-term memory (LTM). Moreover, there is evidence (Brooks, 1968; DeSoto, London, & Handel, 1965) that Ss can generate a visual code that has important objective consequences upon their performance. This evidence indicates that the visual aspects of a stimulus must be represented in LTM in addition to the stimulus name. The experiments reported in this chapter have used letters because of their convenient feature of providing common names to perceptually distinct forms, but the visual storage of information is likely to be of much more importance for pictorial and spatial information without ready verbal labels. For this reason, the studies of dot patterns were introduced to provide evidence about the functional system used to store information concerning past visual input. The abstract idea (Section III,B) is a means for combining present and past input. It serves to summarize the visual aspect of experience in the same way that a name summarizes experience at another level. Having stored the name, S may no longer be able to tell whether the past stimulus was “a” or “A.” In the same way, having strengthened the abstract idea of “A,” he has lost information concerning the detailed shape of this particular example of the letter. The idea of a visual memory does not require that Ss store the full details of every visual stimulus; rather, they can be combined and summarized. Section III,B outlines the functional characteristics of this abstract level of representation. The schema preserves the central tendency of past visual experience and the individual exemplars define a category boundary. However, the dot pattern experiments were unable to provide much detail concerning this system.
If experiments on generation of visual codes from letter names tap the same system, they may offer more promise of providing detailed information on the perceptual qualities of the abstract level. For example, it is possible to ask whether a generated visual representation has determinate size, color, orientation, or spatial position. The question can be considered by varying properties of the probe letter and observing their effects upon RT. It remains to be seen if such studies can provide a vehicle for analysis of the abstract level.

REFERENCES
Attneave, F. Transfer of experience with a class-schema to identification learning of patterns and shapes. Journal of Experimental Psychology, 1957, 54, 81-88.
Attneave, F. Applications of information theory to psychology. New York: Holt, 1959.
Bahrick, H. P., & Bouchee, B. Retention of visual and verbal codes of the same stimuli. Journal of Experimental Psychology, 1968, 78, 417-422.
Bartlett, F. C. Remembering: A study in experimental and social psychology. London & New York: Cambridge University Press, 1932.
Beck, J. Effect of orientation and of shape similarity on perceptual grouping. Perception & Psychophysics, 1966, 1, 300-302.
Boies, S. J. Rehearsal of visual codes of single letters. Unpublished master’s thesis, University of Oregon, 1969.
Boies, S. J., Posner, M. I., & Taylor, R. L. Rehearsal of visual information from a single letter. Paper presented at the meeting of the Western Psychological Association, San Diego, May, 1968.
Brooks, L. R. Spatial and verbal components of the act of recall. Canadian Journal of Psychology, 1968, 22, 349-368.
Bruner, J. On perceptual readiness. Psychological Review, 1957, 64, 123-152.
Chase, W. G., & Posner, M. I. The effect of visual and auditory confusability on visual and memory search tasks. Paper presented at the meeting of the Psychonomics Society, Chicago, 1965.
Cohen, B. H. Recall of categorized word lists. Journal of Experimental Psychology, 1963, 65, 368-376.
Conrad, R. Acoustic confusion in immediate memory. British Journal of Psychology, 1964, 55, 75-84.
Cox, N. Effect of familiarization pretraining with random shapes on same-different judgment times. Unpublished master’s thesis, Carleton University, 1967.
Dallett, K., Wilcox, S. G., & D’Andrea, L. Picture memory experiments. Journal of Experimental Psychology, 1968, 76, 312-320.
DeSoto, C., London, M., & Handel, S. Social reasoning and spatial paralogic. Journal of Personality and Social Psychology, 1965, 4, 515-521.
Dukes, W. F., & Bevan, W. Stimulus variation and repetition in the acquisition of naming responses. Journal of Experimental Psychology, 1967, 74, 178-181.
Edmonds, E. M., Mueller, M., & Evans, S. H. Effects of knowledge of results on mixed schema discrimination. Psychonomic Science, 1966, 6, 377-378.
Egeth, H. E. Parallel versus serial processes in multidimensional stimulus discrimination. Perception & Psychophysics, 1966, 1, 245-252.
Eichelman, W. H. Letters as units of processing in a visual matching task. Unpublished master’s thesis, University of Oregon, 1968.
Evans, S. H. A brief statement of schema theory. Psychonomic Science, 1967, 8, 87-88.
Fitts, P. M., Weinstein, M., Rappoport, M., Anderson, N., & Leonard, J. A. Stimulus correlates of visual pattern recognition: A probability approach. Journal of Experimental Psychology, 1956, 51, 1-11.
Flavell, J. H. The developmental psychology of Jean Piaget. Princeton, N.J.: Van Nostrand, 1963.
Frost, R. Recognition of prototypes in running recognition memory experiments. Unpublished experiments, University of Oregon, 1968.
Gibson, E. J. Learning to read. Science, 1965, 148, 1066-1072.
Gibson, E. J., Bishop, C. H., Schiff, W., & Smith, J. Comparison of meaningfulness and pronounceability as grouping principles in perception and retention of verbal material. Journal of Experimental Psychology, 1964, 67, 173-182.
Goldstein, K. Language and language disturbance. New York: Grune & Stratton, 1948.
Gottsdanker, R., Broadbent, L., & Van Sant, C. Reaction time to single and to first signals. Journal of Experimental Psychology, 1963, 66, 163-167.
Haber, R. N. Effect of prior knowledge of the stimulus on word-recognition processes. Journal of Experimental Psychology, 1965, 69, 282-286.
Hawkins, H. L. Multidimensional stimulus comparison in a “same-different” reaction time task. Unpublished doctoral dissertation, University of Oregon, 1967.
Hawkins, H. L. Parallel processing in complex visual discrimination. Perception & Psychophysics, 1969, 5, 56-64.
Hebb, D. O. The organization of behaviour. New York: Wiley, 1949.
Hebb, D. O. The semi-autonomous process: Its nature and nurture. American Psychologist, 1963, 18, 16-27.
Hershenson, M. Stimulus structure, cognitive structure, and the perception of letter arrays. Journal of Experimental Psychology, 1969, 79, 327-335.
Hinsey, W. C. Identification-learning after pretraining on central and noncentral standards. Unpublished master’s thesis, University of Oregon, 1963.
Hochberg, J. In the mind’s eye. In R. N. Haber (Ed.), Contemporary theory and research in visual perception. New York: Holt, Rinehart, & Winston, 1968.
Hubel, D. H., & Wiesel, T. N. Receptive fields and functional architecture in two non-striate visual areas of the cat. Journal of Neurophysiology, 1965, 28, 229-289.
Humphrey, G. Thinking. London: Methuen, 1951.
Keele, S. W., & Chase, W. G. Short term visual storage. Perception & Psychophysics, 1967, 2, 383-386.
Keele, S. W., Fentress, J., & Posner, M. I. Classification of test patterns following exposure to distortions of a prototype. Unpublished experiments, University of Oregon, 1968.
Konorski, J. Integrative activity of the brain. Chicago: University of Chicago Press, 1967.
Lindsay, R. K., & Lindsay, J. M. Reaction time and serial versus parallel information processing. Journal of Experimental Psychology, 1966, 71, 294-303.
Luria, A. R. The mind of a mnemonist. New York: Basic Books, 1967.
Mandler, G., & Mandler, J. M. Thinking: From association to Gestalt. New York: Wiley, 1964.
Mewhort, D. J. K. Sequential redundancy and letter spacing as determinants of tachistoscopic recognition. Canadian Journal of Psychology, 1966, 20, 435-444.
Miller, G. A., Galanter, E., & Pribram, K. Plans and the structure of behaviour. New York: Holt, Rinehart & Winston, 1960.
Neisser, U. Cognitive psychology. New York: Appleton-Century-Crofts, 1967.
Neisser, U., & Beller, H. K. Searching through word lists. British Journal of Psychology, 1965, 56, 349-358.
Nickerson, R. S. “Same-different” response times with multi-attribute stimulus differences. Perceptual and Motor Skills, 1967, 24, 543-554.
Oldfield, R. C. Memory mechanisms and the theory of schemata. British Journal of Psychology, 1954, 45, 14-23.
Pick, A. P. Improvement of visual and tactual form discrimination. Journal of Experimental Psychology, 1965, 69, 331-339.
Podell, H. A. Two processes of concept formation. Psychological Monographs, 1958, 72 (15, Whole No. 468).
Pollack, I. Speed of classification of words into super-ordinate categories. Journal of Verbal Learning and Verbal Behavior, 1963, 3, 159-166.
Posner, M. I. Information reduction in the analysis of sequential tasks. Psychological Review, 1964, 71, 491-504. (a)
Posner, M. I. Uncertainty as a predictor of similarity in the study of generalization. Journal of Experimental Psychology, 1964, 68, 113-118. (b)
Posner, M. I. An informational analysis of the perception and classification of patterns. Paper presented at the 18th meeting of the International Congress of Psychology, Moscow, August, 1966.
Posner, M. I. Short term memory systems in human information processing. Acta Psychologica, 1967, 27, 267-284.
Posner, M. I., Boies, S. J., Eichelman, W. H., & Taylor, R. L. Retention of visual and name codes of single letters. Journal of Experimental Psychology, 1969, 79 (Monograph Suppl. 1), 1-16.
Posner, M. I., Goldsmith, R., & Welton, K. E. Perceived distance and the classification of distorted patterns. Journal of Experimental Psychology, 1967, 73, 28-38.
Posner, M. I., & Keele, S. W. Decay of visual information from a single letter. Science, 1967, 158, 137-139.
Posner, M. I., & Keele, S. W. On the genesis of abstract ideas. Journal of Experimental Psychology, 1968, 77, 353-363.
Posner, M. I., & Keele, S. W. Retention of abstract ideas. Unpublished experiments, University of Oregon, 1969.
Posner, M. I., & Konick, A. F. Short term retention of visual and kinesthetic information. Organizational Behavior & Human Performance, 1966, 1, 71-86.
Posner, M. I., & Mitchell, R. F. Chronometric analysis of classification. Psychological Review, 1967, 74, 392-409.
Posner, M. I., & Taylor, R. L. Subtractive method applied to separation of visual and name components of multi-letter arrays. Acta Psychologica, 1969, in press.
Price, H. H. The permanent significance of Hume’s philosophy. Philosophy, 1940, XV, 10-36.
Price, H. H. Thinking and experience. Cambridge, Mass.: Harvard University Press, 1953.
Reeves, J. W. Thinking about thinking. New York: Braziller, 1965.
Ribot, T. Evolution of general ideas. La Salle, Ill.: Open Court, 1899.
Riese, W. The sources of Hughlings Jackson’s view on aphasia. Brain, 1965, 88, 811-822.
Riggs, L. A., Ratliff, F., Cornsweet, J. C., & Cornsweet, T. The disappearance of steadily fixated visual test objects. Journal of the Optical Society of America, 1953, 43, 495-501.
Robinson, J. S., Brown, L. T., & Hayes, W. H. Test of effect of past visual experience on perception. Perceptual and Motor Skills, 1964, 18, 953-956.
Selfridge, O. G., & Neisser, U. Pattern recognition by machine. Scientific American, 1960, 203, 60-68.
Shepard, R. N., & Teghtsoonian, M. Retention of information under conditions approaching a steady state. Journal of Experimental Psychology, 1961, 62, 302-309.
Sokolov, Y. N. A probabilistic model of perception. Soviet Psychology and Psychiatry, 1963, 1, 28-36. (English Transl.)
Sperling, G. The information available in brief visual presentations. Psychological Monographs, 1960, 74 (11, Whole No. 498).
Sperling, G. A model of visual memory. Human Factors, 1963, 5, 19-31.
Sternberg, S. High-speed scanning in human memory. Science, 1966, 153, 652-654.
Sternberg, S. Two operations in character recognition: Some evidence from reaction time experiments. Perception & Psychophysics, 1967, 2, 45-53. (a)
Sternberg, S. Scanning a persisting visual image versus a memorized list. Paper presented at the meeting of the Eastern Psychological Association, Boston, 1967. (b)
Stevens, S. S. On the operation known as judgment. American Scientist, 1966, 54, 385-401.
100
Michael I. Posner
Strange, W., Keeney, T., Kessel, F., & Jenkins, J. J. The abstraction over time of the prototype from distortions of random dot patterns. Paper presented at the meeting of the Midwestern Psychological Association, Chicago, May, 1968. Talland, G. Deranged memory. New York: Academic Press, 1965. Taylor, R. L., & Posner, M. I. Retrieval from visual and verbal memory codes. Paper presented at the meeting of the Western Psychological Association, San Diego, 1968. Uhr, L. Pattern recognition. New York: Wiley, 1966. Woodworth, R. S. Experimental psychology. New York: Holt, 1938.
NEO-NONCONTINUITY THEORY¹
Marvin Levine
STATE UNIVERSITY OF NEW YORK AT STONY BROOK
STONY BROOK, NEW YORK
I. Introduction
II. Probing for Hs
III. The Dynamics of H Testing
    A. The Four-Dimensional (4-D) Problems
    B. The Eight-Dimensional (8-D) Problems
IV. Discussion
    A. Abstract versus Specific Theory
    B. Conclusion
V. Appendix
    A. The Stimuli for the 8-D Problems
    B. The Effect of Oops-Errors upon H Interpretation
References
I. Introduction

Noncontinuity theory, it is well known, is a theory dealing with choice responses during discrimination learning. Specifically, it holds that the choices reflect hypotheses that S is testing. It is less well known that noncontinuity theory is today enjoying a revival. Indeed, it may fairly be characterized as the leading theory of discrimination learning by adult humans. Its reemergence, surprisingly, has occurred abruptly in the last 10 years although, as with any historical development, foreshadowings had occurred for a few preceding years. Ten years ago, however, the popular theory of discrimination learning was quite different. The decade of the 1950's saw many major theorists deriving the phenomena of discrimination learning from the application of conditioning theory. Thus, Hull (1950), Bush and Mosteller (1951), Estes and Burke (1955), Green (1958), and Restle (1955, 1958) described discrimination learning as the product of a gradual strengthening via reinforcement of the correct S-R pairs and a concomitant gradual extinguishing of the wrong S-R pairs. The series of papers was capped by the Bourne and Restle (1959) extension of conditioning theory to concept learning and other complex discrimination tasks.

¹ The research described and the preparation of this chapter were supported by PHS Research Grant No. MH 11857-02 from the National Institute of Mental Health. Appreciation must also be expressed to Fred Frankel, who developed the computer programs.
Despite the prestige of these several theorists, an alternative view was starting to develop. Bruner, Goodnow, and Austin (1956) showed that Ss’ hypothesis statements were organized to lead to the correct hypothesis; Harlow (1959) derived his error-factor theory from the study of patterns of response sequences; Levine (1959) systematically analyzed these patterns and suggested that they reflected “hypotheses” (Hs) in the sense that Krechevsky (1932) had used that term. Further weight was added to the noncontinuity view with the data and suggestion by Rock (1957) that learning was an all-or-none process.

The scales were suddenly shifted in the early 1960’s when two of the leading proponents of discrimination as gradual conditioning modified their views. Estes (1960, 1964) began to emphasize the all-or-none features in learning; Restle (1962) switched to an H (he used the term “strategy”) theory of discrimination learning.

Restle’s new statement, in particular, provided strong impetus to the movement. The theory, mathematically structured, was unambiguously testable. Furthermore, in keeping with the interests of the conditioning theorists it dealt only with choice responses (as opposed to verbal statements). Finally, the theory had an appealing simplicity. The fundamental assumptions were: (1) At the start of a discrimination learning task, S randomly samples from a universe of Hs and responds on the basis of the H selected; (2) when a response is followed by E saying “right,” S keeps his H for the next trial; (3) when a response is followed by E saying “wrong,” S returns his H to the set and randomly resamples. This last assumption, because it permits S to select the H just disconfirmed, is sometimes called the zero-memory assumption.

Bower and Trabasso (1963, 1964) set an experimental foundation for the theory by testing some of the implications of Restle’s assumptions. An important implication concerns the learning curve.
S samples either incorrect Hs or the correct H. If the former, then the probability that he makes a correct response is, in a two-alternative task, .5. If, after some error, he samples the correct H then he never again makes an error, i.e., he manifests the criterion run of correct responses. The theory, therefore, implies that prior to the final error the probability of a correct response is .5. Note that this implication differs from that of gradual-learning or continuity theories, which predict that the probability of a correct response increases as S approaches the final error. Bower and Trabasso aligned all their protocols at the final error and obtained the proportion correct at each successive trial preceding that error. The resulting curve was flat at about .5.

Another implication concerns the effects of changing the solution while S is in the precriterion state. According to Restle’s theory, whenever S makes an error he must resample from the total set. His probability
of sampling the correct H is the same whether this H had been correct throughout the problem or whether a reversal shift had previously occurred. This idea was the basis for the continuity-noncontinuity controversy of an earlier generation (Krechevsky, 1938; Spence, 1945). Bower and Trabasso showed that the response-reinforcement contingencies could be changed when S made an error without delaying criterion performance. They demonstrated this for reversal, nonreversal, and multireversal shifts.

II. Probing for Hs

Restle’s theory consists of two parts: (1) an assertion that Hs are sampled and that the H determines the choice response; this will be referred to as the basic assumption; (2) statements about the effects of feedback (e.g., “right” or “wrong”) upon retaining and rejecting Hs. These two parts are clearly separable. One can subscribe to the basic assumption but have assumptions other than Restle’s about the effects of feedback. A number of substitutions for the zero-memory assumption has been proposed (e.g., Levine, 1963b; Trabasso & Bower, 1966). Indeed, one might omit any assumptions about the effects of feedback upon Hs and, instead, might determine these effects experimentally. The realization of this possibility formed the core of Levine’s work during the past decade and will be described here.

An obvious asset in the study of H-testing would be a probe that permits detection of S’s H at any point in the experiment. A few experiments (Levine, 1963a; Levine, Leitenberg, & Richter, 1964; Richter & Levine, 1965) suggested that a series of consecutive “blank,” i.e., no feedback, trials might serve as the needed probe. These experiments led to the following blank-trials assumption: S responds according to a single H during a series of blank trials.
If, for example, S’s H at some point in a problem is “choose the larger stimulus” and if, starting at that point, the E says neither “right” nor “wrong” for a series of trials, then S selects the large stimulus consistently for those trials. If this assumption is correct then it should be possible to infer S’s H from his response pattern during a series of blank trials. Consider Fig. 1. At its center are four pairs of typical discrimination stimuli. Each pair varies along four dimensions (size, letter, color, and position) and is presented on a single trial. Let us make the assumption, the rationale for which will be explained later, that the full set of Hs from which S may sample corresponds to the eight values of the dimensions. That is, S may sample the Hs “larger” (abbreviation for “choose the larger stimulus”) or “smaller,” “black” or “white,” “X” or “T,” or “left” or “right,” and nothing else. Given this representation and the blank-trials assumption, if the four stimulus pairs at the center of Fig. 1 are presented
in four consecutive blank trials then only eight response patterns should occur. These patterns, shown in the eight columns of Fig. 1, are manifestations of the Hs indicated at the top of each column. Thus, the single-alternation pattern in the first column indicates that S is holding the H “black,” since only the black attribute is correlated with this pattern during these four trials. Such inferences require not only the blank-trials assumption but also the assumption that these eight Hs exhaust the set of all possible Hs. If, for example, sequential Hs were within the set, then it would not be clear whether the single-alternation pattern manifested the H “choose black” or “alternate positions.”

In principle, one could infer the H from a set of any finite size containing Hs of any sort. In a subsequent section, an experiment assuming 16 Hs will be described. One need only present enough blank trials so that each H be manifested in a unique response pattern. The H set, however, must always be finite and its composition known to the E. This assumption about the makeup of the total H set from which S samples will be called the composition assumption.
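The inference from a blank-trial response pattern to an H can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions: the four stimulus pairs are hypothetical (chosen so that “black” yields the single-alternation pattern mentioned in the text; they are not copied from Fig. 1), and the names `LEFT_STIMULI`, `pattern_for`, and `infer_hypothesis` are invented here. Given the blank-trials and composition assumptions, each of the eight Hs maps to one pattern of left (L) and right (R) choices, and every other pattern is left uninterpreted.

```python
from itertools import product

# Hypothetical stimulus pairs (an assumption, not the actual Fig. 1 stimuli).
# Each set lists the attributes of the LEFT stimulus on one blank trial;
# the right stimulus carries the opposite value on every dimension.
LEFT_STIMULI = [
    {"black", "larger", "X"},
    {"white", "larger", "T"},
    {"black", "smaller", "T"},
    {"white", "smaller", "X"},
]

# Composition assumption: these eight Hs exhaust the set.
HYPOTHESES = ["black", "white", "larger", "smaller", "X", "T", "left", "right"]


def pattern_for(hypothesis):
    """Return the L/R choice pattern an S holding this H would produce."""
    if hypothesis in ("left", "right"):
        side = "L" if hypothesis == "left" else "R"
        return side * len(LEFT_STIMULI)
    # Choose the side on which the hypothesized attribute appears.
    return "".join("L" if hypothesis in left else "R" for left in LEFT_STIMULI)


# Each H must map to a unique pattern for the probe to be usable.
PATTERN_TO_H = {pattern_for(h): h for h in HYPOTHESES}


def infer_hypothesis(responses):
    """Map a response pattern such as 'LRLR' to an H, or None if no H fits."""
    return PATTERN_TO_H.get(responses)


# Patterns that no H in the set can produce.
UNINTERPRETABLE = [
    "".join(p) for p in product("LR", repeat=4) if "".join(p) not in PATTERN_TO_H
]

print(infer_hypothesis("LRLR"))  # black
print(infer_hypothesis("LLLR"))  # None
```

With these hypothetical stimuli the eight patterns that match no H all place three responses on one side and one on the other.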
FIG. 1. Eight patterns of choices corresponding to each of the eight Hs when the four stimulus pairs are presented on consecutive blank trials.
Suppose the composition assumption were that S’s set consisted of the eight Hs in Fig. 1. Two procedures have face validity for meeting such an assumption. One could first give S a large number of long problems whose solutions were randomly selected from only the eight Hs. One could extinguish, so to speak, the selection of all other Hs. This would be the obvious procedure with a nonverbal organism. With the adult human, one can largely short-cut this procedure with instructions: Tell S that there will be only eight solutions and state what these will be. A combination of these two procedures has been most effective with the adult human.

The probe sequence of blank trials can be short. If the composition
assumption is that there are eight Hs, only three blank trials are needed to yield eight different response patterns. In Fig. 1, for example, only the first three stimuli are required to produce eight unique patterns. If 16 Hs are assumed, then four trials are needed. In general, if $2^n$ Hs are assumed, then n trials can yield $2^n$ different response patterns. There is an advantage, however, in having one trial more than the minimum number. The additional trial doubles the number of possible response patterns. Half of these will be manifestations of Hs defined by the composition assumption, the other half will be patterns not stipulated by that assumption. To continue with the example in Fig. 1, the four trials allow 16 different response patterns. The eight response patterns which have been omitted from Fig. 1 are all 3-1 patterns, i.e., permutations of three responses to one position and one to the other. A 3-1 pattern, because it does not correlate perfectly with any stimulus attribute in Fig. 1, would not reflect an H specified by the present composition assumption. Because half of the response patterns are consistent with the theory and half are not, a simple null hypothesis is provided: If the theory is utterly wrong, if the true state of affairs is far from that described in the assumptions above, then the probability of an H-pattern appearing is .5. One blank trial beyond the minimum, therefore, provides a useful basis for evaluating the theory.

To recapitulate, the theoretical equipment necessary to probe for Hs consists of three assumptions: (1) the basic assumption: S samples from a universe of Hs and responds according to the H selected; (2) the blank-trials assumption: S responds according to a single H during a series of blank trials; (3) the composition assumption: the universe of possible Hs is finite and is known to E.
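The arithmetic of probe length can be summarized in a few lines of Python. This is a minimal sketch, not part of the original procedure; the function name `probe_design` and its return layout are assumptions made for illustration. It returns the minimum number of blank trials for a given number of Hs, the recommended length (one more than the minimum), the number of possible response patterns at that length, and the proportion of those patterns that are H patterns, which is the null-hypothesis value.

```python
def probe_design(n_hypotheses):
    """Return (min_trials, trials_used, n_patterns, chance_of_H_pattern)."""
    # Smallest n with 2**n >= n_hypotheses (integer arithmetic, no rounding).
    min_trials = (n_hypotheses - 1).bit_length()
    # One extra blank trial doubles the number of possible patterns.
    trials_used = min_trials + 1
    n_patterns = 2 ** trials_used
    # Proportion of patterns that manifest some H in the assumed set.
    chance = n_hypotheses / n_patterns
    return min_trials, trials_used, n_patterns, chance


print(probe_design(8))   # (3, 4, 16, 0.5)
print(probe_design(16))  # (4, 5, 32, 0.5)
```

The chance value is exactly .5 only when the number of Hs is a power of two, as it is in both cases considered in the text.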
III. The Dynamics of H Testing

The next sections will describe two experiments employing blank trials as probes. Both experiments provide an analysis of short discrimination problems presented to college students. The chief difference between the two experiments is that the first used four-dimensional problems (cf. the stimuli of Fig. 1) and the second used eight-dimensional problems. The first has been treated at length (Levine, 1966) and will only be summarized here; the second will be described in detail, and an extensive description of the procedures and materials provided.

A. THE FOUR-DIMENSIONAL (4-D) PROBLEMS

In the initial experiment, 80 Ss received 16 experimental problems of the sort schematized in Fig. 2. Three feedback (F) trials were each followed by four blank trials, with a fourth F trial concluding this 16-trial problem. The first three F trials always defined the solution. That is,
the stimuli were such that no matter which responses S made, the three outcomes logically disconfirmed the seven incorrect Hs.

Prior to the presentation of these 16 16-trial problems, Ss were instructed about the four dimensions and the eight corresponding solutions that might occur. They also received four longer preliminary problems which illustrated these solutions and which introduced the Ss to blank trials.

FIG. 2. Schema of a 16-trial problem showing the feedback trials ($F_i$) on which E says “right” or “wrong” and the blank trials from which the Hs are inferred.

The chief findings were:

(1) The H patterns appeared on 92.4% (3550 out of 3840) of the four-trial probe tests.

(2a) If E said “right” (+) following a four-trial probe (i.e., on $F_2$ or $F_3$), the probability that the same H would appear on the next four-trial set was .95 (N = 1112). Using the symbols of Fig. 2,

$$P(H_i = H_{i-1} \mid F_i = +) = .95.$$

This provides reasonable validation of Restle’s assumption that S keeps his H when told “right.”
(2b) If E said “wrong” (−), the corresponding probability was .02 (N = 1027), i.e.,

$$P(H_i = H_{i-1} \mid F_i = -) = .02.$$

This result contradicts Restle’s third assumption, that S first returns the disconfirmed H to the set and then randomly resamples. Such an assumption would require the disconfirmed H to be resampled with a probability of .125, i.e., one chance in eight. It appears, rather, that S does not return the disconfirmed H to the set before resampling. Comparing the results in (2a) and (2b) reveals a curious asymmetry. When told “right,” the S apparently forgot his H 5% of the time; when told “wrong,” this happened only 2% of the time. This asymmetry will be treated in more detail in the next section.

(3) The H selected by S immediately after a “wrong” tended to be consistent with the information contained in the preceding outcome trials. The probability that S chose a consistent, i.e., logically correct, H was transformed (in a manner described in the next section) to reveal that the average size of the H set went from eight Hs at the outset of a problem to three Hs after the third outcome trial. Thus, S was not only rejecting the manifested H [in the sense of the result in (2b) above], but was sampling and rejecting other logically disconfirmed Hs as well.

(4) The more often S was told “right” on F Trials 1 and 2, the greater was the probability that $H_3$ was correct, given $F_3 = -$. Thus, after $F_3$, Ss appeared to have learned more from being told “right” than from being told “wrong” on $F_1$ and $F_2$ even though an equivalent amount of information was conveyed by the two types of outcome.
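The contrast drawn in finding (2b) can be made concrete with a small simulation, given here as a hedged sketch rather than as part of the original analysis. The function `repeat_rate` and its parameters are hypothetical names. It assumes that S samples uniformly from eight Hs and compares two resampling rules after a “wrong”: Restle's zero-memory rule, in which the disconfirmed H is returned to the set, and a rule in which the disconfirmed H is withheld.

```python
import random


def repeat_rate(n_hypotheses, replace_disconfirmed, n_resamples=100_000, seed=0):
    """Estimate how often the just-disconfirmed H is resampled after 'wrong'."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    hypotheses = list(range(n_hypotheses))
    repeats = 0
    for _ in range(n_resamples):
        disconfirmed = rng.choice(hypotheses)
        if replace_disconfirmed:
            pool = hypotheses  # zero-memory: the rejected H goes back in the set
        else:
            pool = [h for h in hypotheses if h != disconfirmed]
        repeats += rng.choice(pool) == disconfirmed
    return repeats / n_resamples


zero_memory = repeat_rate(8, replace_disconfirmed=True)   # close to 1/8 = .125
with_memory = repeat_rate(8, replace_disconfirmed=False)  # exactly 0.0
print(zero_memory, with_memory)
```

The observed value of .02 lies far closer to the second rule than to the .125 required by the zero-memory assumption.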
B. THE EIGHT-DIMENSIONAL (8-D) PROBLEMS

1. Introduction

The experiment just described has a few limitations. First, a 4-D problem with the accompanying set of eight Hs is far removed from ordinary problem-solving situations, in which the number of alternative solutions can be exceedingly large. Second, a few authors (Trabasso & Bower, 1968; Chumbley, 1967) have commented that the demonstration that S remembers earlier information when he resamples [cf. findings (2b), (3), and (4) above] occurs in a relatively easy problem. The research that has appeared to confirm Restle’s zero-memory assumption has been performed, typically, with problems of five or six dimensions. The implication is that Restle’s assumptions might apply to more complex problems.

These considerations suggested that the experiment be redone with 8-D problems. There is a further advantage to this change, beyond simply
checking the generality of the results already obtained. An 8-D problem is difficult enough to cause many errors. Thus, one may more efficiently and over a longer number of trials inquire into the resampling of Hs following errors. One may, then, present longer problems and study changes in H sampling over trials. Finally, these changes are accomplished with little change in the periodicity of feedback. As noted above, one additional blank trial doubles the number of possible response patterns. In an 8-D problem, therefore, five blank trials yield 32 possible response patterns of which 16 are H manifestations.

2. Method
a. Subjects. Fifty-six students from the Introductory Psychology course at the State University of New York at Stony Brook served as Ss.

b. Materials. Slides were rear-projected so that the pair of stimuli fell on two 3-inch-square translucent panels 6 inches apart. A 3 x %foot board with windows for the panels was placed so that S could see only the two panels and none of the equipment at the rear. A choice response, consisting of a touch of either panel, activated an appropriate circuit in the control equipment. At the base of the board, centered between the two stimulus panels, was a “reinforcement” key. Pressing this after a choice response illuminated for 1 second the appropriate feedback word on F trials (“right” or “wrong,” located one above the other between the stimulus panels). This key-press also stepped the slide projector, cleared the control equipment, and brought S’s hand to a neutral position after each trial.

c. Problems. Eight-dimensional stimuli were constructed by adding to the four dimensions in Fig. 1 the dimensions of one or two, solid or dashed, square or circular borders, with a spot above or below the border. Five hundred seventy-six slides of these stimuli were organized into 16 36-trial experimental problems. A problem consisted of six different F-trial slides each of which was followed by a set of five blank-trial slides (cf. Fig. 2). The F-trial slides were never identical to a blank-trial slide and were so constructed that each outcome logically reduced the number of possible solutions by half. Thus, exactly four F trials always provided enough information to define the solution. Within each set of five blank-trial slides no two stimulus dimensions were ever perfectly correlated. Each H, therefore, always yielded a unique response pattern. A more detailed description of the stimulus sequences can be found in Appendix A.

d. Procedure. A series of eight preliminary problems was presented first.
These problems served to acquaint S with the stimuli of the experiment, to introduce the use of blank trials, and to indicate the range
of solutions. Four 4-D problems were first presented, along with instructions describing the eight possible solutions. These were followed by four 8-D problems, the additional solutions being first noted. After each preliminary problem, S was asked to state the solution. If he verbalized any H other than one of the solutions stipulated, he was told, “No, the solution is not as complicated as that. It will always be one of these” (the possible solutions were then reviewed). All Ss were told the correct solution for each problem. This procedure, then, served to fulfill the composition assumption.

The 16 experimental problems followed. Each dimension contributed a solution during the first eight problems; the opposite values served as solutions during the second eight problems. Four different sequences of solutions were generated, each one being employed for 14 Ss. The E made no comments to S during these 16 problems.

3. Results and Theory

a. Response Patterns during Blank Trials.² The data from the sets of blank trials fall into two classes: the 16 patterns conforming to the specified Hs and the 16 patterns not conforming. The H patterns appear on 91.4% (4913 out of 5376) of the five-trial sets. Thus, in this more complex problem, S’s behavior is as strongly systematic as it appeared in the 4-D problem. Nonetheless, 8.6% of the patterns did not conform to the theory.

In order to better understand both this deviation from perfect H-pattern performance and some of the results that follow it will be useful at this point to add another assumption. It occasionally happens that S, after a blank-trial response, will say something to the effect that “That last response was a mistake. Is it too late to change it?” This suggests that Ss occasionally make mistakes in responding relative to their Hs. An S, for example, who believes that the left side is the basis for choosing may, “by mistake,” choose the right side on some trial.
These mistakes will be referred to aptly (if inelegantly) as oops-errors and will require a fourth assumption, the oops-error assumption: On any one trial of a blank-trial series, S has a constant probability of choosing incorrectly relative to his H. Knowing the percent of inconsistent (non-H) response patterns, it is possible to compute the probability of an oops-error. If, for example, 8% of the five-trial sets showed inconsistent patterns, then the estimated probability of an oops-error would be .02. For the range of values considered below, the probability of an oops-error is approximately one-fourth the probability of an inconsistent pattern.

It appears, then, that in the present data the probability of an oops-error is slightly greater than .02. This conclusion is misleading, however, because the proportion of inconsistent patterns is not constant for the six probes of a problem. Figure 3, which shows the proportion of inconsistent patterns after each outcome, indicates two effects: The proportion decreases throughout a problem and is greater following $F_i = -$ than following $F_i = +$. It is as though S is somewhat confused or distracted at the start of a problem and following a “wrong.” The probability of an oops-error ranges from .044 (at $H_1$ after $F_1 = -$) to just under .01 (at $H_5$ after $F_5 = +$).

² A formal, more detailed treatment of the theoretical material in this section will be found in Section V,B (Appendix).
FIG. 3. The proportion of inconsistent patterns at each H set ($H_1$ through $H_6$) given that the immediately preceding outcome was right (+) or wrong (−).
The existence of such oops-errors means that occasionally an H pattern will be misinterpreted. Fortunately this does not occur too frequently. The stimulus sequence during blank trials was such that a single error almost always produced an inconsistent pattern. Two or more errors, however, typically produced a proper H pattern and caused a misinterpretation. The probability of a misinterpretation, it turns out, is almost exactly equal to the probability of an oops-error: at $H_1$ after $F_1 = -$, for example, the probability of a misinterpretation is .047; at $H_5$ after $F_5 = +$ it is .010.

The presence of oops-errors changes the sampling conception somewhat. This may be formalized as follows: The S starts the ith series of blank trials with an H. This will be referred to as the true H and symbolized by $T_i$. Because S has a small probability of making a mistake in responding, the pattern of response may be classified in one of three ways: (1) The pattern may manifest $T_i$, symbolized $H_i = T_i$. (2) The pattern may, because of oops-errors, manifest an H different from the true H, or $H_i = D_i$. (3) Because of oops-errors, the pattern may be inconsistent. This will be symbolized by $H_i = I$. The symbol $H_i \neq I$ will denote an interpretable pattern. Since the three manifestations are mutually exclusive,

$$P(H_i = T_i) + P(H_i = D_i) + P(H_i = I) = 1.$$

At $H_1$, for example, Fig. 3 shows that $P(H_1 = I \mid F_1 = -) = .155$. It is possible to compute from this that $P(H_1 = D_1 \mid F_1 = -) = .047$ and $P(H_1 = T_1 \mid F_1 = -) = .798$.
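The three values reported for $H_1$ can be checked against one another with a brief Python sketch. The formal derivation is in the Appendix and is not reproduced here; the sketch rests on an added assumption, namely that oops-errors are independent across the five blank trials, so that the true pattern appears only when all five responses are error-free and $P(H_i = T_i) = (1 - p)^5$. The function name `oops_probability` is hypothetical.

```python
def oops_probability(p_true_pattern, n_blank_trials=5):
    """Per-trial error probability implied by P(H_i = T_i).

    Assumes independent errors with constant probability p, so that
    P(H_i = T_i) = (1 - p) ** n_blank_trials.
    """
    return 1 - p_true_pattern ** (1 / n_blank_trials)


# Values reported in the text for H_1 after F_1 = "wrong".
p_inconsistent, p_misread, p_true = 0.155, 0.047, 0.798

total = p_inconsistent + p_misread + p_true
p_oops = oops_probability(p_true)

print(round(total, 3))   # 1.0
print(round(p_oops, 3))  # 0.044
```

Under this assumption the implied per-trial error rate agrees with the value of .044 quoted for $H_1$ after $F_1 = -$, and the probability of a misinterpretation (.047) is close to it, as the text states.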
FIG. 4. The solid curve shows the proportion of F-trial responses predictable from the immediately preceding H (N > 750 at each point); the dashed curve is the estimated theoretical limit of predictability.
b. The Response R on the Outcome Trial.² Because no outcome intervenes between the blank-trial probe and the response on the next outcome trial, i.e., between $H_i$ and $R_{i+1}$, the assumptions of the theory permit the prediction of that response. Suppose, for example, that $H_i$ = “large” (read: “The manifest H on the ith blank-trial series is ‘choose large’”). One need only determine on which side the large letter appears on the next trial. According to the theory, S will choose that side.

Figure 4 shows the proportion of times that the predictions are correct at each F trial. The predictions beyond the third F trial are virtually perfect. Of course, the oops-error constrains the predictability of the response. Because the response itself may be incorrectly made or because the preceding H had $H_i = D_i$, the F-trial response cannot be perfectly predicted. The dashed line in Fig. 4 shows what the predictions
should be. It is clear that the predictions are as good as they can be, given the oops-error assumption.

c. The Effects of Outcomes.² The upper solid curve in Fig. 5 shows the probability of repeating $H_{i-1}$ when the response on $F_i$ was called “right.” This is estimated at each trial for those data in which both $H_{i-1}$ and $H_i$ are interpretable patterns (i.e., $H_{i-1} \neq I$, $H_i \neq I$) and in which $R_i$ is consistent with $H_{i-1}$ (in the sense shown in Fig. 4; symbolized $R_i \subset H_{i-1}$). Symbolically, the upper curve estimates

$$P(H_i = H_{i-1} \mid F_i = +,\ H_{i-1} \neq I,\ H_i \neq I,\ R_i \subset H_{i-1}).$$

The overall probability is .94. There appears, however, a greater tendency to repeat the H at later trials.
FIG. 5. $P(H_i = H_{i-1})$ when the intervening outcome is “right” or “wrong” (N ≥ 250 at all points); the dashed curve shows the estimated theoretical limit, given “right.”
The bottom curve shows the comparable probability when the intervening outcome was “wrong.” It shows, that is,

$$P(H_i = H_{i-1} \mid F_i = -,\ H_{i-1} \neq I,\ H_i \neq I,\ R_i \subset H_{i-1}).$$

It is clear that S virtually never repeats his H after a disconfirmation: the overall probability is .01. No special trend appears over trials.

Figure 5 confirms and extends the results from the 4-D experiment. In
particular, it shows that even in these very difficult problems the S remembers the H that was disconfirmed.

Two other features of Fig. 5 require some theoretical elaboration. The first is that for neither outcome is there perfect memory, i.e.,

$$P(H_i = H_{i-1} \mid F_i = +, \text{etc.}) < 1.0 \quad \text{and} \quad P(H_i = H_{i-1} \mid F_i = -, \text{etc.}) > 0.$$

The second is that the same asymmetry seen in the 4-D experiment is obtained again: There is a greater discrepancy after “right” (.06) than after “wrong” (.01).

Thus far, the theory has made no assumption concerning the effects of outcomes. The following outcome assumption can be added now: If a response is called “right,” S always keeps his true hypothesis, $T_i$; if the response is called “wrong,” S always rejects $T_i$, i.e.,

$$P(T_i = T_{i-1} \mid F_i = +) = 1 \quad \text{and} \quad P(T_i = T_{i-1} \mid F_i = -) = 0.$$

Figure 5, of course, does not compare true Hs, but rather manifest Hs. This means that at the occasional times when an oops-error produces $H_i = D_i$, exceptions will seem to occur to the perfect relationship specified by the outcome assumption. It is possible to prove, furthermore, that this apparent deviation from perfection will be greater after “right” than after “wrong.” The rationale behind the proof may be seen from this example: Suppose $H_{i-1} = T_{i-1}$, the next outcome is “right,” and because oops-errors occur in the next blank-trials set, $H_i = D_i$. No matter how $H_i$ is changed, it of necessity will appear to produce an exception to the assumption that $T_i = T_{i-1}$ after a “right.” That is, $H_i$ can be interpreted as any of the 15 Hs other than $T_{i-1}$ and appear to produce an exception. On the other hand, suppose again that $H_{i-1} = T_{i-1}$ and $H_i = D_i$, but that the intervening outcome is “wrong.” Only one of the 15 manifestations of $D_i$ (the pattern corresponding to $H_{i-1}$) will yield $H_i = H_{i-1}$, i.e., will appear to produce an exception to the assumption that $T_i \neq T_{i-1}$ after a “wrong.” In short, after a “right” all of the 15 transformations caused by oops-errors will produce a discrepancy; after a “wrong” only one of the 15 transformations will have such an effect.
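The counting argument in this example can be checked by brute-force enumeration. The Python sketch below is an illustration under added assumptions, not the proof referred to in the text: every misreading is treated as equally likely to land on any of the other 15 Hs, the preceding manifest H is taken to equal the preceding true H (as in the example), and the function name `apparent_exceptions` is hypothetical.

```python
from fractions import Fraction

HYPOTHESES = range(16)  # the 16 Hs of the 8-D problems, labeled 0..15


def apparent_exceptions(outcome, t_prev=0):
    """Return (exceptions, cases) over every possible misread pattern.

    t_prev is the true (and manifest) H on the preceding probe.
    """
    if outcome == "right":
        current_true = [t_prev]  # outcome assumption: the true H is kept
    else:
        current_true = [h for h in HYPOTHESES if h != t_prev]  # true H rejected
    exceptions = cases = 0
    for t_now in current_true:
        for manifest in HYPOTHESES:
            if manifest == t_now:
                continue  # pattern read correctly, so no misinterpretation
            cases += 1
            repeated = manifest == t_prev
            if outcome == "right":
                exceptions += not repeated  # H appears to change after "right"
            else:
                exceptions += repeated      # H appears to repeat after "wrong"
    return exceptions, cases


right_exc, right_cases = apparent_exceptions("right")
wrong_exc, wrong_cases = apparent_exceptions("wrong")
print(Fraction(right_exc, right_cases))  # 1
print(Fraction(wrong_exc, wrong_cases))  # 1/15
```

Every misreading after “right” registers as an exception, whereas only one misreading in 15 does so after “wrong.”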
The deviation given a “wrong,” therefore, should be approximately 1/15 the deviation given a “right.” Furthermore, knowing $P(H_i = I)$, i.e., knowing the information in Fig. 3, one can calculate $P(H_i = D_i)$ and can then derive the predicted values for the data in Fig. 5. The dashed curve at the top shows the predicted probability of manifesting the same H given a “right.” The corresponding curve given a “wrong” is not shown but the values decrease from .005 to .001. The theory, clearly, leaves little variance unexplained. The important conclusion is that S always keeps his true H when it is confirmed and rejects it when it is
disconfirmed but that this simple routine is obscured somewhat by the oops-error. d. Resampling Rejected Hs. The outcome (‘wrong’) performs two functions: It causes S to reject Ti and requires that he resample from the pool for a new H . The result in Fig. 5 shows that T iis not in the pool from which S is sampling. It <s possible, however, that Ti returns to the pool after a few trials. To detect this process it is necessary to separate the two functions of ((wrong.’)We wish to know the probability that ” rejected) will appear following a suba true H called ‘ ( ~ o n g (and sequent “wrong” (when X must again resample). Figure 6 shows the
.05 .04 P
.03 .02 .01
HZ
H3
H4
n5
9 FIG.6. P ( H , = HIIF, = -, F,
= -),
with N 2 140 a t each point.
probability that H , appears at blank-trial set j when F, = and Fj = -. The probability that S selects the rejected H increases gradually over trials. Even with five intervening 3’trials, however, the probability is less than .03. This is well below .0625, i.e., below chance.’ The solid curve in Fig. 7 shows the same kind of function, but averaged for Hs rejected at all points. Aside from a smoother function produced by the larger number of observations, the conclusions are the same as in the preceding figure. There is some upward drift in the probability of resampling a disconfirmed H but nothing that would suggest replacement of the H by all Ss. Finally, we may plot the data in the same way but with a further qualification. Suppose that H , were “large)’ and that H , -)
The value .0625 is the “zero-memory’’chance level but, given the information already reviewed, is unrealistically low. Resampling a t H , is performed, we have seen, without T,. The probability of resampling the T2 to T, generally is also low with a mean of .014. If S totally forgets that T, was disconfirmed but, as the data show, remembers and doesn’t resample T, to T5, then T I is one of 12 H s and the chance probability is .0833.Subsequent data will show that the chance level is even higher.
were “X” (and that $F_2 = -$, $F_6 = -$). The stimulus for the sixth F trial must have portrayed either a large X or a small X. If it were a large X, then S, responding to X and being told “wrong,” might also have been reminded that “large” was wrong. If, on the other hand, the stimulus showed a small X, then S, responding to X and being told
FIG. 7. Solid curve, $P(H_j = H_i \mid F_{i+1} = -, F_j = -)$ for various numbers of F trials between $H_i$ and $H_j$ (for example, the value at $j - i = 1$ is based on the weighted proportion of $P(H_2 = H_1), \ldots, P(H_6 = H_5)$; the value at $j - i = 5$ is based only on $P(H_6 = H_1)$; $N > 140$ at each point). Dashed curve, same probability as above with the additional restriction that $F_j$ does not disconfirm $H_i$ ($N > 60$ at each point). Abscissa: interval ($j - i$).
“wrong,” receives no reminder about the incorrectness of “large.” The dashed curve in Fig. 7 plots the probability of resampling a disconfirmed H only for the latter condition, i.e., when S is not reminded of the incorrectness of $H_i$ on $F_j$. The overall probability is .026. Again, there is strong evidence for memory even across 25 (5 feedback, 20 blank) trials.
e. Measuring the Size of the H Set. The foregoing sections show that Ss drop disconfirmed Hs out of the set. Consider an S who was told “wrong” on three F trials, the last of them $F_6$, and is now resampling after $F_6$. From the above data, it is not too wide of the mark to assert that the set from which S is resampling for $H_6$ does not contain the three disconfirmed Hs. The set, that is, consists at most of 13 Hs. It is “at most” because there is a logical basis for the set to be much smaller than this. Consider the selection of $H_1$ after E says “wrong” on Trial 1. By the nature of the stimuli, eight Hs are characterized as wrong and eight by implication are right. The proportion of times that $H_1$ is one of the eight Hs designated as correct is found to be 311/366 = .85. If this proportion were 1.0, the number of Hs from which S resampled, $N(H_1)$, would be a maximum of eight; if it were .80, $N(H_1)$ would be a maximum of 10 (8 correct Hs out of 10). These results follow from the assumption that the probability of choosing one of the eight logically correct Hs after Trial 1 (designated $L_1$) is equal to eight divided by the total number of Hs in the set after Trial 1, i.e.,
$$P(H_1 = L_1) = 8 / N(H_1).$$
Since $P(H_1 = L_1) = .85$, then $N(H_1) = 8/.85 = 9.41$. This value should, of course, be interpreted as the average size of the functional H set when S resampled, i.e., after E said “wrong” on Trial 1. The Ss, then, by rejecting more than six of the eight incorrect Hs, show excellent retention of the information from the first trial. Defining $N(L_i)$ as the number of logically correct Hs after the ith outcome, the formula given above generalizes quite naturally for evaluation of the functional set size on later trials. If S resamples after $F_i$ then
$$P(H_i = L_i) = N(L_i) / N(H_i).$$
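The arithmetic of the set-size estimate can be checked with a minimal Python sketch. The function name `functional_set_size` and its argument names are illustrative assumptions introduced here; the counts 311 and 366 and the proportions .85, 1.0, and .80 are those given in the text.

```python
def functional_set_size(n_logically_correct, p_consistent):
    """Estimate N(H_i) = N(L_i) / P(H_i = L_i).

    n_logically_correct: N(L_i), the number of Hs still logically correct.
    p_consistent: observed P(H_i = L_i), the proportion of resampled Hs
                  that belong to the logically correct set.
    """
    if not 0 < p_consistent <= 1:
        raise ValueError("p_consistent must lie in (0, 1]")
    return n_logically_correct / p_consistent


# After Trial 1, eight Hs remain logically correct (value from the text).
p_observed = 311 / 366
print(round(p_observed, 2))                     # 0.85
print(round(functional_set_size(8, 0.85), 2))   # 9.41
print(functional_set_size(8, 1.0))              # 8.0
print(round(functional_set_size(8, 0.80), 2))   # 10.0
```

The same function applies on later trials by passing the appropriate number of logically correct Hs as the first argument.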
The stimuli were such that the number of logically correct Hs was exactly halved by each outcome. Thus, $N(L_i) = 8$, 4, 2, and 1 after 1, 2, 3,
FIG. 8. The size of the H set after each outcome trial ($N > 280$ at each point). Ordinate: size of set; abscissa: outcome trial.
and 4 outcomes, respectively. After the fifth and sixth outcome, of course, $N(L_i) = 1$. The value of $N(H_i)$ following a “wrong” at the ith trial is presented in Fig. 8. The size of the set decreases for the first four trials to an asymptotic value of about five Hs and then appears to show a small upswing. This last detail is probably systematic, having also occurred with 4-D data (Levine, 1967). It undoubtedly appears because the Ss with small sets are being absorbed into the solution state. For present purposes, however, the end rise may be ignored and the set size beyond the second F trial considered constant at the mean value of 5.9 Hs. This, of course, means that after a couple of F trials about 10 Hs are rejected. Where do these rejected Hs come from? Clearly not only from manifest Hs called “wrong” (cf. Fig. 7), for these would account for the reduction
in the set by, at most, one H per F trial. For another and currently popular answer to this question, it will be useful first to review some recent theoretical developments.
f. Retention during Learning. Gregg and Simon (1967) critically surveyed the data supporting the zero-memory assumption and concluded that a variety of alternative assumptions would provide equally good approximations to the data. They suggested a continuum of memory assumptions as follows:
(1) Local consistency: On an F trial called “wrong,” S recognizes that half (here, eight) of the Hs are disconfirmed and selects his next H from among the remaining subset. This, in effect, reduces the set to eight Hs. Trabasso and Bower (1968) have employed this assumption. If this were S’s only memory capability, the obtained curve in Fig. 8 would fall after one trial to eight Hs and remain at that level for the rest of the experiment. This assumption, then, provides only an approximation to the data. The S appears to be remembering more than just the information from the last F trial.
(2) Local consistency with local nonreplacement: Not only does S avoid the eight Hs just disconfirmed but he also eliminates after the second F trial the dimension containing the H he was just holding. An example will provide the rationale for this version. Suppose that “large” is in the correct set after $F_1 = -$, is sampled, and is disconfirmed on $F_2$. The S rejects not only the eight Hs called “wrong” on $F_2$ but also both “large” and “small”: If an H has been sampled, its complement, by the local consistency rule, must have been disconfirmed earlier. This yields an asymptote of seven Hs.
(3) Local consistency plus: Gregg and Simon recognize that other Hs can be rejected. Thus, local consistency plus memory for two previously disconfirmed Hs would account for the data in Fig. 8.
(4) Global consistency: At the extreme end of the continuum S can remember not only the last trial but the information from all preceding trials. He would then sample only among logically correct Hs. If this were the case, the data would follow the curve labeled “perfect processing” in Fig. 8. The Ss are evidently not at this extreme.
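The set sizes implied by the memory assumptions on this continuum can be tabulated in a few lines. The following Python sketch is illustrative only: the function name, the assumption labels, and the treatment of each prediction as an idealized curve over six outcome trials are assumptions introduced here, and the values are taken from the verbal descriptions above rather than from any fitted model.

```python
def predicted_set_sizes(assumption, n_trials=6, n_hypotheses=16):
    """Idealized size of the H set after each "wrong" outcome trial.

    These values follow the verbal description of the continuum; they
    are not fits to the data in Fig. 8.
    """
    half = n_hypotheses // 2
    if assumption == "zero_memory":
        # S resamples from the whole universe of Hs every time.
        return [n_hypotheses] * n_trials
    if assumption == "local_consistency":
        # Only the Hs consistent with the last F trial remain.
        return [half] * n_trials
    if assumption == "local_nonreplacement":
        # From the second F trial on, the complement of the H just held
        # is dropped as well, leaving one H fewer.
        return [half] + [half - 1] * (n_trials - 1)
    if assumption == "global_consistency":
        # Logically correct Hs are halved by each outcome, down to one.
        return [max(half >> i, 1) for i in range(n_trials)]
    raise ValueError(f"unknown assumption: {assumption}")


for name in ("zero_memory", "local_consistency",
             "local_nonreplacement", "global_consistency"):
    print(name, predicted_set_sizes(name))
```

On this reading, local consistency with memory for two further disconfirmed Hs leaves 8 - 2 = 6 Hs, which is close to the mean of 5.9 reported for Fig. 8.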
All the assumptions on the continuum have local consistency as a minimum. All these assumptions, therefore, require that the Hs observed following a “wrong” be consistent with the immediately preceding outcome trial. If an H consistent with the last F trial is symbolized as $H_i \subset F_i$, then $P(H_i \subset F_i) = 1.0$ is asserted.⁴ This assertion can be tested
⁴ In the present context oops-errors qualify the assertion somewhat. Misinterpretations, when $H_i = D_i$, would constrain $P(H_i \subset F_i)$ to be about .98.
by observing the Hs after each F trial. Figure 9 shows the proportion of Hs consistent with the immediately preceding F trial. Two facts are clear: The proportion never equals 1.0 and it decreases over trials. The quality of S’s memory, i.e., his ability to eliminate 10-11 incorrect Hs (cf. Fig. 8), is not based on perfect recall of the last outcome trial. Other information is obviously being recalled. Also there appears to be an intraproblem proactive interference effect. As the series progresses, S
FIG. 9. The proportion of Hs consistent with the information from the immediately preceding outcome trial ($N > 280$ at each point). Reference levels marked on the plot: local consistency (1.00) and zero memory (.50).
Figure 9, while it disqualifies any assumption incorporating complete local consistency, provides a partial answer to the question posed by Fig. 8: Where do the rejected Hs come from? The S rejects some but not all of the Hs from the immediately preceding trial. We may come a step closer to an answer by extending the analysis presented in Fig. 9. Assume, for example, that S is wrong on (and is resampling after) $F_6$. If he remembers perfectly the information from $F_1$, then $H_6$ should always be consistent with that information, i.e., perfect memory of the first F trial should yield
$$P(H_6 \subset F_1 \mid F_6 = -) = 1.0.$$
If he remembers nothing of the $F_1$ trial, then this probability would be .5. The consistency measure is, therefore, a memory index. We can, of course, obtain this measure for each F trial at each H set, i.e., $P(H_i \subset F_j \mid F_i = -)$ can be obtained after each F trial when S is resampling. This information is presented in Fig. 10. The top curve (labeled $H_3$) shows the proportion of Hs at the third H probe that are consistent with the information at Trials 1, 2, and 3, respectively. In general, the curve labeled $H_i$ shows the proportion of Hs at the ith probe that are consistent with the information at each of the first i F trials. The most striking feature of these curves is that they all describe typical serial-position effects: Both recency and primacy appear. By $H_6$ these effects have evolved to memory for the last two and the first trials. Figure 10,
FIG. 10. The proportion of Hs at the indicated sets (the $H_i$) consistent with each of the preceding outcome trials ($N > 280$ at each point). Abscissa: outcome trial.
then, reveals the compensatory recall required by Figs. 8 and 9. The Ss supplement the partial recall of the last trial with recall of the next-to-last and first trials. The bowed serial-position curve has been found previously in concept learning (Trabasso & Bower, 1964) when S was told that he would be required to recall the stimuli and when special recall tests were given. In the present study, this seemingly universal memory phenomenon is in situ: S’s retention is tapped while he is intent upon problem solving. No recall instructions were presented. The primacy effect was relatively greater in Trabasso and Bower’s study. They found that after six trials, recall of $F_1$ was about the same as recall of $F_6$. This is possibly an effect of the recall instructions they presented. The weaker primacy finding in the present study, however, may be caused by the large number of intervening blank trials. Research is currently under way to explore this possibility.
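The consistency measure used as a memory index lends itself to a short computational sketch. The Python below is illustrative only: the encoding of an H as a (dimension, level) pair, the use of left-hand binary stimulus vectors, the function names, and the toy records are all assumptions introduced here, not the scoring procedure or data of the experiment.

```python
def predicted_side(hypothesis, left_stimulus):
    """Side an H selects: 'left' if the left stimulus carries its level."""
    dimension, level = hypothesis
    return "left" if left_stimulus[dimension] == level else "right"


def is_consistent(hypothesis, left_stimulus, correct_side):
    """True when the H would have chosen the side that was correct."""
    return predicted_side(hypothesis, left_stimulus) == correct_side


def consistency_index(records, trial):
    """Proportion of sampled Hs consistent with outcome trial `trial`.

    records: list of (hypothesis, trials) pairs, where `trials` lists the
             (left_stimulus, correct_side) outcome trials that preceded
             the probe at which the H was sampled.
    """
    hits = [is_consistent(h, *trials[trial]) for h, trials in records]
    return sum(hits) / len(hits)


# Hypothetical toy data: three binary dimensions, two outcome trials.
trials = [((1, 0, 1), "left"), ((1, 1, 0), "right")]
records = [((1, 1), trials), ((2, 0), trials),
           ((0, 1), trials), ((1, 0), trials)]

print(consistency_index(records, 0))  # 0.5
print(consistency_index(records, 1))  # 0.25
```

An index of 1.0 for a given trial would correspond to perfect memory of that trial and .5 to no memory of it, in line with the interpretation given in the text.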
The data in Fig. 10 are persuasive that the recency effect reflects storage in short-term memory. The fourth F trial, for example, is well remembered immediately after the feedback but is no longer remembered by $F_6$. The same is true for all other stimuli except the first. The primacy effect, on the other hand, decays more slowly showing, perhaps (if the identity of values at $H_5$ and $H_6$ indicates an asymptote), longer-term memory.
g. Outcomes and Memory. It was found in the 4-D experiment that the more often S was told “right” on F Trials 1 and 2, the more likely he was to select the correct H when he resampled after the third F trial. The S, that is, appeared to remember more from prior trials when he was right than when he was wrong on those trials.
FIG. 11. The proportion of correct Hs at the indicated sets (the $H_i$), following $F_i = -$, for zero to $i - 1$ numbers of preceding “rights” (frequency is indicated above the point when $N < 100$). Abscissa: number of “rights.”
This effect was investigated after F Trials 2, 3, and 4 of the present experiment. Figure 11 shows the proportion of times that the correct H, $H^+$, was manifested at each of the three probe sets when S was resampling. The curve labeled $H_2$, for example, shows $P(H_2 = H^+ \mid F_2 = -, \text{ and } H_2 \neq I)$ for $F_1 = -$ (0 “right”) and for $F_1 = +$ (1 “right”). In general, the curves show $P(H_i = H^+ \mid F_i = -, \text{ and } H_i \neq I)$ for 0 to $i - 1$ preceding “rights.” The figure shows that Ss remember the information better from correct trials than from incorrect trials, essentially replicating the 4-D result.
h. Presolution Performance. The analysis so far has dealt entirely with the H probes and their implications. A more conventional datum will now be considered: performance on the F trials. From the standpoint of H theory, behavior during a problem may be conveniently divided into two phases, presolution performance (usually, responses prior to the
final error) and criterion performance (responses after the final error). Bower and Trabasso (1964) noted that conditioning theory and H theory differ in their predictions about presolution responding. According to conditioning theory, the probability of a correct response should increase as the trials approach the final error; in H theory, the probability should equal .5. Bower and Trabasso found the latter to hold in several experiments. In the present experiment, the presolution state can be defined efficiently by the use of $H_6$. An S will be considered in the presolution state if $H_6$ is not the correct H, i.e., if $H_6 \neq H^+$, given, of course, that $H_6$ is an interpretable H pattern. The proportion correct at each F trial for those problems meeting the conditions ($N = 438$) is plotted as the solid
FIG. 12. Solid curve, the proportion of correct responses on the F trials, given that $H_6$ is not $H^+$ ($N = 438$ at each point). Dashed curves, the family of theoretical curves (the parameter is the assumed number of incorrect Hs S can keep out of the set); the curve labeled “perfect processing” corresponds to rejecting all disconfirmed Hs. Abscissa: outcome trial.
curve in Fig. 12. In one sense, there is a replication of the Bower and Trabasso result. The curve, clearly, is not gradually rising from $P(+) = .5$ to some higher value. In contrast to their results, however, the curve is not at .5 but, curiously, is consistently and significantly below “chance.” The Ss appear to be perversely persisting in errors. This deviation from .5 may be explained by considering an important difference between the Bower and Trabasso experiment and the present one. They randomized their stimulus sequences; here, the sequences on the F trials were restricted to meet the counterbalancing requirements. The predictions of presolution performance generated by H theory when these special sequences are employed will be discussed below. First, however, a digression concerning the Hs manifested during precriterion performance will be useful. It would be consistent with the results to date if $H^+$ did not occur during presolution performance. This, indeed, is closely approximated: $P[H_i = H^+ \mid H_6 \neq (H^+ \text{ or } I)] = .014$ ($i = 1, \ldots, 5$; $N = 2190$). A more interesting fact concerns the opposite of $H^+$, where the opposite is defined as the other H on the same dimension (symbolized $H^o$). If, for example, $H^+$ is “large,” then $H^o$ would be “small.” It turns out that $H^o$ is equally infrequent:
$$P[H_i = H^o \mid H_6 \neq (H^+ \text{ or } I)] = .015.$$
Thus, the S in his sampling has omitted the relevant dimension rather than just the particular value associated with the correct H. This result replicates the finding with 4-D problems (Levine, Miller, & Steinmeyer, 1967) and supplements the data of Glanzer, Huttenlocher, and Clark (1963), Kendler and Kendler (1962), and Tighe and Tighe (1968), all of whom urge the importance of dimensions in controlling discrimination learning. The theorem may now be derived that, with the stimulus sequences of the present problems, presolution performance will show $P(+) < .5$. The derivation requires the basic H assumption (see Section II), the composition assumption (see Section II), the outcome assumption (see Section III,B,3,c), and the assumption that the S samples from a set containing neither $H^+$ nor $H^o$ (cf. above). These assumptions characterize the S as having a set of 14 Hs, keeping the H when told “right” and abandoning it when told “wrong.” This last feature is unfortunately incomplete. It was shown above that theoretical alternatives have ranged from a zero-memory extreme (S resamples from 14 Hs, i.e., S rejects zero Hs) to the other extreme of perfect processing (S rejects all disconfirmed Hs). Each alternative assumption about the number of Hs rejected was combined with the other assumptions, and the theoretical variants were applied to the counterbalanced stimulus sequences. Each variant yielded a different theoretical presolution curve. Four members of this family are shown as the dashed curves in Fig. 12. All variants of the theory predict below-chance performance. The conclusion is compelling, therefore, that the obtained presolution performance occurs because S behaves according to the processes described by H theory. A more detailed discussion of performance suppression during presolution responding may be found in Levine, Yoder, Kleinberg, and Rosenberg (1968).
IV. Discussion
A. ABSTRACT VERSUS SPECIFIC THEORY
The data concerned with H processing have led to two kinds of theory, one abstract, the other specific. Abstract theory, by far the more popular (employed above, and by Trabasso & Bower, 1964, 1966, 1968; Gregg & Simon, 1967; Levine et al., 1967, 1968; Restle, 1962), treats Hs as marbles in an urn and raises such questions as the number of Hs sampled,
and the number rejected after each outcome. It is relatively task-independent, with applications to both concept learning and discrimination learning, and with presumed relevance to a variety of problem-solving tasks. Specific theory, less well explored (for two examples, see Levine, 1966; Restle & Emmerich, 1966), is concerned with S’s mode of coding, storing, and retrieving the information from each trial in the specific task under study. These are not two different theories in the sense that opposing predictions may be found or that data may be produced to confirm one and to reject the other. In the example of each type of theory presented below, it will be seen that the two types are complementary and even overlapping. The selection of one versus the other depends only upon the questions one asks.⁵
1. Abstract Theory
There is a subtle discrepancy in some of the results presented above. The blank-trials data indicate that S holds only a single H throughout the set of five blank trials. Indeed, both the basic H assumption and the blank-trials assumption assert that S holds a single H. The reduction in set size (cf. Fig. 8), however, shows that S attends to and rejects several Hs per trial. The following revision of the basic assumption provides coordination between single-H control and multi-H monitoring. The S, when he samples, takes a subset of Hs from the universe. In order to respond to the next trial, he then selects one from this subset as his tentative working H. This is the H that dictates his response, emerges during the blank trials, is retained when E says “right,” and is rejected when E says “wrong.” However, S is also monitoring the other Hs in his subset. Two things happen, therefore, on a trial when E says “right.” The S retains his working H for the next trial and can reject from the subset those Hs disconfirmed by the outcome. Suppose, for example, that S recognizes that both “large” and “black” might be correct.
On the next trial, a large white X is on the right. If S takes “large” as his working H, responds to the right side, and is told “right,” “black” is disconfirmed and can be eliminated. After “wrong,” S can reject not only the working H but any other disconfirmed Hs from the subset. He may now either select a new working H from the remainder of the subset or, if this has gone to zero, select a new subset. In short, S can monitor the subset and eliminate Hs after both “right” and
⁵ The distinction between abstract and specific theory is applicable in other sciences as well. The astronomer may study the planet Jupiter, on the one hand, for the information that its movements reveal about the laws of mechanics. Or he may study it to theorize about the nature of its atmosphere, temperature, surface, and so on. In the latter case, the model derived from the data would be specific to this particular planet.
“wrong.” Figure 11 shows, incidentally, that S performs this processing on the subset more efficiently after a “right” than after a “wrong.” This subset-sampling scheme resolves a puzzle which appeared recently. As noted above, Bower and Trabasso (1964) showed that the probability of a correct response was about .5 up to the last error, when it jumped to 1.0. They had assumed that S was in two states: First he held only wrong Hs, then, starting at the last error, he held only the correct H. The puzzle arose when Erickson, Zajkowski, and Ehmann (1966) found a gradual decrease in latency during the criterion run, i.e., while S was presumably holding only the correct H. Thus, choice responses showed a sudden change in strength, whereas latencies showed gradual strengthening reminiscent of continuous conditioning processes. Trabasso and Bower (1968) accounted for the latency data by shifting from the one-H-at-a-time to a subset-sampling model. Following the trial of the last error, S was assumed to sample a new subset, one which contains $H^+$. With each correct response S eliminates those Hs in the subset inconsistent with that trial’s information until, after a few trials, only $H^+$ remains. Making the reasonable assumption that S’s latency is a function of the number of Hs that must be scanned for consistency, Bower and Trabasso concluded that latency would decrease over the first few trials of the criterion run. The subset-sampling conception thus resolves the choice-versus-latency discrepancy. In a compelling tour de force, Trabasso and Bower also demonstrated that this conception accounts for the data when two Hs are correct and confounded throughout a problem. With few additional assumptions, the theory predicts the proportions of Ss who will solve the problem with one, the other, or both of the Hs. The predictions follow from the subset-sampling conception which, of course, permits samples containing one or both correct Hs.
The predictions were confirmed with nice quantitative precision in a variety of experiments. Abstract H theory, then, may be characterized as subset-sampling theory, a conception that was considered as early as Restle’s (1962) elaboration. The theory accounts for many of the facets of the data above, for the latency data and, as Trabasso and Bower have shown, for many of the all-or-none features in discrimination learning.
2. Specific Theory
This form of the theory is concerned with the processes underlying H sampling and testing. What, for example, happens when S “selects a subset”? How does S remove disconfirmed Hs from the subset? What, in particular, is there about this H removal that it is differentially affected by “right” and “wrong” (see Fig. 11)? The answers to these
questions consist in a detailed description of covert processes, expressed in phenomenological language. An example of specific theory, and a very tentative answer to some of these questions, was proposed for simultaneous discrimination learning by Levine (1966). Influenced by the work of Haber (1964) and of Glanzer and Clark (1964), who stressed the role of verbal coding in the recall of visual displays, Levine (1966) assumed that S verbally codes the F-trial stimuli. He further assumed that in simultaneous discrimination, in which a pair of stimuli are presented, S codes aspects only of the stimulus chosen. More specifically, S codes those aspects of the stimulus chosen that are in his functional H subset. Suppose that in a 4-D problem an S’s initial set consists in all eight Hs, that “large” is the working H, and that the top pair of stimuli in Fig. 1 is presented. According to the process postulated above, S repeats to himself the words “large, black, X, on the left,” responds to the left side, and keeps rehearsing these words until the outcome. If E says “right,” S will have memorized a smaller subset any of which might be the solution. If E says “wrong,” all the verbalized Hs are disconfirmed. The S, in this latter circumstance, can achieve a subset of correct Hs by finding the complement of the just-disconfirmed Hs. He can now, that is, say something similar to the following: “‘Large,’ ‘black,’ ‘X,’ and ‘left’ are wrong. Then, ‘small,’ ‘white,’ ‘T,’ and ‘right’ must be correct.” Note the differential effects of “right” and “wrong.” After a “right,” the coded subset is retained without further activity; after a “wrong,” S must first recode before he can have a comparable subset of acceptable Hs. This recoding requires extra time and incurs the risk that incorrect or incomplete translations will occur.
To continue with the example, suppose E says “right,” a series of blank trials is presented (S duly manifesting the presumed working H, “large”), and the second pair of stimuli in Fig. 1 is then presented. The S, whose subset now contains “large,” “black,” “X,” and “left,” would choose the stimulus dictated by his working H, “large,” and would code those aspects of the stimulus that are in his subset. He would choose, that is, the right-hand stimulus and would verbally code and rehearse “large and black.” Again, a “right” produces the new subset effortlessly; a “wrong” requires rejecting “large” and “black,” returning to the subset of four held at the outset of the second F trial, and recovering the remainder of that subset. Again, a “wrong” requires more processing and possibility of error than a “right.” This scheme accounts for the rapid decrease in set size (cf. Fig. 8) and for the differential effects of “right” and “wrong” (cf. Fig. 11). It also has implications concerning the effects of the duration of the trial intervals. Since the coding is verbal and readily rehearsable, S should have
little difficulty maintaining the subset during any delay of the outcome. On the other hand, since the processing of the information occurs after the outcome, a long intertrial interval (ITI), because it affords processing time, should yield faster learning than a brief ITI. In short, delays between the response and the outcome should have little effect; delays between the outcome and the start of the next trial should speed learning. Both of these results have been demonstrated by Bourne and his associates (Bourne & Bunderson, 1963; Bourne, Guy, Dodd, & Justesen, 1965). A third implication is that the S needs more time to process the information after a “wrong” than after a “right.” This prediction received confirmation by Erickson et al. (1966) and by Erickson and Zajkowski (1967), who found that S’s latency was longer on trials following a “wrong” outcome than on trials following “right.” A result by Bourne, Dodd, Guy, and Justesen (1968), however, was not consistent with either the theoretical implication or with the latency results. These researchers compared a long ITI after “wrong” and a short ITI after “right” to the reverse, i.e., to a short ITI after “wrong” and a long ITI after “right.” The specific theory predicts that the long ITI after “wrong” should be better than after “right.” Bourne et al. found no difference between these conditions. However, White (1968), who replicated this experiment, did find a significant effect in the expected direction. Little more will be said for specific theory here. Its development is so rudimentary that it is, perhaps, more useful as a source of questions than as a predictive system. Such problems as the role of visual imagery in information storage, perceptual determinants in subset selection, and the time at which S codes the stimuli all pose themselves. These problems, at least initially, must be solved within the context of the specific task.
For example, the process of coding the chosen stimulus probably occurs only when the stimuli terminate before the outcome, as in the 4-D and 8-D experiments described above. If the stimuli lasted for 3 or 4 seconds beyond the outcome, one might expect S to select and code his subset after the outcome. “Right” would be a cue to encode a subset on the side with the response; “wrong” would be a cue to encode a subset on the other side. The differential effects of “right” and “wrong” seen above should, then, diminish if the stimuli lasted beyond the outcome.
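The verbal-coding scheme described in this section can be sketched as a simple set operation. The Python below is a minimal illustration under stated assumptions: the function names are hypothetical, Hs are represented as plain strings, and the letter and side of the second chosen stimulus (“T,” on the right) are inferred from the statement that S codes only “large and black” on that trial. It is not offered as the formal model of Levine (1966).

```python
def code_stimulus(chosen_stimulus, subset):
    """Aspects of the chosen stimulus that are in the functional subset."""
    return subset & set(chosen_stimulus)


def update_subset(subset, chosen_stimulus, outcome):
    """Subset held after the outcome of an F trial.

    outcome "right": the coded aspects simply become the new subset.
    outcome "wrong": the coded aspects are rejected, and S must recover
                     the remainder of the subset held before the trial.
    """
    coded = code_stimulus(chosen_stimulus, subset)
    if outcome == "right":
        return coded
    if outcome == "wrong":
        return subset - coded
    raise ValueError("outcome must be 'right' or 'wrong'")


# 4-D example from the text: all eight Hs form the initial set.
# Note that "left" and "right" here name position Hs, not outcomes.
all_hs = {"large", "small", "black", "white", "X", "T", "left", "right"}
first_choice = ("large", "black", "X", "left")

after_right = update_subset(all_hs, first_choice, "right")
after_wrong = update_subset(all_hs, first_choice, "wrong")
print(sorted(after_right))  # ['X', 'black', 'large', 'left']
print(sorted(after_wrong))  # ['T', 'right', 'small', 'white']

# Second F trial following a "right" (stimulus values assumed, see above).
second_choice = ("large", "black", "T", "right")
print(sorted(update_subset(after_right, second_choice, "right")))  # ['black', 'large']
print(sorted(update_subset(after_right, second_choice, "wrong")))  # ['X', 'left']
```

The “wrong” branch needs the earlier subset as well as the coded aspects, which mirrors the extra recoding that the specific theory attributes to a “wrong” outcome.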
B. CONCLUSION
Theory, then, is proceeding at two levels, one abstract, probabilistic, intended for a variety of problem-solving situations; the other qualitative in description, concerned with the specific processes in the specific task. The development along these lines has already produced some notable accomplishments and promises both application to a greater variety of discrimination tasks and a finer analysis of each task. The
concomitant development of theory and of probes, such as the blank-trials probe described above, should yield a rigorous, detailed, yet comprehensive description of human discrimination learning.
V. Appendix
A. THE STIMULI FOR THE 8-D PROBLEMS
Figure 13 shows in binary form one of the sets of 8-D stimuli employed. The figure presents only the left-hand side of the stimuli, which is why column 1, the position column, contains only 1’s. The right-hand side contains the other levels of the dimensions, is completely redundant with the left-hand side, and, hence, is not shown. The right-hand stimuli may be constructed by simply interchanging the 1’s and 0’s in Fig. 13. The set is internally orthogonal (I-O) and has these characteristics: (1) Each level of each dimension is paired exactly four times with each level of every other dimension (squares are solid four times, large letters are on the left four times, and so on). (2) For any two stimuli, exactly four levels are changed. (3) Stimuli 1, 3, 5, and 8 (also the remaining four stimuli) produce a different left-right sequence for each of the 16 levels. With these four stimuli, the response patterns produced by the 16 Hs will be different. Four such stimuli will be referred to as an I-O subset. Using an I-O subset, a fifth stimulus taken from the remaining four is necessarily a partial inverse of one of the original four. One stimulus is said to be a partial inverse of another when the first four columns
are identical and the last four are different; in Fig. 13, 2 is a partial inverse of 1. When the five stimuli consisting of an I-O subset and one partial inverse are presented, the levels (or H patterns) will have these intercorrelations: Any level will have one other level paired with it for four trials, six levels for three trials, six for two, one for one, and one for zero. This last is the other level on the same dimension. Four different I-O sets of eight stimuli were constructed. Within any one problem, one of these sets was used for the outcome trials, and another was used for the blank trials. The outcome trials used six different stimuli. The first four of these always consisted in an I-O subset. This permits logical elimination of half of the remaining Hs at each trial. Each set of five blank trials consisted in one I-O subset and one partial inverse. For a single problem, six such blank-trial sets were randomly selected and ordered from the same I-O sequence.
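The three characteristics of an internally orthogonal set can be verified mechanically. The Python sketch below does so for a hypothetical design built from a Sylvester-type Hadamard layout; this design is an assumption chosen because it satisfies the stated properties, and it is not claimed to be the design of Fig. 13. Rows are numbered from 0 here, whereas the text numbers stimuli from 1, and all function names are illustrative.

```python
from itertools import combinations

N = 8  # eight stimuli, eight binary dimensions


def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") % 2


# Hypothetical left-hand stimuli: row r, column c holds 1 - parity(r & c).
# Column 0 is all 1's, like the position column described in the text.
design = [[1 - parity(r & c) for c in range(N)] for r in range(N)]


def column(c, rows=range(N)):
    """Left-hand levels of dimension c across the given stimuli."""
    return tuple(design[r][c] for r in rows)


def levels_balanced():
    """Property (1): any two columns agree on exactly four stimuli."""
    return all(
        sum(a == b for a, b in zip(column(i), column(j))) == 4
        for i, j in combinations(range(N), 2)
    )


def rows_differ_by_four():
    """Property (2): any two stimuli differ on exactly four levels."""
    return all(
        sum(a != b for a, b in zip(design[r], design[s])) == 4
        for r, s in combinations(range(N), 2)
    )


def is_io_subset(rows):
    """Property (3): the 16 levels give 16 distinct left-right sequences."""
    patterns = set()
    for c in range(N):
        seq = column(c, rows)
        patterns.add(seq)
        patterns.add(tuple(1 - v for v in seq))  # the opposite level
    return len(patterns) == 2 * N


def is_partial_inverse(r, s):
    """First four columns identical, last four all different."""
    return design[r][:4] == design[s][:4] and all(
        a != b for a, b in zip(design[r][4:], design[s][4:])
    )


io, rest = (0, 1, 2, 7), (3, 4, 5, 6)
print(levels_balanced(), rows_differ_by_four())    # True True
print(is_io_subset(io), is_io_subset(rest))        # True True
print(is_io_subset((0, 1, 2, 3)))                  # False
print(all(any(is_partial_inverse(r, s) for s in io) for r in rest))  # True
```

In this hypothetical design every stimulus outside the chosen I-O subset is a partial inverse of one stimulus inside it, as the text requires of a fifth stimulus.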
B. THE EFFECT OF OOPS-ERRORS UPON H INTERPRETATION
1. True and Manifest Hs
A distinction will be made between the true H, T, which S holds and the manifest H, M, which is inferred from the response pattern. On the ith set of blank trials, $M_i$ can be classified in one of three ways: No oops-errors have occurred and T is manifested, i.e., $M_i = T_i$; oops-errors have occurred causing a different H to be inferred, i.e., $M_i = D_i$; and oops-errors cause an inconsistent (non-H) pattern to appear, i.e., $M_i = I$. The occurrence of an oops-error will be symbolized by Z. The notation $Z = k$ will mean that k oops-errors have occurred in a blank-trial set ($k = 0, 1, \ldots, 5$). Taking the response pattern corresponding to a particular H in a blank-trial set, one can systematically change one (or more) response at a time and determine in each case whether the new sequence produced has $M = D$ or $M = I$. Because the blank-trial sets are uniform, consisting in one I-O subset and one partial inverse,

TABLE I
THE PROBABILITY THAT THE H MANIFESTATION WILL BE IN EACH OF THE THREE CATEGORIES FOR EACH NUMBER k OF OOPS-ERRORS

k    I = P(M = I | Z = k)    J = P(M = D | Z = k)    K = P(M = T | Z = k)
0    0                       0                       1.0
1    .8                      .2                      0
2    .4                      .6                      0
3    .4                      .6                      0
4    .8                      .2                      0
5    0                       1.0                     0
Neo-noncontinuityTheory
129
P(M = 112 = k)is the same in each blank-trial set for any particular value of k. When Z = 1, for example, the error occurring on any one of four of the five blank trials will produce M = I, i.e., P(M = IIZ = 1) = .8; the TABLE I1
THEPROBABILITY THATTHE H MANIFESTATIONWILLBE IN EACHOF THREECATEGORIES“
k
L= P(Z=k)
P [ ( M = I ) & (Z =k)] P[(M=D) & (Z =k)] =I.L =J*L
THE
P[(M=T) & (Z =k)] =K*L
5
2 P[(M = I) & (Z = k)]
P(M = I) =
k-0
5
P(M=D)=
2 P[(M=D)&(Z=k)] k=O
(3) a Letting p be the probability of an oops-error on any trial of a blank-trialset, the top part of the table shows thk probabilities of 0 to 5 errors and of M being in each of the three categories (see Table I for the definition of I, J, and K); the equations at the bottom of the table show the probability that M is in each of the three categories.
error falling on the remaining trial will produce M = D, i.e., P(M = DIZ = 1) = .2. The full set of probabilities is shown in Table I. With this information, the probability that M is in each of the three categories may be computed. The formulas are shown at the bottom of Table 11. The body of Table I1 shows the intermediate steps in the derivation of these formulas. The probability that M = I is seen to be a function ofp, the probability that an oops-error occurs on any one trial. Of course, P ( M = I) can also be estimated from the data. The estimates at each blank-trial set following a “right” or a “wrong” were shown in Fig. 3. Substituting the value
130
Marvin Levine
for a blank-trial set into Eq. ( 1 ) permits the solution for p . The graph relating p to P(M = I),over the obtained range of values of I, is plotted inFig. 14.6 With p determined, P(M = D) and P(M = T) may be obtained from Eqs. ( 2 ) and (3), respectively. Note that P(M = D) is relatively small. Under the worst conditions obtained, the first set after a wrong, P(M = I) = .155 but P(M = D) = .047.
FIG. 14. Part of the graph of Eq. (1), permitting the evaluation of the probability of an oops-error from the proportion of inconsistent patterns. (Ordinate: p, with tick marks from .01 to .04; abscissa: P(M = I).)
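The computation described above can be sketched numerically. The following is a minimal illustration, not the author's procedure: it assumes that oops-errors occur independently on the five blank trials, so that P(Z = k) is binomial in p, and it takes the conditional probabilities from Table I. The function names (`prob_k_errors`, `category_probs`, `solve_p`) are hypothetical, and bisection is simply one convenient way to invert Eq. (1) on the interval from 0 to .5.

```python
from math import comb

# Conditional probabilities from Table I, indexed by k = 0..5 oops-errors.
P_I_GIVEN_K = [0.0, 0.8, 0.4, 0.4, 0.8, 0.0]   # I = P(M = I | Z = k)
P_D_GIVEN_K = [0.0, 0.2, 0.6, 0.6, 0.2, 1.0]   # J = P(M = D | Z = k)
P_T_GIVEN_K = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]   # K = P(M = T | Z = k)
BLANK_TRIALS = 5


def prob_k_errors(p, k):
    """L = P(Z = k). ASSUMPTION: errors are independent across the five trials."""
    return comb(BLANK_TRIALS, k) * p**k * (1 - p) ** (BLANK_TRIALS - k)


def category_probs(p):
    """Return (P(M = I), P(M = D), P(M = T)) as the sums in Eqs. (1)-(3)."""
    weights = [prob_k_errors(p, k) for k in range(BLANK_TRIALS + 1)]
    p_i = sum(c * w for c, w in zip(P_I_GIVEN_K, weights))
    p_d = sum(c * w for c, w in zip(P_D_GIVEN_K, weights))
    p_t = sum(c * w for c, w in zip(P_T_GIVEN_K, weights))
    return p_i, p_d, p_t


def solve_p(p_inconsistent, tol=1e-12):
    """Invert Eq. (1) by bisection, keeping only the root below .5."""
    lo, hi = 0.0, 0.5
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if category_probs(mid)[0] < p_inconsistent:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    p = solve_p(0.155)          # worst observed proportion of inconsistent patterns
    print(p, category_probs(p))
```

Under the independence assumption, an observed P(M = I) of .155 yields a p of roughly .044 and a P(M = D) just under .05, which is in line with the small value reported in the text.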
⁶ Equation (1) is symmetrical in p and 1 - p, so that two roots in the domain of 0-1 exist. All the values of p employed in this chapter were based on the assumption that oops-errors are reasonably infrequent, i.e., would have probabilities less than .5.

2. Predicting the F-Trial Response

From the pattern of responses during the blank-trial set the H is inferred. Because no outcome is given after the trial the blank-trials assumption still holds: The same H determines the next (the F-trial) response. This response, therefore, should be perfectly predictable. The oops-error operates, however, to impair prediction in two ways: (1) M = D could have been produced during the blank-trial set; an incorrect prediction will then occur half of the time, when the response predicted from the incorrectly inferred H is not the same as the response produced by the true H. (2) An oops-error occurs on the F trial; this, of course, happens with probability p, as determined from the immediately preceding blank-trial set. Therefore, the probability that the F-trial response is consistent with the preceding manifest H, symbolized $P(R_{i+1} \subset M_i)$, given that the preceding H is interpretable, is given by:
This equation was used to derive the dashed (theoretical) curve in Fig. 4.

3. The Effects of Outcomes on H Repetition

The outcome assumption says that $P(T_{i+1} = T_i \mid F_{i+1} = +) = 1$ and $P(T_{i+1} = T_i \mid F_{i+1} = -) = 0$. Figure 5, of course, shows a plot not of the true Hs but of the manifest Hs. The theoretical counterpart of the data when the intervening outcome trial is “right” is

$$P(M_{i+1} = M_i \mid F_{i+1} = +,\ R_{i+1} \subset M_i,\ M_i \neq I,\ M_{i+1} \neq I). \tag{4}$$

The result $M_{i+1} = M_i$ will obtain, of course, whenever $M_i = T_i$ and $M_{i+1} = T_{i+1}$. It will also obtain when both $M_i = D_i$ and $M_{i+1} = D_{i+1}$, and when the (incorrectly) inferred H is the same at both sets. The probability of this latter circumstance is negligible, so that an excellent approximation to (4) is given by

$$P(M_{i+1} = M_i \mid F_{i+1} = +, \text{etc.}) =$$

The four terms in the equation may be obtained from Fig. 3 and the equations in Table II. The results were plotted as the dashed curve in Fig. 5. In a similar fashion,

$$P(M_{i+1} = M_i \mid F_{i+1} = -,\ R_{i+1} \subset M_i,\ M_i \neq I,\ M_{i+1} \neq I) \tag{5}$$

may be determined. The event $M_{i+1} = M_i$ will occur, given the conditions in (5), when either $M_i = D_i$, $M_{i+1} = D_{i+1}$, or both, and when the incorrectly inferred H in the one set is the same as the inferred H in the other set. Assuming that in a 16-H (8-D) problem this last condition occurs 1/15 of the time, then

$$\begin{aligned}
P(M_{i+1} = M_i \mid F_{i+1} = -, \text{etc.}) ={}& (1/15)\,P[(M_i = T_i) \,\&\, (M_{i+1} = D_{i+1}) \mid F_{i+1} = -, \text{etc.}] \\
&+ (1 - p)(1/15)\,P[(M_i = D_i) \,\&\, (M_{i+1} = T_{i+1}) \mid F_{i+1} = -, \text{etc.}] \\
&+ (1/15)\,P[(M_i = D_i) \,\&\, (M_{i+1} = D_{i+1}) \mid F_{i+1} = -, \text{etc.}]
\end{aligned}$$
= Di) P W l + , = Ti+,lFi+I = + (1 - P ) 1 P(M1 - P(M, = I) 1 - P(M,+, = IJFi+,= -)
Again, the values at each point may be obtained from Fig. 3 and the equations in Table II.
REFERENCES Bourne, L. E., Jr., & Bunderson, C. V. Effects of delay of informative feedback and length of postfeedback interval on concept identification. Journal of Experimental Psychology, 1963,65, 1-5. Bourne, L. E., Jr., Dodd, D. H., Guy, D. E., & Justesen, D. R. Response-contingent intertrial intervals in concept identification. Journal of Experimntal Psychology, 1968, 76, 601-608. Bourne, J. E., Jr., Guy, D. E., Dodd, D. H., & Justesen, D. R. Concept identification: The effects of varying length and informational components of the intertrial interval. Journal of Experimental Psychology, 1965,6, 624-629. Bourne, L. E., Jr., & Restle, F. Mathematical theory of concept identification. Psychological Review, 1969,66,278-296. Bower, G., & Trabasso, T. Reversals prior to solution in concept identification. Journal of Experimental Psychology, 1963, 66, 409-418. Bower, G., & Trabasso, T. Concept identification. I n R. C. Atkinson (Ed.),Studies in mathemtical psychology. Stanford, Calif. : Stanford University Press, 1964. Pp. 32-94. Bruner, J. S., Goodnow, J. J., & Austin, G. A. A study of thinking. New York: Wiley, 1956. Bush, R. R., & Mosteller, F. A model for stimulus generalization and discrimination. Psychological Review, 1951, 58, 413-423. Chumbley, J. I. The memorization and manipulation of sets of hypotheses in concept learning. Unpublished doctoral dissertation, Indiana University, 1967. Erickson, J. R., C Zajkowski, M. M. Learning several concept identification problems concurrently. Journal of Experimental Psychology, 1967, 74, 212-218. Erickson, J. R., Zajkowski, M. M., & Ehmann, E. D. All-or-none assumptions in concept identification. Journal of Experimental Psychology, 1966, 72, 690-697. Estes, W. K. Learning theory and the new “mental chemistry.” Psychological Review, 1960, 67, 207-223. Estes, W. K. All-or-none processes in learning and retention. American. Psychologkt, 1964,19, 16-25. Estes, W. K., & Burke, C. J. 
Application of a statistical model to simple discrimination learning in human subjects. Journal of Experimental Psychology, 1955, 50, 81-88.
Neo-noncontinuity Theory
133
Glanzer, M., & Clark, W. H. The verbal loop hypothesis: Conventional figures. American Journal of Psychology, 1964,77, 621-626. Glanzer, M., Huttenlocher, J., & Clark, W. H. Systematic operations in solving concept problems: A parametric study of a class of problems. Psychological Monographs, 1963, 77 ( 1 , Whole No. 564). Green, E. J. A simplified model for stimulus discrimination. Psychological Review, 1958,65,56-63. Gregg, L. W., & Simon, H. A. Process models and stochastic theories of simple concept formation. Journal of Mathemtical Psychology, 1967, 4, 246-276. Haber, R. N. Effects of coding strategy on perceptual memory. Journal of Experimental Psychology, 1964, 68, 357-362. Harlow, H. F. Learning set and error factor theory. I n S. Koch (Ed.), Psychology: A study ofa science. Vol. 2. New York: McGraw-Hill, 1959. Pp. 492-537. Hull, C. L. Simple qualitative discrimination learning. Psychological Review, 1950, 57, 303-313. Kendler, H. H., & Kendler, T. S. Vertical and horizontal processes in problem solving. Psychological Review, 1962, 69, 1-16. Krechevsky, I. “Hypotheses” in rats. Psychological Review, 1932, 39, 516-532. Krechevsky, I. A study of the continuity of the problem-solving process. Psychological Review, 1938, 45, 107-133. Levine, M. A model of hypothesis behavior in discrimination learning set. Psychological Review, 1959,66, 353-366. Levine, M. Mediating processes in humans a t the outset of discrimination learning. Psychological Review, 1963, 70,254-276. (a) Levine, M. The assumption concerning “wrongs” in Restle’s model of strategies in cue learning. Psychological Review, 1963,70, 559-561. (b) Levine, M. Hypothesis behavior by humans during discrimination learning. Journal of Experimental Psychology, 1966, 71, 331-338. Levine, M. The size of the hypothesis set during discrimination learning. Psychological Review, 1967, 74, 428-430. Levine, M., Leitenberg, H., & Richter, M. L. 
The blank-trials law: The equivalence of positive reinforcement and nonreinforcement. Psychological Review, 1964, 71, 94-103. Levine, M., Miller, P., & Steinmeyer, C. H. The none-to-all theorem of human discrimination learning. Journal of Experimental Psychology, 1967,73, 568-573. Levine, M., Yoder, R. M., Kleinberg, J., & Rosenberg, J. The presolution paradox in discrimination learning. Journal of Experimental Psychology, 1968, 77, 602608.
Restle, F. A theory of discrimination learning. Psychological Review, 1955, 62, 11-19. Restle, F. Toward a quantitative description of learning-set data. Psychological Review, 1958,65, 77-91. Restle, F. The selection of strategies in cue learning. Psychological Review, 1962, 69, 329-343. Restle, F., & Emmerich, D. Memory in concept identification: Effects of giving several problems concurrently. Journal of Experimental Psychology, 1966, 71, 794-799. Richter, M. L., & Levine, M. Probability learning and the blank-trials law. Psychonomic Science, 1965, 2, 379-380. Rock, I. The role of repetition in associative learning. American Journal of Psychology, 1957, 70, 186-193.
134
Marvin Levine
Spence, K. W. An experimental test of the continuity and non-continuity theories of discrimination learning. Journal of Experimental Psychology, 1945, 35, 253266.
Tighe, T. J., & Tighe, L. S. Perceptual learning in the discrimination processes of children: An analysis of nine variables in perceptual pretraining. Journul of Experimental Psychology, 1968,77, 125-134. Trabasso, T., & Bower, G. H. Memory in concept identification. Psychonomic Science, 1964,1, 133-134. Trabasso, T., & Bower, G. H. Presolution dimensional shifts in concept identification: A test of the sampling with replacement axiom in all-or-none models. Journal of Mathematical Psychology, 1966,3, 163-173. Trabasso, T., & Bower, G. H. Attention in learning. New York: Wiley, 1968. White, R. M., Jr. Effects of some pretraining variables on concept identification. Unpublished doctoral dissertation, University of Colorado, 1968.
COMPUTER SIMULATION OF SHORT-TERM MEMORY: A COMPONENT-DECAY MODEL¹

Kenneth R. Laughery²
STATE UNIVERSITY OF NEW YORK AT BUFFALO
BUFFALO, NEW YORK

I. Introduction
II. The Model: An Overview
III. The Model: A Detailed Description
    A. Basic Units of Information
    B. Basic Forgetting Mechanism
    C. The Memory Structures
    D. Memory Processes
    E. Types of Errors That the Model Can Make
IV. A Sample Simulation
V. Some Simulation Results
    A. Simulated Study Number 1: A General Experiment
    B. Simulated Study Number 2: High versus Low Auditory Similarity
VI. Discussion and Conclusions
    A. Some Possible Extensions
    B. Some Possible Revisions
References
I. Introduction

The past few years have witnessed the development of several models of human memory. Examples are Atkinson and Shiffrin (1968), Bower (1967), Sperling (1967), Waugh and Norman (1965), and Wickelgren and Norman (1966). The most generally accepted concept in the theories is that memory can best be represented by three separate storage systems. The three parts of this general model of memory are illustrated in Fig. 1. The first of these systems is the sensory storage or very-short-term memory, viewed as a peripheral or perceptual storage. There is general agreement that information stored in this type of memory decays with time and is lost in a matter of a few seconds or less (Averbach & Coriell, 1961; Glucksberg, 1965; Sperling, 1960). Another type of memory, short-term memory (STM), is the focus of these various models. Despite general agreement that information is lost from STM (as opposed to loss of access to the information), there are two views as to the nature of this loss. One view is that information is lost as a result of decay; an alternative is that STM has a limited capacity so that items are lost by being replaced by new items entering the system. Choosing between these alternative postulates is almost a matter of theoretical style, since the issue has not lent itself to any clear-cut experimental resolution. Long-term memory (LTM), the third type of storage, received relatively little attention in the theoretical efforts cited above. It appears, however, that there is general agreement as to why information is not always recalled perfectly from this permanent storage. The widely accepted view is that what is lost is access to the information and not the information itself.

¹ This research was supported by Research Grant No. MH-11595 from the National Institute of Mental Health, United States Public Health Service.

² Contributions to the development of the short-term memory model have been made by a number of people associated with the author during the past 2 years. Special thanks go to Patricia Fiero, Richard S. Cimbalo, Allen L. Pinkus, James C. Fell, and Gilbert J. Harris.
[FIG. 1. Block diagram of the general model of memory. Legible labels: input; very-short-term memory or sensory storage; attention; short-term memory; output.]
The work reported in this paper represents an attempt to construct a computer simulation model of human memory. It is a theory of the average S. As a rule, simulation models have focused either upon the behavior of an individual S (e.g., Feldman, 1963; Laughery & Gregg, 1962) or upon the performance of an “average S” (e.g., Feigenbaum, 1963). Clearly, these are two separate approaches to theory construction. In addition to the theoretical inclinations of the modeler, the decision as to which approach to take is frequently determined by the nature of the task environment. The tactic of building a model for individual Ss (and hopefully finding ways of pulling them together into a general model that includes appropriate parameters for representing the various individuals) has usually been employed for simulating problem-solving kinds of behavior. In these tasks one can collect protocol data which provide a rich source of information regarding behavioral processes. On the other hand, attempts to construct models of the “average S” have generally involved tasks that preclude getting protocols (the task would be grossly disrupted by the procedure). The memory-span task is an example of a situation in which protocols would have a disrupting effect (except for postsequence verbal reports).

The model represents the human at an information-processing level of description. At this level, the human is viewed as having available a variety of processing mechanisms which can be employed to operate upon information. The source of the information may be the external environment, such as a display device, or the internal environment (memory). Similar to the other models already mentioned, a system consisting of the three storage mechanisms is envisioned with the main focus on STM. The model differs from the others, however, in one important respect: It describes in complete detail a set of structures and processes that are intended to represent certain aspects of human memory.

While the model to be described purports to deal with the human memory system, it is actually a simulation of performance in a particular task. The task is the standard memory-span procedure: A sequence of items (such as digits) is presented, following which S reproduces as many of the items as he can remember. Furthermore, the task is limited to a vocabulary of 36 items: the 10 digits, 0-9, and the 26 letters of the alphabet.
Although the model is presently limited to the memory-span procedure and a 36-item vocabulary, it will be argued later that the model can be expanded to simulate performance in many other memory tasks. Such extensions will not require a restatement of the basic memory structures or processes, but rather will involve only the addition of task-dependent structures and processes to allow the model to perform in the different situations. This distinction between basic and task-dependent structures and processes is crucial. Indeed, one of the theses of this effort is that relatively few basic memory structures and processes can account for performance on a variety of memory tasks; the degrees of freedom in the model are not larger than the points of fit.

One problem that has plagued computer simulation theorists is communicating to others the structures and processes that make up the model. These models usually deal with relatively complex tasks, so the models themselves have been complex. One issue is the level of description at which the model should be presented. Presenting the program instructions usually provides little help in understanding the theoretical concepts underlying the model. On the other hand, attempts to describe the model verbally generally result in some loss of precision. A compromise is to describe the processes in the model in terms of flow charts which capture the formal qualities of the model while communicating the logical complexity.

Another communication problem is the integration of the various parts of the model. There are many parts of a simulation model, and it is important to understand how they fit together. A linear presentation of these parts plays considerable havoc with the reader’s intellectual digestive system. A more appropriate procedure is to make two or three descriptive passes through the model, with each successive presentation providing a more detailed description. The following model presentation consists of an overview, a detailed description, and an example of the model’s performance (a trace) on a specific sequence of items.

II. The Model: An Overview
In this section, a general description of the model will be presented. As already mentioned, the model represents performance in a memory-span task in which the item vocabulary consists of the 36 digits and letters. The digits and letters, however, are not the basic units of information in the model. The basic information units or components are a set of visual and auditory features that define the visual and auditory dimensions of the vocabulary items.

Two types of memory, STM and LTM, are represented (the current version of the model does not include a very-short-term memory). The LTM contains the visual and auditory definitions of the vocabulary items and represents S’s permanent knowledge about the items. The STM is a series of memory structures, each holding information about an individual item. This item information includes the names of its auditory components, the times at which the components were stored, and functions describing the decay of the components.

The basic forgetting mechanism in the model is a decay process. This process is described by an exponential function which defines the probability of retrieving a component as a function of the length of time it has been in STM. The components decay independently.

The flow of events in the model is as follows:

(1) A set of components (visual or auditory, representing the presentation mode) is presented to S.
(2) The S notices (“sees” or “hears”) the set of components.
(3) The S searches LTM and finds an item whose components match the input (the item is recognized).
(4) An STM structure is set up for the item with a substructure for each auditory component. Each substructure contains the component’s name, the time it was stored, and a function describing its decay.
(5) If the next item has not been presented, the items already in STM are rehearsed during the interitem interval. Rehearsal consists of retrieving components of an item from STM, recognizing the item by finding a match in LTM, and then updating the component substructures in STM. Updating involves resetting the time and changing the decay function so that the component decays at a slower rate.
(6) If after all items are rehearsed there is still time available (a new item has not been presented), an attempt is made to recode groups of items that are consistent with certain recoding criteria.
(7) At specific points during the above activity, S checks whether a new item has been presented, and if it has, the process reverts to Step 2.
(8) When a special “respond” signal appears and is recognized (a 37th vocabulary item), control is transferred to a respond process that attempts to recall (and output) the items.
(9) When recall of a sequence is completed, the STM structures representing that sequence are erased. A new sequence is then begun with nothing in STM. Thus, the model does not deal with proactive or retroactive interference between sequences.
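Steps 3 to 5 can be sketched in a few lines of code. This is only an illustrative reading of the overview, not the IPL-5 program: the class and function names are hypothetical, the LTM fragment holds just four auditory definitions copied from Table III, the ladder of decay rates is an assumed stand-in for the model's 30 decay functions, and rehearsal is simplified to the updating step alone (the probabilistic retrieval that precedes it is omitted).

```python
import math
from dataclasses import dataclass

# Fragment of LTM: auditory definitions of four items, copied from Table III.
LTM_AUDITORY = {
    "A": ("P39",),
    "B": ("P2", "P21"),
    "C": ("P11", "P21"),
    "D": ("P4", "P21"),
}

# ASSUMPTION: a short ladder of decay rates B (per second), fastest first.
DECAY_LADDER = [0.129, 0.060, 0.031, 0.016]


@dataclass
class ComponentTrace:
    """One STM substructure: component name, time stored, decay function."""
    name: str
    stored_at_ms: int
    decay_level: int = 0

    def retrieval_probability(self, now_ms):
        seconds_in_store = (now_ms - self.stored_at_ms) / 1000.0
        return math.exp(-DECAY_LADDER[self.decay_level] * seconds_in_store)


@dataclass
class STMItem:
    traces: list


def recognize(components):
    """Step 3: find the LTM item whose definition matches the input exactly."""
    for item, definition in LTM_AUDITORY.items():
        if tuple(components) == definition:
            return item
    return None


def store(components, now_ms):
    """Step 4: set up an STM structure with one substructure per component."""
    return STMItem([ComponentTrace(name, now_ms) for name in components])


def rehearse(item, now_ms):
    """Step 5 (updating only): reset the time and move to a slower decay rate."""
    for trace in item.traces:
        trace.stored_at_ms = now_ms
        trace.decay_level = min(trace.decay_level + 1, len(DECAY_LADDER) - 1)
```

In this sketch a rehearsed component starts again at probability 1 and then declines along a slower curve, which is the qualitative effect of updating described in Step 5.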
III. The Model: A Detailed Description

Two major parts of the simulation program can be distinguished: data structures and routines. The data structures represent the long-term and short-term memories. The routines represent the processes available for operating on memory. These processes include taking in information from the environment, storing it in memory, retrieving it from memory, and outputting information to the environment. Other routines represent processes such as rehearsal and recoding or chunking.

Beyond this distinction between data structures and routines, two basic concepts are fundamental to understanding the model. These model concepts are a simulated clock and a window. The clock is essentially a cumulative record of the time required by the various memory processes. Underlying this concept is the assumption that the human is basically a serial processor; he can perform only one cognitive process at a time, and all processes require some amount of time to be carried out. With the occurrence of each process, the clock is incremented by an amount of time associated with that process. The time base is milliseconds.

The window represents the visual display or the tape recorder through which information is presented to S. While the data structures and routines are intended to represent the human subject in a particular task, the program also includes data structures and routines that are intended to represent the experimenter. “Experimenter” refers to the human in charge of the experiment as well as those pieces of equipment that are part of the laboratory situation.³ In the model, the window is a cell or series of cells into which and from which the experimenter routines can place and remove information. The experimenter routines monitor the simulated clock and at appropriate times put information into or take information out of the window. The S also monitors the window, and through it the information is “seen” or “heard.”

Given these two concepts, the model can now be presented in detail. The presentation is divided into five sections: the basic information units, the forgetting mechanism, the memory structures, the memory processes, and types of errors the model can make.
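The clock and the window can be illustrated with a small sketch. Everything specific here is an assumption made for the example: the names `SimClock` and `Window` are hypothetical, the process durations are invented placeholders, and the experimenter routines are reduced to a fixed presentation schedule.

```python
# ASSUMPTION: illustrative process durations in milliseconds (not from the model).
PROCESS_COST_MS = {"notice": 50, "recognize": 100, "store": 50}


class SimClock:
    """Cumulative record of the time used by serially executed processes."""

    def __init__(self):
        self.now_ms = 0

    def run(self, process):
        # A serial processor: each process adds its own duration to the clock.
        self.now_ms += PROCESS_COST_MS[process]
        return self.now_ms


class Window:
    """Cell through which the experimenter presents items to S."""

    def __init__(self, schedule):
        # schedule: (onset_ms, item) pairs; a later item replaces an earlier one.
        self.schedule = sorted(schedule)

    def visible_item(self, now_ms):
        current = None
        for onset_ms, item in self.schedule:
            if onset_ms <= now_ms:
                current = item
        return current


if __name__ == "__main__":
    clock = SimClock()
    window = Window([(0, "B"), (1000, "7")])
    for process in ("notice", "recognize", "store"):
        clock.run(process)
    print(clock.now_ms, window.visible_item(clock.now_ms))
```

Because S only consults the window between processes, what S “sees” depends on where the clock stands when the check is made, which is the point of keeping the two concepts separate.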
A. BASIC UNITS OF INFORMATION

In addition to the 10 digits and 26 letters, there is a 37th item in the model’s vocabulary which is a signal to S that the sequence is complete and it is time to recall. It represents an auditory or visual signal occurring at the end of the input sequence.

While the digits, letters, and special symbol make up the vocabulary of the simulated S, these are not the basic units of information with which the model deals. Rather, the information units are visual and auditory components or features that can be used to define the visual and auditory dimensions of the vocabulary items. The auditory components are named P1, P2, ..., P43, and their definitions are presented in Table I. The visual information components in the model are a set of 21 basic line segments and line relationships which include elements for describing a standard version of the digits and letters.⁴ These visual components are named V1, V2, ..., V21, and their definitions are presented in Table II.

Each of the vocabulary items (digits and letters) can be uniquely defined by some combination of either the auditory components (which describe how the item sounds) or the visual components (which describe what the item looks like). These auditory and visual descriptions are presented in Table III. The visual and auditory descriptions of the respond signal are special symbols (V24 and P44) that are not part of the basic sets in Tables I and II.

³ No discussion of the experimenter data structures and routines will be given since this part of the program contributes little to understanding the model.

⁴ The standard version is the digits and letters produced by an Industrial Electronic Engineering Company Binaview display. The reason for selecting this version was that Binaview displays were used in all our experiments involving a visual presentation.
TABLE I
BASIC AUDITORY COMPONENTS

Component name    Phonetic symbol    Example
P1                p                  pay
P2                b                  bay
P3                t                  tip
P4                d                  dip
P5                k                  call
P6                g                  gone
P7                f                  fat
P8                v                  vat
P9                θ                  thin
P10               ð                  then
P11               s                  sue
P12               z                  zoo
P13               ʃ                  shoe
P14               ʒ                  vision
P15               tʃ                 church
P16               dʒ                 judge
P17               h                  hat
P18               m                  my
P19               n                  nip
P20               ŋ                  lung
P21               i                  eat
P22               ɪ                  it
P23               e                  vacation
P24               ɛ                  pen
P25               æ                  sat
P26               ɑ                  father
P27               ɔ                  all
P28               o                  notation
P29               ʊ                  pull
P30               u                  pool
P31               ʌ                  above
P32               ə                  above
P33               ɝ                  worker
P34               ɚ                  worker
P35               w                  we
P36               j                  yes
P37               l                  law
P38               r                  rob
P39               eɪ                 aim
P40               aɪ                 high
P41               ɔɪ                 boy
P42               aʊ                 loud
P43               oʊ                 open
While the types of information units that can be stored in memory involve a basic assumption about the memory system, the units that actually are stored are probably a function of the task environment. It will be assumed in this model that only auditory components are stored in and retrieved from STM. This assumption seems appropriate for representing performance on a memory-span task involving digits and letters. For other tasks, however, other types of units (e.g., visual or speech-motor) may be appropriate.

TABLE II
DESCRIPTION OF BASIC VISUAL COMPONENTS

Characteristic name    Characteristic description
V1                     Vertical left
V2                     Vertical center
V3                     Vertical right
V4                     Horizontal top
V5                     Horizontal middle
V6                     Horizontal bottom
V7                     Full positive sloping line
V8                     Full negative sloping line
V9                     Full curve, closed
V10                    Half curve, closed, top
V11                    Full curve, open right
V12                    Full curve, open left
V13                    Full curve, open top
V14                    Half curve, open left, top
V15                    Half curve, open left, bottom
V16                    Half curve, open right, top
V17                    Half curve, closed, bottom
V18                    Intersection
V19                    Two lines sloping same way
V20                    Part positive sloping line
V21                    Part negative sloping line
B. BASIC FORGETTING MECHANISM

The basic forgetting postulate in this model is that information units in STM decay in time. Clearly, this is a basic assumption about the memory system. Actually, the model contains 30 different rates of decay that describe how information is lost. Each rate is described by an exponential equation giving the probability that a unit of information, a component, can be retrieved as a function of the length of time it has been stored in memory. The form of these equations is

$$p = Ae^{-Bt} + C \tag{1}$$

where p is the probability that the component is retrieved, t is the length of time the component has been in store, and A, B, and C are free parameters.

TABLE III
AUDITORY AND VISUAL DEFINITIONS OF VOCABULARY ITEMS

Vocabulary item    Auditory definition              Visual definition
A                  P39                              V5, V7, V8
B                  P2, P21                          V1, V14, V15
C                  P11, P21                         V11
D                  P4, P21                          V1, V12
E                  P21                              V1, V4, V5, V6
F                  P24, P7                          V1, V4, V5
G                  P16, P21                         V5, V11
H                  P23, P15                         V1, V3, V5
I                  P40                              V2, V4, V6
J                  P16, P39                         V4, V13
K                  P5, P39                          V1, V20, V21
L                  P24, P37                         V1, V6
M                  P24, P18                         V1, V3, V20, V21
N                  P24, P19                         V1, V3, V8
O                  P43                              V9
P                  P1, P21                          V1, V14
Q                  P5, P36, P30                     V9, V18, V21
R                  P26, P33                         V1, V14, V21
S                  P24, P11                         V15, V16
T                  P3, P21                          V2, V4
U                  P36, P30                         V13
V                  P8, P21                          V7, V8
W                  P4, P31, P2, P37, P36, P30       V7, V8, V19, V20, V21
X                  P24, P5, P11                     V7, V8, V18
Y                  P35, P40                         V2, V20, V21
Z                  P12, P21                         V4, V6, V7
0                  P12, P21, P38, P28               V11, V12
1                  P35, P31, P19                    V2
2                  P3, P30                          V6, V12
3                  P9, P38, P21                     V14, V15
4                  P7, P28, P38                     V2, V5, V18, V20
5                  P7, P40, P8                      V1, V4, V15
6                  P11, P22, P5, P11                V17, V20
7                  P11, P24, P8, P32, P19           V4, V7
8                  P39, P3                          V10, V17
9                  P19, P40, P19                    V3, V16
Respond signal     P44                              V24
In the present model the C parameter, which represents the asymptote, is assumed to be zero; the A parameter, the probability of retrieval of the unit at time t = 0, is assumed to be 1. These assumptions leave the decay rate B as the free parameter describing the decay function. A subset of the family of curves, or decay rates, used in the model is shown in Fig. 2 along with the value of the B parameter for each of the curves. The D’s are the names of the routines that actually carry out the calculations for that particular decay rate. The manner in which this process is incorporated into the model will be described later.

FIG. 2. Family of curves describing decay functions. (This figure shows 15 of the 30 decay functions used in the model.) [Abscissa: time in store (seconds), 0 to 30. Legend: B parameter values for the odd-numbered decay routines (D1, D3, D5, ...); the smallest legible value is .002 and the largest is .129.]
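The decay function of Eq. (1), with A = 1 and C = 0, is easy to evaluate directly. The sketch below is illustrative only: the function names are hypothetical, time is taken in seconds to match the abscissa of Fig. 2 (an assumption about the units of B), and the second function simply multiplies component probabilities to show what independent decay implies for all of an item's components being retrievable at once.

```python
import math


def retrieval_probability(t_seconds, b, a=1.0, c=0.0):
    """Eq. (1): p = A * exp(-B * t) + C, with the model's A = 1 and C = 0."""
    return a * math.exp(-b * t_seconds) + c


def all_components_probability(t_seconds, decay_rates):
    """Probability that every component is retrievable at time t.

    ASSUMPTION: components decay independently, so the probabilities multiply.
    """
    probability = 1.0
    for b in decay_rates:
        probability *= retrieval_probability(t_seconds, b)
    return probability


if __name__ == "__main__":
    # Slowest and fastest B values legible in the legend of Fig. 2.
    for b in (0.002, 0.129):
        print(b, [round(retrieval_probability(t, b), 3) for t in (0, 5, 15, 30)])
```

Under independence, an item defined by several components is less likely to survive intact than an item defined by one, even when every component decays at the same rate.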
C. THE MEMORY STRUCTURES

It is assumed that human memory can best be represented by a system consisting of three separate storage mechanisms: very-short-term memory, STM, and LTM. For the model presented here, only STM and LTM are represented. Some additional comments will be made later regarding the development of the model to include a very-short-term memory.

1. Long-Term Memory

The LTM is represented by a structure that contains information that is permanently available to S. In order to simulate the human in a memory-span task consisting of sequences of digits and letters, LTM contains information about digits and letters that is relevant to this particular task. In the model this information includes: (1) the visual components that describe what the item looks like; (2) the auditory components that describe how the item sounds; and (3) some information about how the item may be combined with other items for recoding into larger chunks. This recoding information will be described in Section III,D,5 where recoding processes are discussed.

The LTM structure is a modified discrimination net (Feigenbaum, 1963). The nature of the structure is shown in Fig. 3. It consists of a list (L0) containing the names of all items in the vocabulary (L1-L37).⁵ Each item is in turn the name of a list that contains the names of two sublists. The first sublist contains the auditory components that describe the sound of the item’s name. The second sublist contains the visual components that describe what the item looks like. Hence, this list structure, consisting of the main list of vocabulary items each of which has two sublists containing the auditory and visual components describing the item, represents the LTM. There is a third sublist containing information about recoding, but discussion of that portion of LTM will be deferred to a later section.

2. Short-Term Memory
In this model STM has an unlimited capacity. The STM structure consists of a number of memory cells (see Fig. 4), M1-Mn,⁶ each of which holds all information about an individual item (a digit or letter). The cells are connected by links (to be described later) which represent order information. Actually, each M location is not a single cell but a large number of memory cells which are organized into a list structure. The M location contains the name of the memory structure, which in turn contains all of the information about that particular item. The memory structure for an item is shown in Fig. 5. In Fig. 5, the name of the structure is a list of the auditory components that describe the item. More precisely, it is a list of names of sublists each of which contains three types of information: (1) the name of the auditory component, a P unit; (2) the clock time at which that component was stored in STM; and (3) a decay function, a D, specifying the exponential relationship between the probability of retrieving the component and the length of time the component has been in store.
⁵ In the model the vocabulary items are coded as L1-L37, where L1 = A, L2 = B, ... L36 = 9 and L37 = the special respond signal. The reason for this code, as opposed to referring to them by their usual symbols (A, B, C, and so on), is simply that the computer programming language (IPL-V) does not permit the standard symbols to be used in this fashion.
⁶ An important point here is that the number of cells, n, is not a parameter of the model, but the number of items to be remembered. That is, the model simply uses as many memory cells as needed to store information about the items.
At this point, two important assumptions of the model should be noted:
First, only the auditory components of an item are assumed to be stored in STM; and second, the decay of the individual components of an item is assumed to occur independently. In the initial conception of the model, both auditory and visual components were stored in STM. A number of studies have shown that the auditory dimension of items plays an important role in STM (e.g., Cimbalo & Laughery, 1967; Conrad, 1964; Laughery, 1963; Laughery & Pinkus, 1966; Wickelgren, 1965a). However, other evidence has appeared which indicates that in simulating performance in a memory-span task for digits and letters, visual components should not be a part of the information in STM (Baddeley, 1966; Cimbalo & Laughery, 1967; Laughery, Harris, & Ulbricht, 1967). For example, interference and confusion among items being recalled seem to follow their auditory, not their visual similarities. As a result of these studies, the model was modified so that only auditory components are a part of STM.⁷
FIG. 3. Long-term memory structure. (Figure not reproduced. The main list L0 names the vocabulary items, for example L3 (C), L36 (9), and L37 (output signal); each item has a sublist of auditory components (P’s) and a sublist of visual components (V’s).)
FIG. 4. Short-term memory structure. (Figure not reproduced. It shows the chain of memory cells, M1 through Mn.)
FIG. 5. Short-term memory structure for a single item. (Figure not reproduced. The memory structure holds a link to the next structure and one substructure per auditory component; each substructure holds the name of the component (P), a time tag, and the name of a decay function.)
⁷ As mentioned earlier, the decision to use only auditory components in STM is task-dependent. It is not suggested that visual, articulatory, or some other dimension of information has no role in STM; but rather, in this particular task (the memory span for digits and letters) only the auditory dimension is relevant.
Another line of evidence is relevant to the question “What is stored in STM?” Hintzman (1965, 1967) has reported data which indicate that
speech-motor or articulatory components may also be important. Some comments on this alternative and the possibility of “distinctive features” will be made later in this chapter.
The second characteristic of STM in the model to be noted concerns the independent decay of the auditory components. As indicated in Fig. 5, each component has a decay function specifically associated with that particular component. The decay function may be the same or different for various components. However, it is assumed that the rate at which a component decays depends neither upon the nature of the other components nor upon the specific decay functions assigned to other components.
Bower (1967) suggested two alternative possibilities as to how components might be forgotten. The first alternative was called “hierarchical loss of components.” This idea was that the various components could be ordered in importance, and the retention of a component would vary directly with its importance. Furthermore, components at one level in the hierarchy would not be lost until the components at the lower levels had been lost. The second alternative suggested by Bower was called “independent loss of components.” In this case, the components are equally important and show equal resistance to forgetting. He assumed that the components in the independent loss alternative are forgotten at the same rate. The process suggested here is independent loss but with possibly different rates for the several components. In other words, some components decay more quickly than others, but the rate for any one component is independent of the rate at which the others decay.
Incidentally, the decay process does not describe the probability that a component is lost forever. Rather, it describes the retrieval probability at a particular time. For example, if a component is not retrieved during an attempted rehearsal, it may still be retrieved during a later rehearsal or recall. (Of course, at a later time the probability would be even less.)
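The independent-decay assumption can be illustrated with a short Python sketch. It assumes, purely for illustration, that a decay function has the form p(t) = exp(-B t) with t in seconds; the text says only that the relationship is exponential. The names `Component`, `retrieval_probability`, and `retrieve` are hypothetical, and the component names and B values are illustrative.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Component:
    """One auditory component substructure in STM (illustrative names)."""
    name: str        # name of the component, e.g. "P24"
    time_tag: float  # clock time (sec) when stored or last updated
    b: float         # decay constant B of the associated decay function


def retrieval_probability(comp: Component, clock: float) -> float:
    # ASSUMPTION: p(t) = exp(-B * t); the text states only that decay is exponential.
    elapsed = clock - comp.time_tag
    return math.exp(-comp.b * elapsed)


def retrieve(components, clock, rng):
    """Attempt to retrieve each component independently of the others.

    A failed attempt yields None (the null state) for this attempt only.
    Nothing is deleted, so a later attempt may still succeed.
    """
    return [c.name if rng.random() < retrieval_probability(c, clock) else None
            for c in components]


# Illustrative item with three components and two different decay rates.
item_x = [Component("P24", 0.00, 0.160),
          Component("P5", 0.01, 0.269),
          Component("P11", 0.02, 0.160)]
probs = [retrieval_probability(c, clock=5.0) for c in item_x]
attempt = retrieve(item_x, clock=5.0, rng=random.Random(0))
```

Each component is drawn separately, so the outcome for one component has no effect on the others, and a component that is missed on one attempt remains in the structure for later attempts.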
Another issue raised by Bower (1967) regards the value of the component after its initial value has been forgotten. Bower offered two possibilities: the value reverts to a null state (it simply goes away); or the forgetting consists of replacing the original value of the component by some incorrect, non-null state. The null-state idea is similar to the decay notion of forgetting, while replacement is more consistent with interference. In the present model, when a component is forgotten it reverts to the null state.
Finally, it seems appropriate at this point to note an alternative forgetting mechanism that was considered in the initial formulation of the model. Instead of having the decay function represent the probability of retrieving a component as a function of how long it has been in memory, the function could represent the relationship between strength of the component and the time in store. By defining a strength threshold below which the component is not retrieved and above which it is retrieved, the same all-or-none retrieval of the component would be represented.⁸
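The strength-threshold alternative can be sketched in a few lines of Python. This is only an illustration under stated assumptions: strength is taken to start at 1.0 and to fall as exp(-B t), the threshold value of 0.5 is arbitrary, and a strength exactly at the threshold is counted as retrieved. None of these choices comes from the text, and all function names are hypothetical.

```python
import math


def strength(b: float, elapsed: float) -> float:
    # ASSUMPTION: strength starts at 1.0 and decays as exp(-B * t).
    return math.exp(-b * elapsed)


def retrieved(b: float, elapsed: float, threshold: float = 0.5) -> bool:
    # All-or-none outcome: retrieved only while strength is at or above threshold.
    return strength(b, elapsed) >= threshold


def time_to_threshold(b: float, threshold: float = 0.5) -> float:
    # Time in store at which strength falls to the threshold.
    return -math.log(threshold) / b
```

In this sketch the outcome is fully determined by the time in store, so any trial-to-trial variability would have to come from variability in strength or in the threshold.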
D. MEMORY PROCESSES
The routines representing the memory processes will now be described. These are the processes available to S for taking in information from the environment, storing information in memory, retrieving information from memory, and outputting information to the environment. Flow charts will be used to communicate the flow of events in the model. The aim is to indicate what the model says about these memory processes. After this description of the processes, a sample simulation will be presented.
The routines that comprise the simulation program are hierarchically organized. A general routine at the top of this hierarchy is referred to as the model executive routine (E1), and it directs the overall flow of events in the model. This routine executes a series of subroutines which in turn have subroutines, and so forth. The model executive routine contains subroutines which represent the experimenter, the equipment, and the subject. The experimenter routines are responsible for setting up the sequences according to specified criteria (e.g., the types of items, digits or letters, in the sequences and the rate at which they are to be presented), for monitoring the clock, and for placing items into the window and removing items from the window at the appropriate times. In short, these routines conduct the experiment.
A number of model routines that represent S will be described in detail, and Table IV presents the program name and a descriptive title of each. The routine with which this discussion will begin is a subroutine of E1 and is referred to as the subject executive routine (S1). The flow chart in Fig. 6 presents S1. At the time this routine is entered, the simulated clock has been set to zero and an experimenter routine has put the first item into the window. The S1 routine is entered at the beginning of a new sequence of items and, as can be seen from the flow chart, exited when that sequence has been output by S.
Before each new sequence is presented to the simulated S, the clock is reset to zero and the memory structures set up for the previous sequence are erased. This procedure is essentially a present boundary condition of the model and precludes its representing any proactive or retroactive interference that may occur between sequences. It is, therefore, a model of performance on a single “representative” sequence.
⁸ The introduction of retrieval thresholds would bring the model into contact with the recent work on decision processes in STM (the theory of signal detectability) by Wickelgren and Norman (1966). However, the application of decision theory to memory does not require a continuous-strength assumption since Bernbach (1967) has proposed a model based upon decision theory that assumes a finite-state memory (two states, available and unavailable). We intend to explore these alternatives in future versions of the model.

TABLE IV
PROGRAM NAME AND DESCRIPTIVE TITLE OF MODEL ROUTINES

Program name    Descriptive title
E1              Model executive
S1              Subject executive
S2              Input stimulus
S3              Is a new item in window?
S4              Net sorting
S5              Store and update in short-term memory
S6              Respond
S10             Interitem activity
S11             Find location (M) of next short-term memory structure
S20             Store basic component
S22             Store link substructure
S40             Select item
S41             Generate list of consistent items
S42             Generate list of items consistent with most components
S50             Retrieve auditory components of item from dictionary
S52             Update
S100            Rehearsal
S101            Retrieve remembered components of an item
S105            Is rehearsal continued?
S110            Recoding
An important characteristic of the model concerns the nature of the item information that has been placed in the window by the experimenter routines. This information is the visual or auditory components which describe the item being presented. If the simulated experiment involves a visual presentation mode, the window will contain some combination of visual components (V’s). If the presentation mode is auditory, some combination of auditory components (P’s) will be in the window. For example, if the first item is the letter X and the presentation mode is auditory, the window will contain P24, P5, and P11 (see Table III).
1. Noticing
From Fig. 6 it can be seen that the first process is for S to notice or take in the information in the window. What happens depends upon the mode of presentation being simulated. If the presentation is auditory (the components are P’s), a STM structure is set up in M1 for each of the components of the first item. If the presentation is visual, this structure is not set up immediately since visual components are not stored in STM. As indicated in Fig. 6, the input stimulus routine S2 is responsible for carrying out this input process.
FIG. 6. Subject executive routine, S1. (Flow chart not reproduced. Its boxes include input stimulus (S2), sort in net, that is, recognize in LTM (S4), and interitem activity (S10).)
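The control flow of the subject executive routine can be sketched roughly in Python, following the order of steps the text gives for Fig. 6. This is a skeleton under stated assumptions rather than the program itself: every routine is passed in as a stub, the names are hypothetical, and the return from S10 to S2 assumes that interitem activity ends only when a new item has appeared in the window.

```python
RESPOND = "L37"  # program code of the special respond signal


def subject_executive(s2, s4, s5, s6, s3, s10):
    """Skeleton of S1; each argument is a callable standing in for a routine."""
    while True:
        components = s2()        # S2: notice the components in the window
        item = s4(components)    # S4: net sorting (recognize the item in LTM)
        if item == RESPOND:
            return s6()          # S6: respond, then leave S1
        s5(item)                 # S5: store and update in short-term memory
        if not s3():             # S3: is a new item in the window?
            s10()                # S10: interitem activity
        # ASSUMPTION: S10 returns only once a new item has appeared.


def demo():
    """Run the skeleton on a scripted three-event sequence and record a trace."""
    # Hypothetical window contents; "P_RESPOND" is a made-up component name.
    window = iter([("P24", "P5", "P11"), ("P21",), ("P_RESPOND",)])
    lookup = {("P24", "P5", "P11"): "L24", ("P21",): "L5", ("P_RESPOND",): RESPOND}
    new_item_flags = iter([False, True])
    trace = []

    def s2():
        trace.append("S2")
        return next(window)

    def s4(components):
        trace.append("S4")
        return lookup[components]

    def s5(item):
        trace.append("S5")

    def s6():
        trace.append("S6")
        return "done"

    def s3():
        trace.append("S3")
        return next(new_item_flags)

    def s10():
        trace.append("S10")

    return subject_executive(s2, s4, s5, s6, s3, s10), trace
```

Calling `demo()` returns the result of the respond stub together with the order in which the stubs were visited, which makes the branch points easy to inspect.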
With auditory presentation the STM structure is set up in the following fashion. In serial order, each of the components is copied from the window and combined with a time tag and decay function to make up a list. The name of the list is then added to another list whose name is the structure stored in M1 (see Fig. 5). The storing in memory of each individual component is carried out by a store basic component routine S20. The flow chart for S20 is presented in Fig. 7. This process works in a straightforward manner: A new sublist which will represent this component is generated; the name of the component is taken from the window and placed on this sublist; the value of the simulated clock at this instant is copied and added to this sublist; and a decay function associated with this particular component is retrieved and added to this sublist. The store basic component routine is a process that takes time (a time-charge process). Consequently, at the end of the process an increment of time is added to the clock. This time increment is one of the basic time parameters in the model. In most of the simulation runs carried out to date, a value of 10 msec has been assigned to this parameter. (The reasons for using 10 msec will be discussed later.)
FIG. 7. Store basic component routine, S20. (Flow chart not reproduced. Its legible steps are: create name of component sublist; put component name on list; put copy of current clock value on list; retrieve decay function associated with component and put on list; add substructure to short-term memory structure; add time increment to clock value.)
The manner in which decay functions are associated with specific components is also straightforward. The model has a dictionary containing the various auditory and visual components and an associated decay rate (B constant) which describes the decay for that component when it is first placed into STM. (In the current version of the model, of course, the visual components are not put into the STM.) Hence, the store basic component routine simply retrieves from the dictionary the decay function associated with the particular component and adds it to the sublist. As indicated in Fig. 2, the model contains a number of decay functions which differ in terms of the decay rate (the B constant). Actually, any one of five different decay processes, those with B values of 0.107, 0.129, 0.160, 0.208, and 0.269, have been used as the initial values of the decay process for the different components.
The reason for assigning different decay rates to different components is based upon some ideas expressed by Wickelgren (1965a). Wickelgren noted that intrusion errors in the memory-span task tend to have an auditory component (a phoneme) in common with the correct item. He further noted that some auditory components account for more of these acoustically similar intrusions than others, and that this difference was related to the pronunciation time of the phoneme. Using this idea, the auditory components in this model were classified into five categories according to their pronunciation time as reported by Fletcher (1953). The decay rates were then associated with the various components in the dictionary such that the longer the pronunciation time of the component the slower the decay rate.
Once the sublist representing the first component has been set up and added to the STM structure, the second component is copied from the window, a similar component substructure is set up, and this substructure added to the STM. This process repeats for each of the auditory components in the window. When all of the components representing the first item have been put into STM, S2 terminates. For example, if the first item was an X in the auditory presentation mode, the STM structure will contain the information indicated in Fig. 8.
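The input process for auditory presentation can be sketched in Python as follows. This is a minimal illustration, not the original IPL program: the class and function names are hypothetical, the dictionary holds only the three components of the letter X, and the B values assigned to them are illustrative values drawn from the five initial rates listed above.

```python
from dataclasses import dataclass

STORE_TIME_MSEC = 10  # time charge for S20 used in most runs, per the text

# ASSUMPTION: illustrative dictionary entries mapping a component to its initial B.
DECAY_DICTIONARY = {"P24": 0.160, "P5": 0.269, "P11": 0.160}


@dataclass
class Clock:
    msec: int = 0


@dataclass
class ComponentSubstructure:
    name: str      # name of the component (a P unit)
    time_tag: int  # clock value (msec) copied when the component is stored
    b: float       # B constant of the associated decay function


def store_basic_component(name, structure, clock):
    """S20 sketch: build one substructure, add it to STM, then charge time."""
    sub = ComponentSubstructure(name, clock.msec, DECAY_DICTIONARY[name])
    structure.append(sub)
    clock.msec += STORE_TIME_MSEC  # increment is added at the end of the process
    return sub


def input_stimulus(window, clock):
    """S2 sketch for auditory presentation: store the components in serial order."""
    structure = []
    for name in window:
        store_basic_component(name, structure, clock)
    return structure


clock = Clock()
m1 = input_stimulus(["P24", "P5", "P11"], clock)
time_tags = [sub.time_tag for sub in m1]
```

Because the time charge is added only after each substructure has been stored, the first component receives the clock value in force when S2 was entered, and the clock has advanced by one increment per component when S2 finishes.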
The time tags associated with the different component substructures, 0, 10, 20, reflect the serial nature of the processing as well as the fact that the time increment was added to the clock at the end of each execution of the store basic component routine S20. Also, because the time increment is added to the clock at the end of the process, the value of the clock would be 30 when the input stimulus routine S2 is finished.
2. Item Recognition
The only thing the model has simulated thus far is S noticing the item in the window. This noticing does not include what would normally be thought of as S recognizing the item. He has merely “heard” the item. The next step in the model (the second box in Fig. 6) is the net sorting routine and represents a recognition or discrimination process. In general, the process consists of taking a set of components and retrieving from the discrimination net (LTM) a vocabulary item on the basis of these information components. The flow chart in Fig. 9 describes the net sorting routine S4. There are two inputs to this routine: a set of components (V’s and/or P’s); and a signal indicating whether no response (a blank) is a legitimate alternative in the retrieval process. The output of the net sorting routine is the name of a vocabulary item if it is not a blank or an appropriate signal if it is a blank. The reason for the input signal regarding the legitimacy of a blank is to allow the model to represent various task-dependent processes. For example, in the response phase of the task, when S is attempting to recall the sequence of items, the task may require that he output as many items as were input. In this situation the respond routine, which also uses this net sorting routine, would signal that a blank is not a legitimate alternative, and the model, like S, must guess (if the item is not remembered). In other parts of the model using the net sorting routine, such as the rehearsal process, a blank may be a legitimate alternative.
FIG. 8. Short-term memory structure after inputting first item (letter X). (Figure not reproduced. The structure named in M1 holds one substructure per component; the legible entries include P24 with time tag 0 and decay function D24 (B = .160), and P5 with decay function D26 (B = .269).)
From Fig. 9, the first step in the net sorting routine is a subroutine S41, which generates a list of all items in the net that are consistent with the input components. The consistency rule is that if an item contains all of the components in the input it is considered consistent. In the example in which the first item is the letter X and the input components are the three auditory components that make up the letter X, the S41 routine would generate a list that contained only the letter X. However, if the item were the letter E (whose auditory description consists of the single phoneme P21), the S41 routine would generate a list of items consisting of all the letters or digits that contain this particular phoneme (BCDEGPTVZ3).
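The consistency rule used by S41 amounts to a subset test, which the following Python sketch illustrates. The vocabulary is a toy one and covers auditory components only: the entries for X, S, and E follow the descriptions given in the text, while the entries for B and 3 are hypothetical stand-ins (the names "P1" and "P30" are made up) that merely share the phoneme P21. The function name is also hypothetical.

```python
# ASSUMPTION: toy auditory dictionary; only X, S, and E follow the text.
AUDITORY = {
    "X": {"P24", "P5", "P11"},
    "S": {"P24", "P11"},
    "E": {"P21"},
    "B": {"P1", "P21"},   # hypothetical entry
    "3": {"P30", "P21"},  # hypothetical entry
}


def consistent_items(input_components, vocabulary=AUDITORY):
    """S41 sketch: list the items whose description contains every input component."""
    wanted = set(input_components)
    return sorted(item for item, comps in vocabulary.items() if wanted <= comps)
```

With this toy vocabulary, `consistent_items(["P24", "P5", "P11"])` yields only X, whereas `consistent_items(["P21"])` yields every entry that contains P21.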
FIG. 9. Net sorting routine, S4. (Recognition or discrimination process.) (Flow chart not reproduced. Its legible boxes are: generate list of all items in the net that are consistent with input components (S41); generate list of items that match the greatest number of input components (S42); select item (S40); output item; and the steps that add to the clock value.)
The second step in the net sorting routine asks the question “Is the list of consistent items empty?” It is possible under certain conditions, which will be described later, that the input to the routine may be a set of components that is not consistent with any vocabulary item. When such a situation occurs the list of items generated by S41 is empty, and the model goes to routine S42, which generates a list of items that match the greatest number of input components. For example, if there are three input components, and no item is consistent with all of them, S42 searches for items that are consistent with any two of the input components and puts them on the list. If none of the items is consistent with two components, S42 gathers up those items with any one component.
At this point, a list of consistent items will have been generated and the model proceeds to the question “Is there only one item on the list?” If the answer is YES, the input components will have defined a unique item, and this item would be output by S4. Since this recognition process (net sorting) is one of the basic processes in the model, it has an associated time charge. This time increment is another parameter of the model and is added to the clock value as the last step in the S4 routine. In most of the simulation runs to date, a value of 100 msec has been assigned to this time parameter. (The basis for using 100 msec will be discussed later.)
Returning to Fig. 9, if the answer to the question “Is there only one item on the list?” is NO, another question is asked: “Does the set of input components exactly match a single class of components of any vocabulary item?” For example, if the input components were P24 and P11, the list of consistent items generated by S41 would contain both S and X. The two input components, however, exactly match the auditory description of S but not X. Hence, the answer to the question would be YES and the letter S would be output. Similarly, if the input component were P21, the items BCDEGPTVZ3 would be on the list. The output would be the letter E because its auditory description exactly matches the single input component. (This rule creates a particular problem for the model which will be described later.) Note that an exact match with a single set of components (auditory or visual) of an item is a second way that the input components define a unique vocabulary item. The remainder of the net sorting routine deals with the situation in which the input components do not define a unique item, the NO branch from the last question. At this point, a select item routine S40 is executed.
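The branches of the net sorting routine can be gathered into one Python sketch. It is an illustration under stated assumptions: the vocabulary is a toy auditory dictionary in which only X, S, and E follow the text (B and 3 are hypothetical entries), and all function and constant names are made up. The select item step and its 300 msec charge follow the rule the text gives for S40 in the next paragraph (a blank when ten or more candidates remain and a blank is allowed, otherwise a random choice). What S42 should do when no item shares any input component is not stated in the text; here every item then ties at zero matches.

```python
import random

UNIQUE_MSEC = 100  # time charge when the input defines a unique item
SELECT_MSEC = 300  # typical time charge when the select item routine is needed
BLANK = None       # stands for the blank (no response) signal
BLANK_LIMIT = 10   # S40 outputs a blank at this many candidates or more

# ASSUMPTION: toy auditory dictionary; only X, S, and E follow the text.
AUDITORY = {
    "X": {"P24", "P5", "P11"},
    "S": {"P24", "P11"},
    "E": {"P21"},
    "B": {"P1", "P21"},   # hypothetical entry
    "3": {"P30", "P21"},  # hypothetical entry
}


def net_sort(components, blank_ok, rng, vocabulary=AUDITORY):
    """S4 sketch: return (item or BLANK, time charge in msec)."""
    wanted = set(components)
    # S41: items whose description contains every input component.
    candidates = [i for i, c in vocabulary.items() if wanted <= c]
    if not candidates:
        # S42: items matching the greatest number of input components.
        best = max(len(wanted & c) for c in vocabulary.values())
        candidates = [i for i, c in vocabulary.items() if len(wanted & c) == best]
    if len(candidates) == 1:
        return candidates[0], UNIQUE_MSEC
    # Exact match between the input set and one item's description.
    exact = [i for i in candidates if vocabulary[i] == wanted]
    if len(exact) == 1:
        return exact[0], UNIQUE_MSEC
    # S40: no unique item is defined, so a blank or a random guess is output.
    if blank_ok and len(candidates) >= BLANK_LIMIT:
        return BLANK, SELECT_MSEC
    return rng.choice(candidates), SELECT_MSEC
```

For example, `net_sort(["P24", "P11"], blank_ok=False, rng=random.Random(0))` finds both S and X consistent and then settles on S through the exact-match rule, at the 100 msec charge.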
Since the procedures in S40 are straightforward, its flow chart will not be presented. Essentially what happens is the following. If the input indicates that a blank output is legitimate, and if the list of consistent items contains 10 or more (an arbitrary number at this stage in the development of the model), a blank signal will be output. On the other hand, if there are nine or fewer items on the list, or a blank is not a legitimate alternative, the S40 subroutine selects one at random. The last step in this branch of routine S4 is to increment the clock by a time value associated with this phase of the process. The time increment added under these conditions is not the same parameter as the time increment when the input components define a unique item (the value of that time increment was 100 msec). When a unique item is not defined by the input, some sort of decision process (represented by S40) is presumably added which will cause the overall process to take longer. Hence, the value of the time increment in the branch of the routine containing the additional selection process will be larger than the time charge when a unique item is defined. Various times have been assigned to this parameter in the simulation runs to date, with 300 msec being a typical value.
After the net sorting routine, the next step in the subject executive routine, Fig. 6, is the question “Was the item the respond signal?” (Remember that the respond signal is one of the items in the discrimination net, and it will be uniquely defined when the appropriate component appears in the window.) If the answer to this question is YES, control will be transferred to the respond routine S6. The procedures in the respond routine will be discussed later. If the output of the net sorting routine is not the respond signal, control will be transferred to the store and update in short-term memory routine S5.
3. Updating Item in STM
In this branch of the subject executive routine, the possibility that the output from the net sorting routine was a blank is not considered. The reason is that in this branch there is no possibility that the blank alternative would have been selected. The execution of S4 has occurred immediately after a new item has appeared in the window, and as long as the item is in the window and S notices it he will not fail to recognize it. In other words, all of the components of the item are available as input to the net sorting routine and thus lead to recognition of a unique and appropriate item. The possibility of a blank output occurs under other conditions which will be described shortly.
When an item has been retrieved from the discrimination net, its auditory components are added to the STM structure for that item by the S5 routine. If these components are already stored in the STM structure (as they would have been in this case by the S2 routine), then the information regarding these components will simply be updated. The manner in which the store and update in short-term memory routine S5 proceeds is shown in Fig. 10. The inputs to this routine are the location of the STM structure where information about the item is being stored (one of the M’s) and the name of the item. In the example mentioned earlier, if S5 is updating the first item in the sequence, X, these inputs would be M1 and X (actually, this latter input will be L24, the program code for X).
FIG. 10. Store and update in short-term memory routine, S5. (Flow chart not reproduced. Its legible steps are: retrieve list of item’s auditory components; Is there another component?; Is the component in the memory structure?; generate new component substructure and put on memory structure (S20); update component substructure.)
From Fig. 10, the first thing that happens in S5 is the execution of a subroutine S50 which takes the name of the item to a dictionary and retrieves the auditory components that make up that dimension of the item (for the letter X these are P24, P5, and P11). The next step is a question “Is there another component?” which is simply the first step in a loop. Each of the components on the list generated by S50 is taken one at a time and dealt with. After all of the components have been handled, the answer to this question is NO and the routine is exited. Initially, the answer to the question is YES since none of the components have yet been considered. The first component is then taken from the list and the STM structure scanned to determine if this component already exists in the structure. In the present example, the auditory components of the letter X will have already been stored in the STM structure named in M1, and the answer to this question will be YES.
There are other circumstances, however, in which the answer may be NO. For example, if the presentation mode in the experiment is visual, the input stimulus routine S2 does not set up the STM structure for the item, since the auditory cues are not yet available. Indeed, when the presentation is visual, the item must be recognized in LTM and the auditory cues retrieved before anything can be stored in the STM structure. Under this condition, auditory components of the new item would then be set up along with time tags and decay functions in the STM structure by routine S20 in the manner previously described. Note that when the presentation is visual, the auditory components stored in STM represent a secondary code of the input stimulus (cf. Bower, 1967). However, when the presentation is auditory, the stored components represent the primary code of the stimulus. There are other situations in which the components may not already exist in the memory structure. These situations arise during the rehearsal and will be mentioned when that process is discussed.
If the component already exists in the memory structure, control is transferred to the update routine S52. The update routine makes two changes in the memory for that component. First, it replaces the time tag in the component substructure with a copy of the current value of the clock. In other words, the decay of the component starts anew. The second change concerns the nature of the decay function. As indicated in Fig. 2, there is a family of decay functions in the model such that the various functions differ in terms of the rate of decay. There are 30 different functions in all. When a component is updated, its decay function is replaced by a function whose rate of decay is the next slowest in the set. This change represents learning in the model.
There is an additional point worth noting about the change in the decay rate that occurs during updating. As mentioned, the updating routine simply selects the function with the next slowest rate. This updating procedure will also occur during rehearsal, which means that the decay function for a component may be updated many times. However, the amount of change between functions is not constant.
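The two changes made by the update routine can be sketched in Python as follows. This is an illustration under stated assumptions: the family of 30 B constants is hypothetical (a geometric series starting from the fastest initial rate, so that successive steps get smaller), the substructure is represented as a plain dictionary, and staying on the slowest function once it has been reached is an assumption, since the text does not say what happens there. All names are made up.

```python
N_FUNCTIONS = 30  # number of decay functions in the family, per the text

# ASSUMPTION: hypothetical family of B constants, fastest decay first.
FASTEST_B = 0.269
RATIO = 0.85  # hypothetical spacing between neighboring functions
DECAY_FAMILY = [FASTEST_B * RATIO ** k for k in range(N_FUNCTIONS)]


def update_component(sub, clock_msec):
    """S52 sketch: restart the decay and move to the next slower decay function.

    `sub` is a dict with the keys "time_tag" and "decay_index".
    """
    sub["time_tag"] = clock_msec  # the decay of the component starts anew
    # ASSUMPTION: once the slowest function is reached, the component stays on it.
    sub["decay_index"] = min(sub["decay_index"] + 1, N_FUNCTIONS - 1)
    return sub


component = {"name": "P24", "time_tag": 0, "decay_index": 0}
update_component(component, clock_msec=130)
current_b = DECAY_FAMILY[component["decay_index"]]
```

Each call moves the component one step along the family, so a component that is updated repeatedly during rehearsal is held by progressively slower decay functions.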
The differences between the slower decay rates are less than between the faster rates. In fact, these differences are described by an exponential function also. The result of this procedure is that the first rehearsal contributes more to remembering an item than the second rehearsal, the second contributes more than the third rehearsal, and so on. This change in decay rates represents a fundamental assumption about the memory system.
The update routine is another of the basic processes in the model. As in the other basic processes, each time the update routine is executed a time increment is added to the clock. Since this process is in some sense a storage process, the time charge assigned to it in the runs obtained to date has been the same as the time for the store basic component routine S20. In most of the runs the value has been 10 msec.
When the S5 routine is complete, a point has been reached at which the simulated S has noticed the item in the window, recognized it as something with which he is familiar, and set up an appropriate STM structure for the item. In the case in which the letter X was presented and was the first item in the sequence, the time on the clock at the end of S5 reads 160 msec (30 msec for the subroutines executed by S2, 100 msec for the net sort in S4, and another 30 msec for the updating in S5). As indicated in the subject executive routine, Fig. 6, the next step in the process is routine S3 which asks the question “Is a new item in the window?”
4. Checking Environment
At this point in the discussion, there are another two characteristics of the model that must be clarified: when and how S monitors the window, and the manner in which the experimenter routines interrupt to make changes in the window. Earlier, the point was made that S is viewed as a serial processor. With respect to noticing new items in the window, there are at least two procedures which might be adopted that seem consistent with the serial assumption. The first of these procedures assumes that S is interrupted, regardless of what he is doing, at that instant at which a new item is placed in the window. The second procedure assumes that S actively checks the window at frequent intervals. In the second alternative, S’s sequence of behavior would be interruptable only at these checkpoints. The procedure adopted is the second, and the rule for when these checkpoints will occur is that S will look for a new item in the window every time he has finished processing a single vocabulary item. This processing may involve taking in and storing a new item or rehearsing items that have already occurred. Thus, in Fig. 6, the window will be checked after the S5 routine has been executed.
The second point concerns when the experimenter routines make changes in the window. The procedure is quite straightforward. Each time S is about to check the window to see if a new item has occurred, an experimenter routine is first executed which checks to see if any window changes should have occurred. If so, the changes are made. More precisely, this experimenter subroutine is executed as a first step in routine S3. The question as to how S knows that an item in the window is new is essentially finessed in the present model. In short, a signal is output by the experimenter routine indicating whether or not the item in the window represents a change since the last time S monitored the window. Routine S3 simply senses this signal.
Also, routine S3 is a time charge process and has a time parameter indicating the value added to the clock at the end of the routine (10 msec in most runs). The next step in the subject executive routine depends upon whether or not a new item has occurred in the window. If the presentation rate
in the experiment is very fast (e.g., 10 items per second), the answer to this question is YES and the simulation branches back to the first box in Fig. 6, input stimulus S2. If, on the other hand, the interitem interval is longer than the time consumed by the basic time charge processes so far executed (160 msec in the current example), control is transferred to the interitem activity routine S10.
5. Rehearsal
The interitem activity routine is an attempt to simulate what S does between taking in one item and waiting for the next item to appear in the window. In this model, there are two types of activities that occur in this interval, rehearsal and recoding. The general procedure during the interitem period is shown in Fig. 11. First, the model attempts to rehearse all of the items that have occurred thus far, routine S100. The number of items rehearsed in this model is a function of the time available for rehearsal.⁹ If there is still time remaining after all items have been rehearsed (a new item has not yet occurred in the window), an attempt will be made to recode or chunk items into larger units, routine S110. If after the recoding process has been executed time still remains, control will branch back to the rehearsal routine again. In both the rehearsal and recoding routines, the simulated S checks the window to see if a new item has occurred each time an item has been rehearsed or considered in the recoding process.
⁹ This time-based procedure is different from the procedure proposed by Atkinson and Shiffrin (1968). They propose a rehearsal buffer with a capacity to regulate the number of items rehearsed.
FIG. 11. Interitem activity routine, S10.
The nature of the rehearsal process, routine S100, is described by the flow chart in Fig. 12. The first step in the process is to go to the location of the first STM structure, M1. In this model the simulated S always knows the location of the first item: he may not remember what the first item is, but he knows where to look for it. This knowledge makes M1 an anchor point (Feigenbaum & Simon, 1962). The next step is to execute routine S101, which examines the memory structure in M1 and retrieves all of the components that can be remembered. The procedure in S101 is to consider the components one at a time and determine which can be remembered. This determination is made by subtracting the time value associated with the component from the current value of the clock, which results in the length of time the component has been in memory since being stored or last updated. This time value is then used by the exponential decay function associated with the component to compute a probability that the component is remembered. The process then generates a random number between 0 and 1, and on the basis of the random number decides that the component is or is not remembered. For example, if the item has been in store 1 second and the probability of retrieval computed by the decay function is .80 and the random number generated is .65, the component will be remembered. If the random number is .88, the component will not be remembered. The S101 routine is the basic retrieval process in the model. Each time it is executed a time increment is added to the clock. The value of this parameter in most runs has been 10 msec. The remembered set of components output by S101 is then provided as input to the net sorting routine, S4 in Fig. 9, which attempts to retrieve an item that is consistent with these components. As indicated in the third block in Fig. 12, a blank is a legitimate alternative in the rehearsal routine. In the example, the S101 routine would have retrieved some number of auditory components of the letter X from M1 (probably all of them since very little time has elapsed since the updating) and sorted them in the discrimination net. The next question is whether or not the output of the S4 routine is a blank.
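The retrieval decision made by S101 can be sketched in Python as follows. This is an illustration under stated assumptions, not the original code: the chapter says only that the decay function is exponential, so the form p = exp(-rate × t) and the rate values are assumptions, and the function names and the dictionary layout of the memory structure are hypothetical.

```python
import math
import random

S101_CHARGE_MS = 10  # retrieval time charge (value reported for most runs)


def retrieval_probability(elapsed_ms, rate_per_s):
    """Assumed exponential decay: p = exp(-rate * elapsed time in seconds)."""
    return math.exp(-rate_per_s * elapsed_ms / 1000.0)


def is_remembered(probability, draw):
    """A component is remembered when the random draw falls below p."""
    return draw < probability


def retrieve_components(structure, clock_ms, rng=random.random):
    """Sketch of S101.

    `structure` maps a component name to (time_tag_ms, rate_per_s).
    Returns the remembered component names and the advanced clock.
    """
    remembered = []
    for name, (time_tag_ms, rate_per_s) in structure.items():
        p = retrieval_probability(clock_ms - time_tag_ms, rate_per_s)
        if is_remembered(p, rng()):
            remembered.append(name)
    return remembered, clock_ms + S101_CHARGE_MS
```

With a retrieval probability of .80, `is_remembered(0.80, 0.65)` is true and `is_remembered(0.80, 0.88)` is false, which matches the worked example in the text.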
[Flow chart for Fig. 12; boxes recoverable from the original: START; pick up anchor point; retrieve components (S101); sort in net (S4), blank OK; update components in STM structure (S5); find location of next STM structure (S11).]
FIG. 12. Rehearsal routine, S100.
From Fig. 12, if the result is not a blank (i.e., an item is retrieved from the net), the auditory components that make up the retrieved item are added to or updated in the short-term memory structure by routine S5. If the S4 output is a blank, the S5 process is skipped. The next step in the rehearsal process is to execute the find location of next short-term memory structure routine S11. Having rehearsed the first item in the sequence, X, the model will now attempt to find the item that followed X in the sequence. Since in the example no item has yet occurred following X, a discussion of S11 will be delayed briefly. It is sufficient to note here that S11 is capable of sensing the fact that the last item rehearsed was the last item presented. At this point in the rehearsal routine, the simulated S has finished rehearsing a single item. Consistent with the rule cited earlier, the window is now checked to determine if a new item has occurred. The S3 routine
executes the experimenter subroutine which checks the window and makes any appropriate changes. If a change does occur, the S3 routine outputs a YES and the rehearsal routine is terminated. Control is then transferred back to the S10 routine, Fig. 11, which asks the question “Is activity continued?” The fact that a new item has appeared in the window results in a NO answer to the question, which in turn causes the interitem activity routine S10 to be exited. Control then reverts to the subject executive routine, Fig. 6, which executes the S2 routine taking in the new item. Back at the rehearsal routine, Fig. 12, if a new item has not appeared in the window the question “Is rehearsal continued?” is asked by routine S105. The answer to this question is determined in a straightforward fashion. If any of the items presented so far have not been rehearsed (a fact which S11 has determined), routine S105 outputs a YES. Control in the rehearsal routine then transfers back to the S101 routine which attempts to retrieve the components of the next item to be rehearsed. In this fashion, the rehearsal routine continues to rehearse items until one of two conditions occurs: a new item appears in the window, or all of the items presented so far have been rehearsed. If all items have been rehearsed, the rehearsal routine is exited. Control is transferred back to the interitem activity routine, Fig. 11, and the question “Is activity continued?” is asked. Since rehearsal is finished but no new item has occurred in the window, the answer to the question will be YES and the recoding routine S110 will be executed. In the example in which only the letter X has been presented, the rehearsal routine is exited immediately after rehearsing the X. If no new item has yet occurred in the window, the recoding routine is executed. The recoding routine, which will be described in detail later, notes that only one item has been presented and is exited quickly.
If a new item still has not appeared in the window, control is transferred back to the rehearsal routine which again rehearses the letter X. The result of this procedure is that the letter X continues to be rehearsed until a sufficient amount of time passes and a new item occurs in the window. When the new item appears, control is transferred back to the subject executive routine which, as described above, takes in the new item by executing routine S2. The S2 routine now creates a new memory structure the name of which is stored in M2. Since the presentation is auditory, component substructures are set up for each auditory component of the new item. Each of these substructures contains the name of a component, the clock value when that component was stored, and an associated decay function. At this point, the last characteristic of the STM structure will be described.
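The alternation between rehearsal and recoding during the interitem interval can be sketched in Python as a simple control loop. This is a deliberately reduced illustration: the per-item costs are placeholder values rather than parameters reported in the chapter, items are visited by position instead of through links, and the whole recoding attempt is treated as a single step.

```python
def interitem_activity(n_items, clock_ms, next_onset_ms,
                       rehearse_cost_ms=150, recode_cost_ms=50,
                       check_cost_ms=10):
    """Sketch of S10: rehearse all items (S100), try recoding (S110), repeat.

    The window is checked after every item handled. The loop exits at the
    first check made at or after the onset of the next item, so
    next_onset_ms must be finite. All cost values are placeholders.
    """
    log = []

    def new_item_seen(now_ms):
        # Sketch of S3: sense the window, then charge the check time.
        return now_ms >= next_onset_ms, now_ms + check_cost_ms

    while True:
        for position in range(1, n_items + 1):   # rehearsal starts at M1
            clock_ms += rehearse_cost_ms
            log.append(("rehearse", position))
            seen, clock_ms = new_item_seen(clock_ms)
            if seen:
                return log, clock_ms
        clock_ms += recode_cost_ms               # recoding attempt
        log.append(("recode", None))
        seen, clock_ms = new_item_seen(clock_ms)
        if seen:
            return log, clock_ms
```

For the example in which only X has been presented, calling `interitem_activity(1, 170, 500)` rehearses X, attempts recoding, and rehearses X again before the new item is noticed.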
6. Links and Order Information
By referring back to Fig. 5, it can be seen that in addition to the substructures for each of the auditory components making up the item, there is another substructure that is a part of the memory for the item. This substructure is referred to in the model as the link. The link substructure contains information about the order in which the items were presented. The manner in which it works is quite straightforward. Each time a new item occurs, the S2 routine, in addition to setting up the auditory component substructures for the new item, goes back to the structure for the previous item and adds to it information about where the next item is being stored. Specifically, the information added to the structure includes the location of the new structure (in this case M2), a time tag which is the current value of the clock, and a decay function which defines the rate at which the link will be lost from memory. This link information is added to the memory structure by a separate subroutine (S22) of S2 in much the same fashion that subroutine S20 (see Fig. 7) creates substructures for the component information. The S22 subroutine is considered a basic process in the model and has the same time parameter as the S20 subroutine. In other words, this single parameter defines the time charge for creating component substructures in memory (S20), creating link substructures in memory (S22), and updating these substructures (S52). From the above discussion, it can be seen that the model distinguishes item information from order information. This distinction obviously provides a mechanism that will allow the model to produce one of the most common findings of the memory-span procedure: the order error. An order error occurs when S recalls the correct items but in the wrong sequence. There are alternative mechanisms that might be considered to account for order errors. One possibility is simply to store an input time tag with the item information (Yntema & Trask, 1963).
This time tag would represent the time when the item was input and would not be changed by rehearsal. The recall procedure would involve scanning the cells and outputting the items on the basis of the input time tags. The availability of these tags might decay, thus producing the possibility of order errors. Conrad (1965) offers a second alternative which is simply to have the items stored in an ordered set of bins (memory cells). He argues that transposition errors can be accounted for on the basis of item errors substituting for each other. His argument, however, requires two additional assumptions: (1) S can scan the memory cells containing the items in order (which reflects the input order); and (2) when an item is not remembered, a substitute will be selected from a set of alternatives that is relatively small and consists primarily of recent items. (Incidentally, it will become clear later in the description of the S11 routine that
the fact that the items are stored in a sequence of cells called M1, M2, and so on has nothing to do with the order information. The cells could be filled randomly (M5, M2, M8, and so on), as long as the location of a new item is linked to the memory structure of the previous item.) The distinction between order and item information is not new; both Brown (1959) and Crossman (1960) made the distinction to account for the order error. The decision to introduce a decay process to account for the loss of order information is supported by results reported by Wickelgren (1967). To return to the subject executive routine, Fig. 6: After the new item is input, its components are sorted through the discrimination net by routine S4. The output of this routine is the name of the new item, and the question is asked “Is this item the respond signal?” If the answer is NO, the S5 routine updates all of the auditory components in the memory structure for the new item stored in M2. At this point the window is checked to see if another new item has occurred, and, unless the presentation rate is quite fast, the answer is NO. The interitem activity routine S10 is then executed again. S10 begins by executing the rehearsal routine S100 (see Fig. 12). Rehearsal starts with the structure in M1, which for this example consists of the components of the letter X. The S101 routine retrieves as many of these components as can be remembered and these in turn are sorted through the net by S4. If the X is remembered, the components in the memory structure are updated by S5. Control is then transferred to the find location of next STM structure routine S11, which can now be described in detail.
7. Using Links
The S11 routine is capable of doing three things. First, it looks at the memory structure for the item just rehearsed and attempts to find the link information. If there is no link structure associated with this item, the S11 routine outputs a signal indicating that the item just rehearsed is the last item taken in (the situation that was mentioned earlier). If a link structure does exist, S11 proceeds to its second step; namely, it attempts to retrieve the location where the memory structure for the next item is stored. It does this by going to the link structure and retrieving the time tag, subtracting this time tag from the present value of the clock to determine how long the link information has been in memory, using this time in the decay function to calculate a probability of retrieval, generating a random number, and deciding whether or not the next location (M2 in the example) is remembered. (Notice that the specific name of the next location, M2, has nothing to do with whether or not it is remembered.) If it is remembered, the location is output and the link structure is updated. The updating is much the same as updating
one of the component substructures. The time tag is reset to the current value of the clock, and a new decay function representing a slower decay rate is placed in the link structure. The third phase of the S11 routine occurs when a link structure exists but its name is not remembered. Under these circumstances, all the STM locations, M’s, containing memory structures are scanned, and those that have not yet been rehearsed during the current rehearsal process are retrieved. One of these locations is then selected on a random basis. In this phase of S11 no updating occurs; also, no new link is formed between the previous item and the randomly selected location. The last step in the S11 routine is a time charge. Actually, there are two separate time charges in this routine. Regardless of which of the three alternatives (no link, retrieve link, pick link randomly) dictates the results of S11, a basic time increment representing the retrieval process is added. The time parameter that defines the increment is the same as the one used by routine S101 (which has a value of 10 msec in runs to date). In other words, the basic retrieval process is viewed as requiring the same amount of time regardless of whether component or link information is being retrieved. If, however, the link exists and is retrieved (the second alternative), an additional time charge for updating the link is added. This time charge is the same as the charge for the update routine S52 (also 10 msec to date). Most of the basic processes in the model have now been described. As one might imagine, the model continues the cycle of taking in a new item (which includes setting up its component substructures and the link substructure on the previous item), sorting the new item through the discrimination net and updating it, and then rehearsing the items that have already occurred, always starting with the first item in the sequence (the item stored in M1).
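The three alternatives of the S11 routine, together with its two time charges, can be sketched in Python. The sketch rests on several assumptions that are not in the chapter: the exponential form of the decay function, the factor by which an updated link decays more slowly (`SLOWDOWN`), the dictionary layout of the link, and the behavior when no unrehearsed location remains.

```python
import math
import random

RETRIEVE_MS = 10   # basic retrieval charge (value reported for runs to date)
UPDATE_MS = 10     # link update charge (value reported for runs to date)
SLOWDOWN = 0.5     # assumed factor by which an updated link decays more slowly


def find_next_location(link, clock_ms, unrehearsed, rng=random):
    """Sketch of S11; returns (next_location_or_None, new_clock_ms).

    `link` is None or a dict with keys "location", "time_tag_ms", "rate".
    `unrehearsed` holds locations not yet rehearsed in the current pass.
    """
    elapsed_s = 0.0 if link is None else (clock_ms - link["time_tag_ms"]) / 1000.0
    clock_ms += RETRIEVE_MS                      # charged in all three cases
    if link is None:                             # case 1: no link, last item
        return None, clock_ms
    p = math.exp(-link["rate"] * elapsed_s)      # assumed exponential form
    if rng.random() < p:                         # case 2: link retrieved
        clock_ms += UPDATE_MS
        link["time_tag_ms"] = clock_ms           # reset time tag to the clock
        link["rate"] *= SLOWDOWN                 # slower decay after update
        return link["location"], clock_ms
    if not unrehearsed:                          # nothing left to guess from
        return None, clock_ms
    # case 3: link lost; guess among unrehearsed locations, no updating
    return rng.choice(sorted(unrehearsed)), clock_ms
```

In the third case the link is left untouched, so a wrong guess on one rehearsal pass does not by itself destroy the stored order information.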
After taking in each new item and after rehearsing each of the previous items, the window is checked. When another new item occurs, the procedure starts over. This sequence of events (except for attempts to recode which are yet to be discussed) is continued until the new item is the signal to respond. When the respond signal occurs, control is transferred to the respond routine S6.
8. Responding
The respond routine is essentially the same as the rehearsal routine except that an additional time charge is made after each item is retrieved from the net. The additional time charge represents the time it takes S actually to output the item (say it aloud or write it on paper). This respond time charge is another time parameter in the model, and in the runs to date a value of 500 msec has been used. There is an additional characteristic of the respond routine that might
be mentioned. The rehearsal routine allows for a blank item to be output by the net sorting routine S4. The respond routine contains an option for allowing or not allowing a blank. Computer runs in which blanks are not permitted represent an experimental situation in which S knows how many items were input and is required to output the same number (if he does not remember an item, he must guess). When blanks are allowed the model is presumably simulating a situation in which S can output as many items (more or less than were input) as he chooses. Since the respond routine is the last step in the subject executive routine, Fig. 6, the completion of S6 also causes the S1 routine to be exited. Control is then transferred back to the model executive routine E1, which may do several things. If all of the sequences have been simulated for this run, the program may simply be terminated. If there are additional sequences to be simulated, E1 resets the value of the clock to zero, erases all of the information from the memory cells rendering them empty, sets up the next sequence to be simulated, and again executes the subject executive routine.
9. Recoding
The one remaining part of the model that has not yet been discussed is the recoding routine S110. The concept of recoding or chunking was first introduced by Miller (1956). It refers to the process of combining several items into a “larger unit” which can then be processed as a single entity. Miller has shown that the memory span for items is greater when the items are chunked. At the current stage of the model’s development, routine S110 has not been debugged completely, and in fact has not been included in any of the simulation runs to be discussed in this chapter. Nevertheless, some of the basic ideas as to how the recoding process fits into the model have been developed. In considering the nature of the recoding process, it is immediately obvious that some rule must exist for defining the chunk. While Ss undoubtedly combine items into a chunk according to a great many rules, two have been considered in the context of the present model. The first rule is based upon meaningfulness. A meaningful chunk can be formed when a string of items can be combined into a unit with which S is already familiar. An obvious example is a string of letters that form a familiar word. Another example is the sequence of digits 1492 which has meaning to most Ss. A second rule for defining a chunk in the model is based upon pronounceability. A sequence of items can be combined into a single unit if the string has a pronounceable name. For example, the sequence of letters DUP can be recoded into a pronounceable chunk and processed
as a single unit. Note that words are consistent with both the pronounceability and meaningfulness rules for recoding. Not all meaningful combinations, however, are readily formed into pronounceable chunks. Sequences such as FBI and IBM illustrate this point. The first effort to develop a chunking process in the model focuses upon the pronounceability rule. That is, a set of procedures has been spelled out that permits the model to recode strings of letters or digits into a pronounceable chunk. This recoding routine S110 is executed by the interitem activity routine, Fig. 11, after all of the items presented thus far have been rehearsed. Two pronounceability rules are employed to define a recodable sequence of items: First, any set of three digits can be recoded into a pronounceable chunk; and second, any set of three letters in which the first and third are consonants and the second is a vowel (i.e., any CVC) can be recoded into a pronounceable chunk. When the recoding routine is executed, the first thing that happens is that the items in the STM are retrieved one at a time starting with the item in M1. The retrieval process here is the same as in the rehearsal or respond routines; namely, the S101 routine retrieves the remembered components, the S4 routine sorts the components through the net and outputs a vocabulary item, and the S5 routine updates the memory structure. When the items in M1, M2, and M3 have been retrieved, the recoding routine examines them to see if they satisfy either recoding rule (i.e., they are all digits or they form a CVC). Incidentally, after retrieving each of these items the window is checked to see if a new item has yet appeared. If a new item does appear, the recoding process is immediately interrupted, and control reverts back to the S2 routine which will then process the new item. If either the first or second rule is satisfied by the string of three items, the three items are recoded into a pronounceable unit.
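The two pronounceability rules can be written down compactly. The Python sketch below is illustrative only: the function names are hypothetical, the vowel set is assumed to be A, E, I, O, U (so Y is treated as a consonant), and the scanning helper simply slides the three-item examination forward one item at a time.

```python
VOWELS = set("AEIOU")  # assumed vowel set; Y is treated as a consonant


def item_type(item):
    """Classify a single vocabulary item as digit, vowel, or consonant."""
    if item.isdigit():
        return "digit"
    if item.upper() in VOWELS:
        return "vowel"
    if item.isalpha():
        return "consonant"
    raise ValueError(f"not a letter or digit: {item!r}")


def is_recodable(triple):
    """True if three items are all digits or form a consonant-vowel-consonant."""
    types = [item_type(item) for item in triple]
    return (types == ["digit", "digit", "digit"]
            or types == ["consonant", "vowel", "consonant"])


def first_recodable(items):
    """Index of the first recodable run of three items, or None if there is none."""
    for start in range(len(items) - 2):
        if is_recodable(items[start:start + 3]):
            return start
    return None
```

Under these rules DUP and any three digits qualify, while FBI and IBM do not, in line with the observation that not every meaningful sequence is pronounceable.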
If neither rule is satisfied (e.g., all three items are consonants) the recoding routine moves ahead and picks up the item in M4. Now the items contained in M2, M3, and M4 are examined to determine if they can be recoded. In this fashion, the recoding routine continues to consider all combinations of three items until it finds a recodable set (at which point it goes ahead and recodes the items into a chunk), or until it has considered all of the items presented thus far, or until a new item occurs in the window and interrupts the process. What does it mean in the model to recode a set of three items into a chunk? A number of things happen. The model first goes to LTM for each of the three items and retrieves a set of auditory components which represents the contribution the item makes to the name of the new chunk. The reader may return to Fig. 3 and recall the remarks describing the information in LTM. It was mentioned that a third sublist associated with each item in LTM contains information about recoding. Figure 13
presents an example of one of these sublists (for the letter A). Two types of information are contained in the recoding sublist. The first is a symbol that indicates that the item is a digit, vowel, or consonant; from this the recoding routine obtains its information to determine if a sequence of three items is recodable. The second type of information in the substructure is the auditory components that this item contributes to the chunk.
[Diagram for Fig. 13; content recoverable from the original: an LTM entry with sublists for auditory components (P39), visual components (V5, V7, V8), and recoding information (type of item: digit, vowel, or consonant; auditory components contributed to chunk: 2nd position, P25; etc.).]
FIG. 13. Recoding information in long-term memory.
To understand the value of this information it is necessary to observe what the model is attempting to do. The recoding routine is collecting a set of auditory components for each of the three items which it will then put together to form a set of auditory components that will describe the sound of the name of the new chunk. For example, if the sequence of items is DUP, the recoding routine goes to the recoding sublist for item D and retrieves those auditory components contributed by the letter D when it appears as the first letter in a chunk. Next, the recoding routine
goes to the recoding sublist for the letter U and retrieves the auditory components that the letter U contributes to the chunk when it is in the second position. Finally, it goes to the recoding sublist for P and retrieves the auditory components contributed by P when it is in the third position. Note that vowels only contribute to forming new chunks when they are in the second position (a CVC). Hence, the sublist for the letter A in Fig. 13 contains only information that concerns A’s contribution when it is in the second position. For consonants, this sublist is broken into two parts, one containing information for when the consonant is in the first position and another containing information for the third position.¹⁰ The sublists for digits, on the other hand, contain information regarding the auditory components contributed by the digit when it is in any one of the three positions. The specific set of auditory components that the recoding routine gathers up for DUP is Pa, P30, and P1. This set of auditory components defines the name of the new chunk. The question now is “What is done with the chunk?” The answer is that several things occur. First, a new structure in LTM is established that represents this chunk. If this is the first chunk formed from the sequence of items, it represents the 38th item in LTM. Unlike the other items in LTM which have three substructures (visual, auditory, and recoding information), the structure representing the chunk has only an auditory substructure. Naturally, this substructure contains the names of the chunk’s auditory components. A second set of events concerns changes in STM brought about by the recoding process. Without going into detail, a memory structure representing the chunk is added to the location containing the memory structure for the first item in the chunk.
For example, if DUP were the first three items in the sequence, a memory structure for the new chunk would be added to the structure stored in M1, which is where the information about D is located. The chunking structure does not replace the structure for the letter D, but instead becomes another part of the overall structure in M1. The details of how this structure is set up will not be presented here. It is sufficient to note that the memory structure for the chunk contains substructures for each of the auditory components which in turn contain the name of the component, a time tag as to when it was stored, and a decay function describing its rate of loss. In addition, this memory structure for the chunk contains link information which connects it to the location of the next item in the sequence following the last item of the chunk (in this example, M4).
¹⁰ The acoustic contribution to the chunk made by a consonant in the first or third position may be a function of the vowel that appears in the second position. However, for this initial formulation of the recoding process such effects are being ignored.
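The position-dependent assembly of a chunk's auditory components, and the STM structure built for the chunk, can be sketched in Python. The table layout, the function names, and the decay rate are hypothetical. The component codes for U, P, and A in the positions shown follow the text and Fig. 13; every other code, including the one for D in first position (`P_D1`), is a placeholder and not a value from the model.

```python
# Hypothetical recoding sublists: item -> {"type": ..., position: components}.
RECODING = {
    "D": {"type": "consonant", 1: ["P_D1"], 3: ["P_D3"]},  # placeholder codes
    "U": {"type": "vowel", 2: ["P30"]},
    "P": {"type": "consonant", 1: ["P_P1"], 3: ["P1"]},     # P_P1 is a placeholder
    "A": {"type": "vowel", 2: ["P25"]},
}


def chunk_components(triple, recoding=RECODING):
    """Collect each item's contribution for the position it occupies (1 to 3)."""
    parts = []
    for position, item in enumerate(triple, start=1):
        # A vowel has no entry for positions 1 or 3, so this raises KeyError.
        parts.extend(recoding[item][position])
    return parts


def make_chunk_structure(triple, next_location, clock_ms, rate=0.2):
    """Build the STM structure for a chunk: timed components plus a link.

    `rate` stands in for the decay function; its value is an assumption.
    """
    return {
        "components": {name: {"time_tag_ms": clock_ms, "rate": rate}
                       for name in chunk_components(triple)},
        "link": {"location": next_location,
                 "time_tag_ms": clock_ms, "rate": rate},
    }
```

For DUP stored in M1 through M3, `make_chunk_structure("DUP", "M4", clock_ms)` yields a structure whose link points past the chunk to M4, which is what allows later rehearsal to skip the individual items.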
It should now be possible to see how the recoding process can bring about an improvement in the overall performance on the task. When the model returns to the memory structure at a later time to rehearse or to retrieve the items for responding, it starts with M1 and attempts to retrieve the chunk. It does this by retrieving the auditory components of the chunk that are remembered, sorting them through the net and recognizing the chunk (which is now stored in the net structure representing LTM), updating the components of the chunk, and then going to the link structure to find the location of the next item. If successful, it finds M4 and proceeds in the usual fashion. Since the number of components that make up the chunk is less than the sum of the components that make up the individual items, and since in dealing with a chunk there is no search for memory locations as when the items are retrieved individually (routine S11), the time required to rehearse the chunk is considerably less than the time required to rehearse, on an individual basis, the items that make up the chunk. Thus, the existence of chunks adds considerably to the efficiency of the overall processing. Of course, during the respond routine it is necessary to decode the chunk into its three items. This procedure has not been worked out in detail in the present version of the model. It seems reasonable to propose a process in which the auditory components of this chunk are taken as input, the LTM scanned, and items retrieved that are consistent with these components. Such a procedure would predict certain types of errors. For example, if the sequence KAV had been chunked, during the response phase two sets of items might be retrieved that would satisfy the auditory components of this chunk, CAV and KAV. Consequently, one would expect the letter C to occur as an intrusion error for K under such conditions. A study by Pinkus and Laughery (1967) indicates that this is precisely what happens.
The above discussion of recoding has focused upon pronounceability as the rule for combining items. To date, relatively little attention has been given to recoding on the basis of meaningfulness. Frankly, it appears that such procedures will probably lead to a completely new dimension of complexity in the model. However, one or two rather obvious statements can be made. It seems reasonable that a meaningful chunk is one that already exists in the LTM structure. Hence, recoding on the basis of meaningfulness seems to involve recognizing an already existing chunk, in contrast to the pronounceability situation in which a new chunk must be formed. A second obvious point is that meaningfulness of a chunk surely implies that there are definitional dimensions associated with it other than the auditory dimension describing the way it sounds. For example, the chunk IBM must conjure up information such as “computers,” “a large corporation,” and so on.
E. TYPES OF ERRORS THAT THE MODEL CAN MAKE
It is worthwhile to note the kinds of mistakes the model is capable of making and what the outcomes of these mistakes may be. There are several types of mistakes or errors which, of course, can lead to differences in performance. A first type of error is one in which S simply misses the appearance of the item in the window. This can occur in a situation in which a new item comes into and is taken out of the window during an interval when the simulated S does not look at the window. This type of event can and has occurred in runs where fast rates of presentation were being simulated. When such an event occurs, the model simply continues its normal pattern of behavior as though nothing has happened in the window. The occurrence of this type of error has been reported in studies by Aaronson (1965) and Norman (1966). A second type of error results from the failure to retrieve components from STM. The previous discussion has described the manner in which these components are retrieved by routine S101 on the basis of the decay function and length of time in store. Indeed, this type of loss represents the basic mechanism for forgetting in the model. One frequent result of this failure to remember components is the retrieval of the wrong item from the discrimination net, and it occurs when the input components do not define a unique item. For example, if the components stored in the STM structure are those defining the letter F, and if the only characteristic retrieved from STM by S101 is P24, there are six items (among the letters) that are consistent with this input: F, L, M, N, S, and X. When such a situation occurs, the model selects from among the alternatives on a random basis. From this description it can be seen that the intrusion errors that the model makes tend to have auditory components in common with the correct item. This commonality of components has been demonstrated in several experiments (e.g., Conrad, 1964; Wickelgren, 1965a).
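The way an incomplete set of retrieved components leads to an acoustically similar intrusion can be sketched in Python. The vocabulary table is hypothetical: only the component sets for F and X and the fact that F, L, M, N, S, and X all share P24 follow the text, and the remaining codes are placeholders. Returning a blank (`None`) when nothing is retrieved or nothing is consistent is a simplifying assumption of this sketch.

```python
import random

# Hypothetical component sets. Only F = {P24, P7}, X = {P24, P5, P11}, and
# the sharing of P24 among F, L, M, N, S, X are taken from the text.
VOCAB = {
    "B": {"P3", "P20"},
    "F": {"P24", "P7"},
    "L": {"P24", "P12"},
    "M": {"P24", "P13"},
    "N": {"P24", "P14"},
    "S": {"P24", "P9"},
    "X": {"P24", "P5", "P11"},
}


def consistent_items(retrieved, vocab=VOCAB):
    """Items whose component sets contain every retrieved component."""
    wanted = set(retrieved)
    return sorted(item for item, comps in vocab.items() if wanted <= comps)


def sort_in_net(retrieved, vocab=VOCAB, rng=random):
    """Pick one consistent item at random; None stands for a blank."""
    if not retrieved:
        return None
    candidates = consistent_items(retrieved, vocab)
    return rng.choice(candidates) if candidates else None
```

With only P24 retrieved, `consistent_items(["P24"])` returns the six letters named in the text, so any of them may intrude for F; with both P24 and P7 retrieved, only F remains.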
Another interesting point concerns the outcome of an incorrect selection from the net. As was pointed out in the discussion of rehearsal, when an item is retrieved from the net the auditory components of that item are updated in the STM structure. When an incorrect item is selected, however, other things can happen. Suppose the letter F was originally stored and at the time for rehearsal the only component retrieved was P24. Suppose further that the item output from the discrimination net was the letter X. When the model updates the STM structure, two things will happen: the component that X and F have in common, P24, will be updated (the decay function changed and the storage time reset to the present value of the clock); and those components that X has, but F does not have, will be added to the structure. These components
are P5 and P11. The component of F that is not a part of X, P7, is not updated. In consequence, the letter X will probably be retrieved the next time this STM structure is examined. However, it is also possible that the letter F will be retrieved, since the P7 component is still in memory. Thus, a mistake made at one time does not necessarily mean that the correct item will not be retrieved at a later point in time. This example also makes it possible to discuss a point that was postponed in the description of the net sorting routine S4. Refer to the second box in Fig. 9 and the question “Is the list of consistent items empty?”; the point has been made that it is possible to have as input to S4 a set of components with which no item in the vocabulary is consistent. The above example, in which X was erroneously rehearsed for F, provides a situation in which such an event could occur on a later rehearsal or during responding. Specifically, if during a later cycle through the rehearsal process the components retrieved from this memory structure were P24, P7, and P11, no item in the vocabulary would satisfy the consistency rule. Consequently, the model would attempt to retrieve vocabulary items that have two of these components in common. The items satisfying this criterion would be F, S, and X, one of which would then be selected randomly.
A third type of mistake the model makes has to do with the ordering of items. Order information is built into the memory structure by attaching to the structure for a particular item linkage information regarding the location of the next item (see Fig. 5). If in the S11 routine the location of the next item is not retrieved, as frequently happens, the process selects randomly from among those locations containing a structure not yet dealt with in the current rehearsal or respond process.
Thus, the model is capable of retrieving the correct items but obtaining them in an incorrect order, a result that is clearly consistent with actual performance data. To summarize the types of errors that the model can make: it can miss an item if the input rate is very fast; it can retrieve an incorrect item from the discrimination net that will tend to have auditory components in common with the correct item; and it can make mistakes in the order of the items.
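The F and X example, together with the relaxed rule applied when no vocabulary item is fully consistent, can be sketched as follows. This is an illustration under stated assumptions rather than the model's code: the function names are hypothetical, time tags are simplified to a single clock value per update, decay functions are omitted, and "two components in common" is generalized as "all but one of the retrieved components". The letter S is left out because its components are not listed in the text, so the relaxed search returns F and X only.

```python
# Components stated in the text for three letters (S is omitted because
# its components are not given).
VOCAB = {
    "F": {"P24", "P7"},
    "N": {"P24", "P19"},
    "X": {"P24", "P5", "P11"},
}


def update_after_selection(structure, item, clock, vocab=VOCAB):
    """Refresh or add the components of the selected item.

    `structure` maps component -> time tag. Components of the stored
    item that the selected item lacks keep their old time tag.
    """
    updated = dict(structure)
    for component in vocab[item]:
        updated[component] = clock
    return updated


def relaxed_candidates(retrieved, vocab=VOCAB):
    """Consistent items, or items sharing all but one retrieved component."""
    full = sorted(i for i, c in vocab.items() if retrieved <= c)
    if full:
        return full
    needed = len(retrieved) - 1  # assumption: relax by one component
    return sorted(i for i, c in vocab.items() if len(retrieved & c) >= needed)


# F was stored; X is erroneously selected at clock value 500.
stm = update_after_selection({"P24": 0, "P7": 10}, "X", clock=500)
```

After the erroneous update, `stm` holds P24, P5, and P11 with the new time tag and P7 with its old one, which is why either X or F may be retrieved later.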
IV. A Sample Simulation
A detailed description of the model’s performance (a trace) on a single sequence of items will now be presented. The sequence of items was the letters DYENZTUG. The experimental conditions simulated were an auditory presentation and a rate of 1.0 sec/item. Also, the discrimination net contained only letters, which represents the situation in which S
knows in advance that all items are letters. The specific values of the time parameters for this run are shown in Table V. The values assigned to the time parameters for this sample simulation and the simulated studies described in the next section were estimated on the basis of various data. Gregg and Olshavsky (1966) and Landauer (1962) provide results indicating that Ss implicitly recite about four letters or digits per second; naturally there is some variance in the data, but the results of most Ss are between three and five items. Four time parameters in the model contribute to the total time to rehearse (implicitly recite) an item: basic store and update, basic retrieval, net sort-unique item, and check window. Since the data obviously do not provide the information needed to fit four free parameters, some “reasonable estimates” were made. Specifically, a value of 100 msec was assigned to the net sort parameter and a decision was made to equate the values of the other three parameters. Given the processes in the model, the value of the three parameters was calculated to be 10 msec.*

* The 10-msec value was calculated at a point in the model’s development when it was assumed that both visual and auditory components are stored in and retrieved from STM. In the current version of the model, in which only auditory components are stored, the 10-msec value is too low. That is, with fewer components the time required to store and retrieve an item is less, and too many items are rehearsed. A recent calculation (carried out after the computer runs were obtained) indicates that 25 msec is an appropriate value.

TABLE V
VALUES OF TIME PARAMETERS FOR SAMPLE SIMULATION AND SIMULATED STUDIES 1 AND 2

Parameter                    Value (msec)    Routines using parameter
Basic store and update       10              S20, S22, S52, S11
Basic retrieval              10              S101, S11
Net sort-unique item         100             S4
Net sort-decision process    300             S4
Respond (output item)        500             S6
Check window                 10              S3

The 500-msec value of the respond parameter was based upon unreported data from our own laboratory. The Ss were simply required to write digits and letters. Time measures of their performance indicated that 500 msec was an appropriate value. The value of the net sort-decision process parameter, 300 msec, is at this point only a “reasonable
estimate.” In fact, it may not be appropriate to view this parameter as a constant, but rather as a variable whose value is a function of other factors such as the number of alternatives. Some relevant events of the sample simulation are presented in Table VI. The subject executive routine S1 was entered when the clock value was zero and the window contained the auditory components, P4 and P21, of the first letter. The S2 routine took in the two components and set up an appropriate STM structure in location M1. Step 2 (cell 2 in Table VI) involved sorting the two auditory components through the discrimination net and recognizing the letter D. Following this, the two components were updated by the S5 routine. The window was then checked, and since no new items had appeared, control passed to the interitem activity routine S10. Although Fig. 11 shows the recoding process to be a part of the interitem activity routine, it has already been pointed out that this particular process has not been a part of the model in most of the runs to date. Consequently, the interitem activity routine consisted of recycling through the rehearsal routine until the appearance of a new item in the window caused it to be exited. As described in cell 5 of Table VI, the item located in M1, D, was then rehearsed six times with the window monitored after each rehearsal. After the last rehearsal of D, the letter Y appeared in the window (Table VI, cell 6) and control transferred back to the S2 routine, which took in the auditory components and set up an appropriate STM structure in location M2. In addition, the S2 routine set up a link substructure and attached it to the memory structure in M1. This link contained information regarding the location of the next memory structure, M2. The auditory components of Y were then sorted through the discrimination net by S4, the letter Y was output, and the auditory components in M2 were updated by S5.
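The clock values for the first letter can be retraced with the time charges of Table V. The sketch below is a hedged reconstruction, not the original program: it assumes that store, update, and retrieval are each charged once per component, that a rehearsal of a single item consists of retrieval, a unique-item net sort, an update, and a window check, and that decay functions can be ignored for timing purposes. The function and constant names are hypothetical.

```python
# Time charges from Table V (msec).
STORE_UPDATE = 10
RETRIEVE = 10
NET_SORT_UNIQUE = 100
CHECK_WINDOW = 10

COMPONENTS = ("P4", "P21")  # auditory components of the letter D


def trace_first_item(n_rehearsals=6):
    """Return (clock readings, final time tags) for the letter D."""
    clock = 0
    tags = {}
    readings = []

    def store_or_update():
        nonlocal clock
        for component in COMPONENTS:
            tags[component] = clock  # time tag is the clock at storage
            clock += STORE_UPDATE

    store_or_update()                # take in the components
    readings.append(clock)
    clock += NET_SORT_UNIQUE         # sort in net, output D
    readings.append(clock)
    store_or_update()                # update the components in M1
    readings.append(clock)
    clock += CHECK_WINDOW            # check window
    readings.append(clock)
    for _ in range(n_rehearsals):    # rehearse D, checking the window
        clock += RETRIEVE * len(COMPONENTS)
        clock += NET_SORT_UNIQUE
        store_or_update()
        clock += CHECK_WINDOW
    readings.append(clock)
    return readings, tags
```

Under these assumptions each rehearsal of a two-component item costs 150 msec, and the readings come out as 20, 120, 140, 150, and 1050 msec with final time tags of 1020 and 1030, in agreement with the values the table gives for the first five cells.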
Since no changes had occurred in the window, control again passed to the interitem activity routine S10. At this point (Table VI, cell 7) the letter D was rehearsed three times and Y was rehearsed twice in the order DYDYD. Both components of each item were retrieved for each rehearsal, so the input to the net sorting routine S4 defined a unique item on each occasion. The result was that the net sorting routine required less time and, thus, more rehearsal was possible. In other words, the more the model remembers about the individual items, the faster the rehearsal takes place. Of course, the window was checked after each item rehearsal. Next (Table VI, cell 8), the letter E appeared in the window and control was again transferred to S2. The STM structure was set up in M3 and the P21 component defining E was sorted through the net. Now, all of the letters containing P21 (BCDEGPTVZ) were consistent with this
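TABLE VI
SAMPLE SIMULATION OF LETTER SEQUENCE DYENZTUG

Each short-term memory entry is listed as component, time tag, decay function. Clock values are in msec.

Cell 1
  Window at beginning: P4, P21
  Operations: S2: Take in components of D
  Short-term memory structures at end: M1: P4 0 D25; P21 10 D22
  Clock at end: 20

Cell 2
  Window at beginning: P4, P21
  Operations: S4: Sort in net and output D
  Short-term memory structures at end: M1: P4 0 D25; P21 10 D22
  Clock at end: 120

Cell 3
  Window at beginning: P4, P21
  Operations: S5: Update components in M1
  Short-term memory structures at end: M1: P4 120 D24; P21 130 D21
  Clock at end: 140

Cell 4
  Window at beginning: P4, P21
  Operations: S3: Check window
  Short-term memory structures at end: M1: P4 120 D24; P21 130 D21
  Clock at end: 150

Cell 5
  Window at beginning: P4, P21
  Operations: S100 and S3: Rehearse D six times, checking window after each rehearsal
  Short-term memory structures at end: M1: P4 1020 D18; P21 1030 D15
  Clock at end: 1050

Cell 6
  Window at beginning: P35, P40
  Operations: S2: Take in components of Y. S4: Sort in net, output Y. S5: Update. S3: Check window
  Short-term memory structures at end:
    M1 (link M2 1080 D21): P4 1020 D18; P21 1030 D15
    M2: P35 1180 D25; P40 1190 D22
  Clock at end: 1210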
TABLE VI-continued

Cell 7
  Window at beginning: P35, P40
  Operations: S100 and S3: Rehearse D three times and Y twice in order DYDYD. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 1880 D19): P4 2010 D15; P21 2020 D12
    M2: P35 1840 D23; P40 1850 D20
  Clock at end: 2040

Cell 8
  Window at beginning: P21
  Operations: Take in, recognize, and update E. Rehearse DYEDY. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 3010 D17): P4 2970 D13; P21 2980 D10
    M2 (link M3 2690 D20): P35 3140 D21; P40 3150 D18
    M3: P21 2810 D20
  Clock at end: 3170

Cell 9
  Window at beginning: P24, P19
  Operations: Take in, recognize, and update N. Rehearse DYEND. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 3480 D16): P4 4100 D11; P21 4110 D8
    M2 (link M3 3650 D19): P35 3610 D20; P40 3620 D17
    M3 (link M4 3800 D20): P21 3770 D19
    M4: P24 3930 D22; P19 3940 D21
  Clock at end: 4130

Cell 10
  Window at beginning: P12, P21
  Operations: Take in, recognize, and update Z. Rehearse DYENZ. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 4440 D15): P4 4400 D10; P21 4410 D7
    M2 (link M3 4610 D18): P35 4570 D20; P40 4580 D17
    M3 (link M4 4760 D19): P21 4730 D18
    M4 (link M5 4930 D20): P24 4890 D22; P19 4900 D21
    M5: P12 5050 D22; P21 5060 D20
  Clock at end: 5080
TABLE VI-continued

Cell 11
  Window at beginning: P3, P21
  Operations: Take in, recognize, and update T. Rehearse DYENZ. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 5380 D14): P4 5340 D9; P21 5350 D6
    M2 (link M3 5550 D17): P35 5510 D19; P40 5520 D16
    M3 (link M4 5700 D18): P21 5670 D17
    M4 (link M5 5870 D19): P24 5830 D21; P19 5840 D20
    M5 (link M6 5120 D21): P12 6000 D21; P21 6010 D19
    M6: P3 5100 D25; P21 5110 D21
  Clock at end: 6030

Cell 12
  Window at beginning: P36, P30
  Operations: Take in, recognize, and update U. Rehearse DYENZT. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 6340 D13): P4 6300 D8; P21 6310 D5
    M2 (link M3 6510 D16): P35 6470 D18; P40 6480 D15
    M3 (link M4 6660 D17): P21 6630 D16
    M4 (link M5 6830 D18): P24 6790 D20; P19 6800 D19
    M5 (link M6 7000 D20): P12 6960 D20; P21 6970 D18
    M6 (link M7 6350 D21): P3 7130 D24; P21 7140 D21
    M7: P36 6160 D24; P30 6170 D21
  Clock at end: 7160
TABLE VI-continued

Cell 13
  Window at beginning: P16, P21
  Operations: Take in, recognize, and update G. Rehearse DYENZ. Check window after each rehearsal.
  Short-term memory structures at end:
    M1 (link M2 7470 D12): P4 7430 D7; P21 7440 D4
    M2 (link M3 7640 D15): P35 7600 D17; P40 7610 D14
    M3 (link M4 7790 D16): P21 7760 D15
    M4 (link M5 7960 D17): P24 7920 D19; P19 7930 D18
    M5 (link M6 7000 D20): P12 8090 D19; P21 8100 D17
    M6 (link M7 6350 D21): P3 7130 D24; P21 7140 D21
    M7 (link M8 7180 D21): P36 6160 D24; P30 6170 D21
    M8: P16 7290 D23; P21 7300 D21

Cell 14
  Window at beginning: P44
  Operations: Take in and recognize respond signal. Branch to respond routine (S6).
  Short-term memory structures at end: No changes from above

Cell 15
  Operations: Respond with:
    P4, P21: D
    P35, P40: Y
    P21: E
    P24, P19: N
    P12: Z
    P21: E
    P36, P30: U
    None (guessed): Q
  Short-term memory structures at end: No changes from above
input. However, E was defined uniquely because of the exact match with its auditory dimension. Hence, E was output by S4, and the M3 memory structure was updated by S5. In the interitem activity routine, rehearsal began with the D. This time, only the P4 component was retrieved by S101. Since this input is consistent with both D and W, a choice between these two letters was made on a random basis in the S4 routine. In this particular instance the correct letter, D, was chosen and both the P4 and P21 components in M1 were updated. Next, the Y and E in M2 and M3, respectively, were retrieved and rehearsed, followed by another rehearsal of D and Y. The letter N then occurred (Table VI, cell 9) in the window and was taken in, sorted through the net, and updated. Then the letters DYEND were rehearsed, with all components being retrieved for each item. The letter Z appeared in the window (Table VI, cell 10) and was taken in, sorted through the net, and updated. The rehearsal routine then rehearsed DYENZ. All components for the first four letters were retrieved. In attempting to rehearse the Z, only the P12 component was retrieved from M5. Since Z is the only item consistent with P12, however, the Z was output by S4 and both components in M5 were updated. The letter T then occurred (Table VI, cell 11), was taken in, recognized, and updated. The letters DYENZ were rehearsed with all components for each letter retrieved. There was not sufficient time between items for the T to be rehearsed. The next item was the letter U (Table VI, cell 12), which was taken in, recognized, and updated. Actually, the P36 and P30 components of U are also consistent with Q and W. However, the exact-match rule defined U uniquely. The first six items, DYENZT, were then rehearsed with all components for each item being retrieved. The last item, G, then occurred in the window (Table VI, cell 13) and was taken in, sorted, and updated.
The letters DYENZ were rehearsed, after which the respond signal appeared in the window (cell 14). The output of the net sorting routine was then recognized as the signal to respond and control was transferred to routine S6. The respond routine started at M1 and, in much the same fashion as the rehearsal routine, attempted to retrieve each of the items in the sequence. The one additional process in the respond routine was the act of outputting the item. This was represented by simply adding an increment of time to the clock, which simulates S writing down the item. This respond time increment was 500 msec in the present run (Table V). The sequence of events during the response period is indicated in the last cell of Table VI. It can be seen that on the first four items the performance was perfect: all components were retrieved and the correct items output. In M5, only the P12 component was retrieved, but since
Z is the only letter consistent with this component the correct item was output. In M6, the only component retrieved was P21. Although there are nine items in the vocabulary (BCDEGPTVZ) consistent with this component, the exact-match rule caused the letter E to be selected. Both the P36 and P30 components were retrieved from M7, and a correct response, U, was made. The S101 routine was then completely unsuccessful in attempting to retrieve components from the structure in M8. Neither the P16 nor the P21 component of the letter G was retrieved. In this particular simulation run, the respond routine was operating in a mode in which the simulated S had to respond with as many items as were input. Hence, a blank output from the net sorting routine was not a legitimate alternative. The result was that one of the 26 letters, Q, was selected on a random basis. If the simulation were to continue beyond this point, the subject executive routine would recognize that it had finished with the current sequence and control would be transferred back to the model executive routine. This routine would reset the clock to zero, remove all of the information from the memory cells M1 through M9 (the output signal was located in M9), check the LTM to determine if any chunks had been added and, if so, erase them, set up the next sequence to be simulated, and then transfer control back to the subject executive routine S1. Note that in the sample simulation no order errors were made. That is, the S11 routine that is responsible for finding the location of the next item in the sequence was always able to do so. Order errors, however, frequently do occur. For example, in another sequence in which the input was XRAQUVEP, the final output was XRAQVEUP.
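The way order errors arise from lost link information can be sketched as follows. This is only an illustration with hypothetical names: `link_retrieved[i]` states whether the link from position i to position i + 1 is retrieved, and the sketch assumes that a random choice among structures not yet dealt with is also made when the current structure has no usable link (for example, after the last input position). The chapter does not specify that second case.

```python
import random


def recall_order(items, link_retrieved, rng=random):
    """Output items by following links, choosing randomly when a link is lost.

    items: the sequence in input order.
    link_retrieved: one boolean per link (len(items) - 1 entries).
    """
    n = len(items)
    remaining = set(range(n))
    position = 0
    output = []
    while True:
        output.append(items[position])
        remaining.discard(position)
        if not remaining:
            break
        nxt = position + 1
        if nxt < n and link_retrieved[position] and nxt in remaining:
            position = nxt  # link found: go to the next structure
        else:
            # Link lost (or unusable): pick any structure not yet dealt with.
            position = rng.choice(sorted(remaining))
    return "".join(output)
```

With every link retrieved the input order is reproduced. Losing the link after the fourth item of XRAQUVEP leaves the first four outputs intact and makes a transposition such as the reported XRAQVEUP possible, although which permutation results depends on the random choices.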
V. Some Simulation Results
A number of simulation runs have been carried out on the computer to explore the model’s performance. In these runs, several task variables were manipulated in the memory-span situation to generate a variety of conditions that represent standard procedures in STM research. The values assigned to the time-charge parameters in these runs were the same as those presented in Table V.
A. SIMULATED STUDY NUMBER 1: A GENERAL EXPERIMENT
The first experiment simulated with the model included four variables: symbol set (letters and digits randomized without replacement); presentation rate (0.3, 0.5, and 1.0 sec/item); presentation mode (visual and auditory); and length of sequence (6, 8, and 10 items). In this study, the discrimination net contained only those items that were used in making up the sequence. In other words, when the symbol set was letters,
only A-Z was in the net (along with the respond signal, of course), and when the sequence was made up of digits, only 0-9 was in the net. The 36 experimental conditions were simulated with approximately 20 sequences included for each. The output of the model was analyzed in several ways. One criterion employed was the percent of correct items. Note that this type of analysis can be done more precisely on the output of the model than on the results of an experiment using humans. In the model, it is possible to separate completely the order errors from the item errors simply by having the model output the appropriate information (it is possible to determine from which memory cell a particular item was retrieved). In other words, even though the model may mix up the order in which it looks for items in the different memory cells (it loses some of the link information), one can determine exactly whether the item stored in a particular memory location was output correctly or incorrectly. The analysis presented here took advantage of this information. The percent of correct items across all sequences in each condition is presented in Table VII.

TABLE VII
PERCENT CORRECT FOR ALL 36 CONDITIONS IN SIMULATED STUDY NUMBER 1

Digits
                   Visual (sec/item)      Auditory (sec/item)
Sequence length    0.3    0.5    1.0      0.3    0.5    1.0
6                  87     91     88       95     90     98
8                  83     78     88       84     80     84
10                 73     75     69       80     80     70

Letters
                   Visual (sec/item)      Auditory (sec/item)
Sequence length    0.3    0.5    1.0      0.3    0.5    1.0
6                  75     73     73       80     65     77
8                  64     69     65       73     68     74
10                 60     44     51       67     59     63

In general, the effects of the variables are quite similar to results reported in the literature. (Several good reviews of relevant STM literature are available: Keppel, 1965; Melton, 1963; Peterson, 1963; Posner, 1963; Postman, 1964.) The focus of this part of the discussion of the model’s performance is on the qualitative effects of the variables manipulated, rather than quantitative comparisons between the number of
errors made by the model and similar measures obtained in experiments. The reason for focusing upon the qualitative aspects of the results is that although a sufficient body of consistent evidence regarding the relative effects of the particular variables exists, the results have not been quantitatively consistent. In fact, there are sufficient data on memory-span performance, and sufficient variability in the data, that for a “reasonable” model one can almost certainly find data that will provide a good match. For example, a memory-span study (Laughery, 1963) carried out using army recruits as Ss was later done with college students, resulting in an across-the-board higher level of performance. It is impossible to know at this stage which Ss are being simulated. One result obtained from the simulated data presented in Table VII is the effect of the length-of-sequence variable. If the results are collapsed across the other three variables, the percent correct is 84, 77, and 67 for 6, 8, and 10 items, respectively. This inverse relationship between the number of items in the sequence and the percent correct is consistent with a variety of studies reported in the literature (e.g., Laughery & Pinkus, 1966; Mackworth, 1962, 1963). There is also evidence that performance is better with digits than with letters (e.g., Laughery, unpublished data; Mackworth, 1963). The simulation results show that 80% of the digits and 68% of the letters were recalled correctly. In the model, this difference between digits and letters can be accounted for by the fact that there are more auditory components in the names of the digits than in the names of the letters. This larger number makes it likely that more components are retrieved from STM and available as input to the net sorting routine S4, which in turn increases the probability that the correct item is retrieved from LTM. In other words, the digits are better discriminated than the letters.
The effect of presentation mode indicates that performance is only slightly different for the auditory and visual conditions, 76 and 74%, respectively. Laughery and Pinkus (1966) have reported a small but significant difference in the memory span attributable to presentation mode, with performance better for auditory. Murdock (1966), using a paired-associate procedure, reported no main effect attributable to presentation mode, as did Laughery et al. (1967), using a memory-span recognition task. Actually, very little difference attributable to mode is expected from the model. The only factor leading to a difference is that in the auditory presentation the components are stored in STM by the input stimulus routine S2 and then, following the net sorting routine S4, the components are updated. The effect of this procedure is the same as giving the item an additional rehearsal when the input mode is auditory. This analysis of the model leads to the prediction of an interaction
between presentation mode and presentation rate in the model’s output. More specifically, the auditory mode should lead to better performance at fast presentation rates, when an additional rehearsal may be crucial, but not at slow rates, when many rehearsals are being carried out. The results of the model show that indeed the difference attributable to mode is greatest at the fastest rate (0.3 second). Furthermore, this interaction is consistent with the results of two separate studies (Laughery et al., 1967; Laughery & Pinkus, 1966). The fourth variable, presentation rate, has little effect on the performance of the model. The results indicate percentages of 77, 73, and 74 for the 0.3-, 0.5-, and 1.0-second rates, respectively. There are data reported in the literature (Laughery et al., 1967; Pollack, 1952) indicating that performance improves as the presentation rate becomes faster than 0.5 second. Similarly, data have been reported (e.g., Bergstrom, 1907; Laughery et al., 1967) that show performance improving as the rate becomes slower than 0.5 second. However, another study using a paced recall (Conrad & Hille, 1958) has shown that performance decreased as the rate slowed from 0.67 to 2.0 seconds. This result has been confirmed by Conrad (1958) and Fraser (1958). The decay theory explanation of the results of the various studies is based upon time in store and rehearsal. Presumably, at rates fast enough to preclude rehearsal, performance improves as the rate becomes faster. The data seem to indicate that at rates faster than 0.5 second rehearsal does not occur. At slower rates, when rehearsal and other processes (such as recoding) are possible, the slower the better. The explanation for the results of the paced-recall studies mentioned above is that the strict control over S’s time limited his ability to rehearse and recode.
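The mode-by-rate interaction can be checked against the cell percentages of Table VII with a short calculation. The sketch assumes that each printed triple of values is one rate column read down the 6-, 8-, and 10-item rows, and it takes unweighted means of the cell percentages, so its marginal values may differ by a point or two from the pooled percentages quoted in the text. The names used are hypothetical.

```python
# Table VII cell percentages, keyed by (symbol set, mode) and then rate;
# each tuple holds the values for 6-, 8-, and 10-item sequences.
TABLE_VII = {
    ("digits", "visual"): {0.3: (87, 83, 73), 0.5: (91, 78, 75), 1.0: (88, 88, 69)},
    ("digits", "auditory"): {0.3: (95, 84, 80), 0.5: (90, 80, 80), 1.0: (98, 84, 70)},
    ("letters", "visual"): {0.3: (75, 64, 60), 0.5: (73, 69, 44), 1.0: (73, 65, 51)},
    ("letters", "auditory"): {0.3: (80, 73, 67), 0.5: (65, 68, 59), 1.0: (77, 74, 63)},
}


def mean_percent(mode, rate):
    """Unweighted mean percent correct for one mode at one rate."""
    cells = [
        value
        for (_, cell_mode), by_rate in TABLE_VII.items()
        if cell_mode == mode
        for value in by_rate[rate]
    ]
    return sum(cells) / len(cells)


def auditory_advantage(rate):
    """Auditory minus visual mean percent correct at the given rate."""
    return mean_percent("auditory", rate) - mean_percent("visual", rate)


advantages = {rate: auditory_advantage(rate) for rate in (0.3, 0.5, 1.0)}
```

On this reading the auditory advantage is about 6.2 points at 0.3 second, 2.0 points at 0.5 second, and 5.3 points at 1.0 second, so it is largest at the fastest rate, although the advantage at the slowest rate is not much smaller.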
There is a tendency (although it is certainly nothing more than a tendency) for the simulated results to be consistent with the data so far as differences between the 0.3- and 0.5-second rates are concerned (77 and 73%). A possible explanation as to why the model’s output in this simulated study is not better at 1.0 second than at 0.5 second concerns the nature of the interitem activity in the model. In all the simulated runs reported here, the interitem interval was devoted exclusively to rehearsal. There is evidence (Laughery & Pinkus, 1968; Pinkus & Laughery, 1967) that Ss are able to recode at a presentation rate of 1.0 second. Thus, if the recoding routine were added to the model and the time parameters set so that recoding can occur at this rate, the model should reflect an improvement in performance between 0.5 and 1.0 second. In addition to analyzing the simulated results with respect to the effects of the four manipulated variables, the output of the simulation was also examined in terms of intrusion errors. This analysis involved
noting which items were substituted for other items when errors occurred. Of the 40 most frequent substitutions, 33 were pairs that had a phoneme in common. By way of comparison, the matrix of listening errors published by Conrad (1964) indicated that 29 of the 40 most frequent substitutions had a common auditory component. An analysis of the percent of total errors that were auditory intrusions showed 61% for the model and 49% for Conrad’s data. Hence, it appears that the model makes slightly more errors that have an auditory component in common with the correct item than did Conrad’s Ss. There was one rather startling inconsistency in the letter intrusions; namely, the frequent substitution of the letter E for B, C, D, G, P, T, V, and Z. The data in the literature indicate that the intrusions between pairs of items in this set are much more evenly distributed. Because of the exact-match rule for defining a unique item in the net sort, however, the only intrusion that can occur between items in this set in the model is the substitution of the letter E for the others (unless, of course, all components are forgotten and one of the others occurs by chance). The same problem exists with respect to the letters U (in the set UQW) and I (in the set IY). The intrusion errors between digits were also analyzed. For the model’s output, 67% of total errors were auditory intrusions. The same calculation for a standard memory-span experiment (Laughery, unpublished data) showed 25% auditory intrusions. Here, the model appears to be less consistent with data than was true for the letters. A rank order correlation (Spearman) between the intrusion matrices of the model and the experiment produced a rho value of +.197 (t = 1.79) that was modestly significant (p < .05). A probable explanation for this less-than-encouraging similarity between the model and data with respect to digit intrusion errors has to do with the lack of a recoding process in the current version of the model.
If asked, most Ss indicate that they recode digits into larger numbers (chunks). The names of these larger numbers are long and contain many phonemes. As a result, most of the chunks have auditory components in common, and the distinction between items on this basis begins to break down. Incidentally, there is evidence of little or no systematic confusion between digits (Conrad, 1959; Wickelgren, 1965a). A final analysis performed on the output of these simulation runs involved looking at serial position errors. The curves for the 8- and 10-item sequences are shown in Figs. 14a and 14b. In Fig. 14a, the model’s output is compared to results reported by Jahnke (1963) for 8-item sequences. As in the model, Jahnke required his Ss to output the items in the same order as they were input. Except for a slight difference at positions 7 and 8, the curves overlap very nicely. In Fig. 14b, the model’s performance on the 10-item sequences is compared to results reported by Murdock (1968). Murdock’s Ss were also required to output in the same order as the items were input. Again, the curves compare quite favorably except for the last two positions, 9 and 10.
B. SIMULATED STUDY NUMBER 2: HIGH VERSUS LOW AUDITORY SIMILARITY
The second experiment simulated with the model focused upon the effects of acoustic similarity between items. The sequences were constructed by drawing items randomly from an acoustically similar set, BCDEGPTVZ, and an acoustically distinct set, BHKOQRWXY. During the simulation runs for each of these conditions, the discrimination net contained only the items from that particular set. In effect,
[Figure 14. Panel (a): MODEL and JAHNKE (1963) curves, serial positions 1 to 8. Panel (b): serial positions 1 to 10. Horizontal axis: SERIAL POSITION.]
FIG. 14. Serial position curves for 8 and 10 item sequences.
this was a simulation of an experiment in which S knew or had in front of him the complete set of letters from which the items were drawn. In other words, guesses would be restricted to items in this set. Three other
variables were also manipulated in this simulated study: presentation rate again included one item every 0.3, 0.5, or 1.0 second; presentation mode was either visual or auditory; and length of sequence was six or eight items. The results of these runs showed effects similar to simulated study number 1 for presentation mode (78% correct for both auditory and visual) and length of sequence (81% for six items and 75% for eight items). For presentation rate, the collapsed results indicated percentages correct of 76, 75, and 83 for the 0.3, 0.5, and 1.0 rates, respectively. There was essentially no difference between the 0.3 and 0.5 rates. However, in this study performance was better at the slower rate, 1.0 second. For the symbol-set variable, performance in the acoustically distinct condition was superior to the acoustically similar condition (86 and 70%, respectively). This difference resulting from auditory similarity held under all combinations of the other three variables; that is, there was no apparent interaction between auditory similarity and any of the other variables. These results are consistent with several published studies (e.g., Laughery, 1963; Wickelgren, 1965b). In the model, this effect can be attributed to the greater number of distinctive cues available for the acoustically distinct set as compared to the similar set.
VI. Discussion and Conclusions
The computer simulation approach to modeling provides certain advantages when contrasted with “verbal” and with “mathematical” theories of human learning and memory. The predominant verbal theories, while providing a rich source of explanatory concepts, have generally lacked precision. To quote Hintzman (1968, p. 156), “Such ‘verbal’ theories have been criticized on several grounds: (a) lack of criteria for deciding which of several explanatory concepts are relevant to a particular result, (b) a consequent lack of parsimony, and (c) lack of demonstration that the explanations are internally consistent, complete or adequate.” A computer simulation model, however, requires that the assumptions and concepts be sufficiently complete and precise for programming. Furthermore, the model’s implications can readily be evaluated in detail by executing it on a computer. A simulation model may be inadequate in other respects, but it is not vague. Mathematical models, while providing an appropriate level of precision, have lacked generality and scope. Typically, mathematical models have a relatively restricted range of application in the sense that they are unable to deal with the level of complexity possible in a computer simulation. It is recognized, of course, that the simulation effort presented in this paper has been applied only to the memory-span task. In
Computer Simulation of Short-Term Memory
the following discussion, however, an attempt will be made to show how the model could be modified or extended to simulate behavior in a variety of tasks that have been employed to study human memory. The point was made earlier that the model contains relatively few basic memory structures and processes. It is here argued that these few processes with the addition of appropriate task-dependent processes provide a model with considerable generality.
A. SOME POSSIBLE EXTENSIONS
The value of the present model as a first approximation to a theory of human memory will to a large extent be determined by its validity when applied to a variety of other tasks. It is one thing to develop a model that represents behavior in a memory-span task; it is quite another matter to demonstrate that the basic ingredients of that model apply to other tasks. This section of the chapter will present some current thoughts concerning how the model might be applied to a number of other experimental procedures that have been used to study memory.
1. Interitem Time Distributions during Input
In the simulated studies carried out to date, the interitem times within a given sequence have been constant; that is, while the presentation rate has been varied from sequence to sequence, it has been constant within a sequence. However, the model can deal with variable interitem times within a sequence. Corballis (1966a,b) has reported two studies that varied interitem times within sequences. In the first study (Corballis, 1966a), three patterns of irregular intervals were used. When compared to performance on sequences using a constant interitem interval, but with the same total time for the sequence, performance on the irregular interval sequences was superior. The second study (Corballis, 1966b) compared performance on two time-distribution conditions: Condition I, in which interdigit intervals were short at first but were gradually increased, and Condition D, in which intervals were long at first but gradually decreased. The results indicated that performance on the last three items of a nine-digit list was better in Condition I, but there were no apparent differences between the two conditions on the first six items of the sequence. The model should predict results consistent with the findings of the first experiment.
When the total time is held constant, some interitem intervals in the irregular series are longer than any of the intervals in the constant-interval series. These longer intervals provide an opportunity for recoding that may not be possible in the constant-interval condition. This assumption seems particularly valid considering the presentation rates used by Corballis. The slowest presentation time in
Kenneth R. Laughery
his study was 1.5 seconds per digit (this number, of course, represents the mean time in the irregular-interval condition). It seems reasonable to expect no difference between the regular and irregular conditions in which the presentation rate (or mean rate) is slower, since at slow rates recoding could also be carried out in the constant interitem interval condition. In the second experiment, the model should perform better in Condition I than in Condition D because there is more time available between items in the latter part of the sequence, which is where the time can do the most good. More specifically, since most of the items have been presented, the longer intervals near the end in Condition I can be used for appropriate rehearsal and recoding. It is not clear, however, that in the model’s output the difference between Conditions I and D would be restricted to the last three items in the sequence. However, since performance on the first few items of the sequences in both conditions of the Corballis experiment was near the maximum, the only question regarding the model’s output concerns the relative performance on the middle items in the two conditions. The model would probably do better on items in the fourth, fifth, and sixth positions in Condition I than in Condition D.
2. Split Span Studies
Several memory-span studies have been reported which required or encouraged Ss to deal with the sequence of items as a number of subsequences. For example, Anderson (1960) used sequences of 12 numbers read in groups of 4. Following the sequence, a light was used to cue Ss as to which groups of digits were to be recalled and in what order. The results indicated that postsequence cueing for recall of only part of the sequence resulted in higher performance (on that part of the sequence) than when recalling all of the message. These data indicate that the act of outputting the first digits of the sequence interferes with recall of the remaining digits.
In a somewhat similar experiment by Posner (1964), Ss were presented with sequences of eight digits. Two presentation rates were used: 96 and 30 digits/minute. When Ss were required to report the digits in their presented order, the fast presentation rate was superior to the slow rate.[12] When the last four digits were reported before the first four, performance improved at the slow rate but not at the fast rate.
[12] Note that this result is consistent with the studies cited in Section V,A, showing that performance improves as presentation rates become faster than 0.5 second.
The model should have little trouble simulating the results of these two studies provided one task-dependent mechanism is introduced:
multiple anchor points in the sequence. In the present version of the model only one anchor point (the memory cell at which the simulated S enters the STM) is specified: M1. The ability to establish multiple anchor points would allow the simulated S to enter STM at more than one memory location. In representing the procedures used in the Anderson study, these anchor points would be M1, M5, and M9, the location of the first item in each group. For the Posner study, the anchor points would be M1 and M5. The postsequence-cueing technique is viewed as simply identifying the anchor point at which S enters the memory system to begin retrieval. Since the length of time that an item is in memory determines the probability of retrieval, it is easy to see why the model would match Anderson’s data. The interaction found in Posner’s results should also be predicted by the model. Because at the slower presentation rate (30 items/minute) the simulated S will have an opportunity to rehearse the first few items in the sequence, and because this rehearsal causes these items to decay more slowly, recalling the last four items before the first four should improve performance. In other words, the last four items are being retrieved before they have an opportunity to decay beyond recall, while the first four items, because of rehearsal, can afford to wait. At the fast presentation rate (96 items/minute), however, there is not sufficient time during the interitem interval for rehearsal to occur. Hence, there will be no differential resistance to forgetting built up by some of the items, and the order of recall will be relatively unimportant.
3. Relative Recency Judgments
Morton (1968) has recently reported a study in which Ss were required to report which of two items in the sequence occurred more recently (later). It is easy to see how the model might be programmed to simulate such an experiment.
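A minimal sketch of one way such a recency judgment might be programmed, assuming the respond strategy scans the stored cells forward from the first cell and answers with whichever critical item it has not yet met. The function name `judge_more_recent` and the representation of memory cells as a Python list of letters are illustrative assumptions, not part of the IPL-5 program.

```python
def judge_more_recent(cells, first_critical, second_critical):
    """Scan cells in stored order; answer with the critical item not met first."""
    for item in cells:
        if item == first_critical:
            return second_critical
        if item == second_critical:
            return first_critical
    return None  # neither critical item could be retrieved


# Hypothetical six-item sequence held in cells M1..M6, in presentation order.
cells = ["K", "A", "R", "B", "Q", "L"]
print(judge_more_recent(cells, "A", "B"))

# If order information is lost and B is reached before A, the same rule errs.
scrambled = ["K", "B", "R", "A", "Q", "L"]
print(judge_more_recent(scrambled, "A", "B"))
```

In this sketch a wrong answer can arise only when the scan order no longer matches the presentation order, which is the kind of error a loss of order information would produce.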
The respond routine S6 would use a strategy of retrieving the items starting in M1 until one of the critical items is found and then responding with the alternative critical item. For example, if the two critical items were A and B, and A preceded B in the sequence, the model would retrieve items until it found A and then respond with B. It is also easy to see how errors might be made. If the link information defining the order of items in the sequence were lost, the model might retrieve B before A and inappropriately respond with A. Morton’s study, however, had an added twist. In one condition of the experiment, the first critical item occurred twice in the sequence before the occurrence of the second critical item. Morton’s results indicated that in this condition the first item (which had occurred twice) was more likely to be declared the more recent item (an error) than in the condition in which the first item had occurred only once. The model would almost
certainly predict the opposite results, since the double occurrence of the first item makes it more likely that this item will be retrieved before the second item. An alternative strategy in the respond routine that might allow the model to match Morton’s results would involve retrieving items in the reverse order and reporting the first critical item encountered as the more recent. The reverse-order retrieval could be accomplished by modifying the model to construct backward as well as forward links. If a backward link is not remembered the model would choose an item randomly, and in the condition in which the first item occurred twice it is more likely to be retrieved (an error) than when it occurred only once. Furthermore, it is more likely to be retrieved than the more recent critical item. Thus, Morton’s results would be simulated.
4. Free Recall
In the model, order information is represented by the link structures. An interesting issue is whether or not the links as conceived here represent a basic characteristic of the memory system. In most memory span experiments, Ss are expected to maintain the same relative position of the items during output as was defined by the input sequence. Even if the experimenter does not specify this requirement during the instruction, a quick perusal of almost any set of data will indicate that most Ss make this assumption for themselves. As a result, the concept of a link might be considered task-dependent: the Ss store order information because it is part of the task. How much better would the model perform if it were allowed to free-recall items? This question refers to a hypothetical experiment in which S is urged to disregard the order and concentrate on the individual items. The experiment could be simulated by suppressing the formation and use of links in the model. Item information would simply be stored in a number of memory cells which would be randomly addressed during rehearsal and responding. The model should predict that performance in this free-recall situation is superior to performance in the task in which order information, links, is being stored (naturally, a free-recall criterion would be employed in evaluating performance in the latter situation). The reason is that the model in the free-recall task does not require time to set up the links during input (no S22 routine) nor to update the links during retrieval (in the S11 routine). This additional time would thus be available for rehearsal and recoding. It should be noted, however, that the time required to establish and update links in STM is relatively small in comparison to the total amount of processing time. Hence, the difference between the free- and ordered-recall tasks would probably not be large.
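The time-budget argument above can be illustrated with a small calculation. This is only a sketch under stated assumptions: the function `rehearsal_time`, the per-item encoding charge, and the link charge are hypothetical placeholders and are not the time charges of Table V.

```python
def rehearsal_time(interitem_interval, n_items, encode_charge, link_charge, use_links):
    """Time left over for rehearsal and recoding during input, summed over a sequence."""
    per_item = encode_charge + (link_charge if use_links else 0.0)
    spare = max(0.0, interitem_interval - per_item)
    return spare * n_items


# Hypothetical values: 0.5 s per item, eight items, a small charge for each link.
ordered = rehearsal_time(0.5, 8, encode_charge=0.30, link_charge=0.02, use_links=True)
free = rehearsal_time(0.5, 8, encode_charge=0.30, link_charge=0.02, use_links=False)

print(round(ordered, 2), round(free, 2), round(free - ordered, 2))
```

With a link charge that is small relative to the encoding charge, the extra rehearsal time gained by dropping links is correspondingly small, which is the reason given for expecting only a modest free-recall advantage.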
5. Delayed Recall
A large number of studies have been reported using the now classical Peterson and Peterson (1959) task. The procedure involves presenting a short sequence of items (e.g., three consonants) and then having S recall the items after a delay of several seconds. In the original Peterson and Peterson (1959) study this recall interval was varied from 3 to 18 seconds. To prevent rehearsal during the delay interval, S is required to perform some information-processing task that is presumably unrelated to the original items. A frequently used task is counting backward by threes from some random number. The results of these studies generally show an exponential relationship between the probability of recalling the items and the length of the delay interval. The exponential decay process postulated in this model was greatly influenced by the existence of these data, and the model, with the appropriate value of the B parameter in the decay function, would have little difficulty in simulating these studies. The modeling procedure would involve nothing more than adding an increment of time to the clock at the end of the sequence which would represent the delay interval. In another study using the Peterson and Peterson technique, Hellyer (1962) varied the number of times a trigram was presented (1, 2, 4, and 8 repetitions) before beginning the retention interval. Hellyer’s results indicate that the asymptote of the forgetting curve increases directly with the number of repetitions. It would be a relatively simple matter for the model to represent Hellyer’s results by making the C parameter (asymptote) in the exponential equation a function of the number of times the item has been presented or rehearsed. At the beginning of each new repetition of the sequence, the model would simply change the C parameter in the decay functions for the components.
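The roles of the B and C parameters can be made concrete with a short numerical sketch. The decay equation itself is not reproduced in this section, so the form p(t) = C + (1 - C)·exp(-B·t) used below is an assumption, as are the parameter values and the names `retention` and `asymptote_for_repetitions`; only the interpretation of B as a rate and C as an asymptote comes from the text.

```python
import math


def retention(t, B, C=0.0):
    """Assumed decay form: starts at 1.0 and falls toward the asymptote C."""
    return C + (1.0 - C) * math.exp(-B * t)


def asymptote_for_repetitions(n_reps, step=0.05):
    """Hypothetical rule: the asymptote rises with repetitions, capped below 1."""
    return min(0.95, step * n_reps)


B = 0.15                        # illustrative rate parameter, not a fitted value
delays = [3, 6, 9, 12, 15, 18]  # delay intervals in seconds, spanning 3 to 18

# Delayed recall: adding the delay to the clock amounts to evaluating at a later t.
single = [retention(t, B) for t in delays]

# Repetitions: same rate B, asymptote C raised as a function of repetition count.
by_reps = {
    n: [retention(t, B, asymptote_for_repetitions(n)) for t in delays]
    for n in (1, 2, 4, 8)
}

for n, curve in by_reps.items():
    print(n, [round(p, 2) for p in curve])
```

Under these assumptions the curves for 1, 2, 4, and 8 repetitions share a decay rate and differ only in the level they approach at long delays.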
Such a procedure, however, seems inappropriate since it essentially finesses a basic issue with which an extended version of the model must deal, namely, the representation of multiple-trial learning. Clearly, in a model proposing a memory system consisting of an LTM and STM, learning should be viewed as the entry of information into the LTM. During the initial stages of formulation of the model, the plan was to design a memory system that would allow for two basic types of learning. The first type is essentially temporary learning that takes place in the STM. This type has already been described, and is represented by changing the decay rate during the updating process. Such learning is clearly temporary, for although the information is more resistant to decay over the short term, the asymptote is still zero. The second type of learning is more permanent and is the result of transferring information from the STM into the LTM. It was anticipated that this could take
place by growth of the discrimination net in a fashion similar to that proposed in the EPAM model (Feigenbaum, 1963) and, more recently, the SAL model (Hintzman, 1968). The problem currently being explored deals with what is stored in the discrimination net and how. Obviously, the solution is not simply a matter of building test nodes based upon the presence or absence of auditory components which lead to a terminal cell containing the item, because the LTM is already a discrimination net of this form. One idea is that some sort of time tag or temporal information is being added to the net, additions that mark certain components or items as being relevant to or a part of the current sequence. These time tags would stay in LTM from one sequence to the next, and might (depending upon how well they are discriminated) lead the simulated S to retrieve information that was part of a previous sequence. There is a distinct advantage to representing the more permanent type of learning by some kind of discrimination net process. Such a procedure would provide direct contact between the present model and the extremely powerful EPAM and SAL models. For example, in the memory-span task some rather well-established techniques would be available for representing interference effects between sequences.
B. SOME POSSIBLE REVISIONS
Several characteristics of the current version of the model are being considered for revision. A first characteristic concerns the nature of the basic units of information that are stored in STM. In the present version these basic units are the auditory components defined in Table I. Is the auditory dimension appropriate for defining the basic units of information? Hintzman (1965, 1967) has published analyses of intrusion errors which indicate that another dimension, articulatory components, may also be important. The possibility of using articulatory components in STM as opposed to auditory components is particularly interesting since, as Hintzman points out, this set of components can predict all of the intrusion errors that the auditory components predict plus some errors that are not predicted by the auditory dimension. However, an alternative dimension of items, distinctive features (Jacobson, Fant, & Halle, 1952; Miller & Nicely, 1955) appears even more attractive. These features represent a finer level of analysis than the current auditory-phonemic components. In addition to predicting results that would be consistent with the available auditory and articulatory intrusion data, the distinctive features are attractive for a second reason. In the model, the initial decay rate assigned to each of the auditory components in STM is a function of the pronunciation time of the component. As already explained, this procedure was adopted to account for data reported by Wickelgren (1965a). The use of distinctive features
as the basic information units would allow the model to account for Wickelgren’s data without the assumption that decay rates are a function of pronunciation time, thus resulting in a more parsimonious model. Still another advantage of using distinctive features concerns the excessive (as compared to actual data) substitution of the letter E for BCDGPTVZ. This overly frequent intrusion is the result of the exact-match rule in the net sorting routine. With distinctive features as the basic units of information, this problem would be solved.
A second characteristic of the model being reconsidered concerns the updating procedures. Resetting the time tag to the current value of the clock appears to be a reasonable and, in fact, a necessary process for a decay model. Changing the decay process in such a way that the information will be lost at a slower rate may not, however, be consistent with the data. In an analysis of the results of a number of studies (Hellyer, 1962; Melton, 1963; Murdock, 1961) using a Peterson and Peterson (1959) technique, Fiero and Laughery (1967) found that multiple presentation or rehearsals of the input items change the asymptote of the exponential function but have a relatively small effect on the rate parameter. The present feeling is that it would not be appropriate to attempt to simulate these results by having the update routine modify the asymptote of the decay function. An alternative procedure, which was discussed in the last section, would be to have information about the item transferred to a more permanent storage or LTM. Another issue related to the decay process is the independent loss of components. Posner and Konick (1966) and Wickelgren (1966) have published results that may make the independent loss assumption untenable.
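The independence assumption at issue can be stated concretely. In the sketch below each component survives to a given age with probability exp(-rate × age), regardless of what else is in store, so the chance that every component of an item survives is simply a product. The exponential form, the rate values, the all-components criterion, and the function names are assumptions made for illustration; the section does not specify them.

```python
import math
import random


def component_survives(age, rate, rng):
    """One component is still present with probability exp(-rate * age)."""
    return rng.random() < math.exp(-rate * age)


def item_intact_probability(age, rates):
    """Under independence, the chance that all components survive is a product."""
    p = 1.0
    for rate in rates:
        p *= math.exp(-rate * age)
    return p


def simulate_intact(age, rates, trials, seed=0):
    """Monte Carlo estimate of the same quantity, drawing each component separately."""
    rng = random.Random(seed)
    hits = sum(
        all(component_survives(age, r, rng) for r in rates) for _ in range(trials)
    )
    return hits / trials


rates = [0.10, 0.15, 0.20]  # hypothetical per-component decay rates
age = 4.0                   # seconds since the component's time tag was last reset

analytic = item_intact_probability(age, rates)
simulated = simulate_intact(age, rates, trials=20000)
print(round(analytic, 3), round(simulated, 3))
```

Data showing that loss depends on the other items in store would mean that no fixed set of per-component rates could reproduce the observed joint survival, which is what would make the product rule untenable.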
In the Posner and Konick study using a memory-span procedure, the similarity between items presented on a given trial and on successive trials was systematically manipulated in conjunction with the difficulty of the information processing interpolated between presentation and recall. The results indicated that under conditions in which forgetting proceeds independently of the effects of interpolated task similarity, it depended upon similarity among the stored items, and upon the difficulty of the interpolated processing. However, the effectiveness of interference was more closely related to the time the items were in store as opposed to the difficulty of the interpolated processing. The Wickelgren study used sequences of letters in which an item occurred twice. It was found that the two items following the repeated item were transposed more often than chance. The results of both studies indicate that the loss of item information from STM is a function of the other items in STM. Posner (1967) has proposed the concept of the “acid bath” to account for these data. The idea is that items or components decay, but that their rate of decay is a function of the other items or components
presented. Frankly, it is not clear at this point how such a concept could be incorporated into this model. It appears that some sort of mechanism would have to be developed for defining how one item’s loss is influenced by the presence of another item.
A third characteristic that obviously needs further work is the recoding process. Clearly, if the model is to be an adequate representation of human performance in a STM task, it must be able to represent the kinds of recoding procedures that Ss obviously use. Two types of recoding presently being developed are based upon pronounceability and meaningfulness rules. It is difficult at this point to assess whether or not these recoding processes as presently conceived adequately represent what Ss do. The plan is to develop the model in this direction and find out.
A fourth characteristic of the model being considered must be introduced rather than revised. Although a sensory storage or very-short-term memory is envisioned as a part of the memory system, the model does not now contain such a structure. Current thoughts are that very-short-term memory is a buffer storage which is filled when a new item appears in the window and remains filled as long as the item is in the window. The information stored in this structure would consist of the components in the window, visual or auditory. These components would decay over a relatively short interval (on the order of 1 or 2 seconds). The S monitors this very-short-term memory, and when something new appears the input stimulus routine S2 transfers the information to the STM. Thus, the S2 routine represents the attention mechanism referred to in the general model in Fig. 1.
Finally, one or two comments should be made regarding the modeling activities that have been and are currently in progress. The original version of the model was programmed in IPL-5 (Newell, Tonge, Feigenbaum, Green, & Mealy, 1961), a programming language designed specifically for this type of simulation work.
For a variety of reasons, the model is currently being reprogrammed in SLIP (Weizenbaum, 1963), a list-processing language based on FORTRAN. One reason for reprogramming is the large amount of computer time required to run the IPL-5 version. A good rule of thumb for estimating run time for this version is that it would be roughly equivalent to real time (e.g., to simulate a sequence requiring 8 seconds to present and 15 seconds to respond would take about 23 seconds of computer time). This time requirement is to a considerable extent the result of the large amount of arithmetic required in the model (computing exponential functions), which IPL-5 does not handle efficiently. The FORTRAN-based SLIP language should do well with the arithmetic. A second reason for reprogramming is the changeover from an IBM 7044 computer to a CDC 6400 at the State University of New York at Buffalo. The IPL-5 language is not available for the CDC
6400 while the SLIP language is. As soon as the reprogramming is completed, a number of sensitivity studies will be carried out. The purpose of these studies is to explore the effects of the various parameters on the overall performance of the model. The specific parameters to be varied are the time charges (see Table V) and the B parameter in the decay function. These simulated results should provide valuable feedback as to which parts of the model represent the most crucial assumptions about the memory system.
REFERENCES
Aaronson, D. Perception and immediate recall of auditory sequences. Paper presented at the meeting of the Eastern Psychological Association, Atlantic City, April, 1965.
Anderson, N. S. Poststimulus cuing in immediate memory. Journal of Experimental Psychology, 1960, 60, 216-221.
Atkinson, R. C., & Shiffrin, R. M. Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory. Vol. 2. New York: Academic Press, 1968.
Averbach, E., & Coriell, A. S. Short-term memory in vision. Bell System Technical Journal, 1961, 40, 309-328.
Baddeley, A. D. Short-term memory for word sequences as a function of acoustic, semantic and formal similarity. Quarterly Journal of Experimental Psychology, 1966, 18, 362-365.
Bergstrom, J. A. Effects of changes in time variables in memorizing, together with some discussion of the techniques of memory experimentation. American Journal of Psychology, 1907, 18, 206-238.
Bernbach, H. A. Decision processes in memory. Psychological Review, 1967, 74, 462-480.
Bower, G. H. A multicomponent theory of the memory trace. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory. Vol. 1. New York: Academic Press, 1967.
Brown, J. Information, redundancy and decay of the memory trace. In “The Mechanization of Thought Processes.” National Physical Laboratory, Proceedings of Symposia. No. 10, pp. 729-746. London: H.M. Stationery Office, 1959.
Cimbalo, R. S., & Laughery, K. R. Short-term memory: Effects of auditory and visual similarity. Psychonomic Science, 1967, 8, 57-58.
Conrad, R. Accuracy of recall using keyset and telephone dial and the effect of a prefix digit. Journal of Applied Psychology, 1958, 42, 285-288.
Conrad, R. Errors of immediate memory. British Journal of Psychology, 1959, 50, 349-358.
Conrad, R. Acoustic confusion in immediate memory. British Journal of Psychology, 1964, 55, 75-84.
Conrad, R. Order error in immediate recall of sequences. Journal of Verbal Learning and Verbal Behavior, 1965, 4, 161-169.
Conrad, R., & Hille, B. A. The decay theory of immediate memory and paced recall. Canadian Journal of Psychology, 1958, 12, 1-6.
Corballis, M. C. Memory span as a function of variable presentation speeds and stimulus durations. Journal of Experimental Psychology, 1966, 71, 461-465. (a)
Corballis, M. C. Rehearsal and decay in immediate recall of visually and aurally presented items. Canadian Journal of Psychology, 1966, 20, 43-51. (b)
Crossman, E. R. F. W. Information and serial order in human immediate memory. In C. Cherry (Ed.), Information theory. London & Washington, D.C.: Butterworth, 1960.
Feigenbaum, E. A. The simulation of verbal learning behavior. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought. New York: McGraw-Hill, 1963.
Feigenbaum, E. A., & Simon, H. A. A theory of the serial position effect. British Journal of Psychology, 1962, 53, 307-320.
Feldman, J. Simulation of behavior in the binary choice experiment. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought. New York: McGraw-Hill, 1963.
Fiero, P., & Laughery, K. R. Fixation in short-term memory as a function of overt rehearsal. Paper presented at the meeting of the Eastern Psychological Association, Boston, April, 1967.
Fletcher, H. Speech and hearing in communication. Princeton, N.J.: Van Nostrand, 1953.
Fraser, D. C. Decay of immediate memory with age. Nature, 1958, 182, 1163.
Glucksberg, S. Decay and interference in short-term memory. Paper presented at the meeting of the Psychonomic Society, Chicago, October, 1965.
Gregg, L. W., & Olshavsky, R. W. Time measures of implicit information processing as a function of task complexity. Complex Information Processing Working Paper No. 89, Psychology Department, Carnegie-Mellon University, 1966.
Hellyer, S. Supplementary report: Frequency of stimulus presentation and short-term decrement in recall. Journal of Experimental Psychology, 1962, 64, 650.
Hintzman, D. L. Classification and aural coding in short-term memory. Psychonomic Science, 1965, 3, 161-162.
Hintzman, D. L. Articulatory coding in short-term memory. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 312-316.
Hintzman, D. L. Explorations with a discrimination net model for paired-associate learning. Journal of Mathematical Psychology, 1968, 5, 123-162.
Jacobson, R., Fant, C. G. M., & Halle, M. Preliminaries to speech analysis. Cambridge, Mass.: M.I.T. Press, 1952.
Jahnke, J. C. Serial position effects in immediate serial recall. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 284-287.
Keppel, G. Problems of method in the study of short-term memory. Psychological Bulletin, 1965, 68, 1-13.
Landauer, T. K. Rate of implicit speech. Perceptual and Motor Skills, 1962, 15, 646-647.
Laughery, K. R. Effects of symbol set on immediate memory. American Psychologist, 1963, 18, 415. (Abstract)
Laughery, K. R., & Gregg, L. W. The simulation of human problem-solving behavior. Psychometrika, 1962, 27, 265-282.
Laughery, K. R., Harris, G. J., & Ulbricht, C. Visual similarity, presentation mode and presentation rate in a short-term memory recognition task. Paper presented at the meeting of the Psychonomic Society, Chicago, October, 1967.
Laughery, K. R., & Pinkus, A. L. Short-term memory: Effects of acoustic similarity, presentation rate and presentation mode. Psychonomic Science, 1966, 6, 285-286.
Laughery, K. R., & Pinkus, A. L. Recoding and presentation rate in short-term memory. Journal of Experimental Psychology, 1968, 76, 636-641.
Mackworth, J. Presentation rate and immediate memory. Canadian Journal of Psychology, 1962, 16, 42-47.
Mackworth, J. The relation between visual image and post-perceptual immediate memory. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 75-85.
Melton, A. W. Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 1-21.
Miller, G. A. The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 1956, 63, 81-97.
Miller, G. A., & Nicely, P. E. An analysis of perceptual confusions among some English consonants. Journal of the Acoustical Society of America, 1955, 27, 338-352.
Morton, J. Repeated items and decay in memory. Psychonomic Science, 1968, 10, 219-220.
Murdock, B. B. The retention of individual items. Journal of Experimental Psychology, 1961, 62, 618-625.
Murdock, B. B. Visual and auditory store in short-term memory. Quarterly Journal of Experimental Psychology, 1966, 18, 206-211.
Murdock, B. B. Serial order effects in short-term memory. Journal of Experimental Psychology, 1968, 76(4), Part 2 (Monogr. Suppl.).
Newell, A., Tonge, F. M., Feigenbaum, E. A., Green, B. F., & Mealy, G. H. Information processing language V manual. Englewood Cliffs, N.J.: Prentice-Hall, 1961.
Norman, D. A. Acquisition and retention in short-term memory. Journal of Experimental Psychology, 1966, 72, 369-381.
Peterson, L. R. Immediate memory: Data and theory. In C. N. Cofer & B. S. Musgrave (Eds.), Verbal behavior and learning. New York: McGraw-Hill, 1963.
Peterson, L. R., & Peterson, M. J. Short-term retention of individual verbal items. Journal of Experimental Psychology, 1959, 58, 193-198.
Pinkus, A. L., & Laughery, K. R. Short-term memory: Effects of pronounceability and phonemic uniqueness of chunks. American Psychological Association Conference proceedings. Washington, D.C.: APA, pp. 65-66, 1967.
Pollack, I. The assimilation of sequentially encoded information: II. Effect of rate of information presentation. USAF ARDC Human Resources Laboratory Memorandum Report, 1952, No. 25.
Posner, M. I. Immediate memory in sequential tasks. Psychological Bulletin, 1963, 60, 333-349.
Posner, M. I. Rate of presentation and order of recall in immediate memory. British Journal of Psychology, 1964, 55, 303-306.
Posner, M. I. Short-term memory systems in human information processing. Acta Psychologica, 1967, 27, 267-284.
Posner, M. I., & Konick, A. F. On the role of interference in short-term retention. Journal of Experimental Psychology, 1966, 72, 221-231.
Postman, L. Short-term memory and incidental learning. In A. W. Melton (Ed.), Categories of human learning. New York: Academic Press, 1964.
Sperling, G. The information available in brief visual presentations. Psychological Monographs, 1960, 74 (11, Whole No. 498).
Sperling, G. Successive approximations to a model for short-term memory. Acta Psychologica, 1967, 27, 285-292.
Waugh, N. C., & Norman, D. A. Primary memory. Psychological Review, 1965, 72, 89-104.
Weizenbaum, J. Symmetric list processor. Communications of the Association for Computing Machinery, 1963, 6, 524-536.
Wickelgren, W. A. Acoustic similarity and intrusion errors in short-term memory. Journal of Experimental Psychology, 1965, 70, 102-108. (a)
Wickelgren, W. A. Short-term memory for phonemically similar lists. American Journal of Psychology, 1965, 78, 567-574. (b)
Wickelgren, W. A. Associative intrusions in short-term recall. Journal of Experimental Psychology, 1966, 72, 853-858.
Wickelgren, W. A. Exponential decay and independence from irrelevant associations in short-term memory for serial order. Journal of Experimental Psychology, 1967, 73, 165-171.
Wickelgren, W. A., & Norman, D. A. Strength models and serial position in short-term recognition memory. Journal of Mathematical Psychology, 1966, 3, 316-347.
Yntema, D. B., & Trask, F. P. Recall as a search process. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 65-74.
REPLICATION PROCESSES IN HUMAN MEMORY AND LEARNING
Harley A. Bernbach
CORNELL UNIVERSITY
ITHACA, NEW YORK
I. Introduction
II. Basic Properties of the Theory
    A. Storage
    B. Retrieval
    C. Forgetting
III. Serial-Position Effects in Short-Term Memory
    A. The Experimental Paradigm
    B. The Mathematical Model
    C. Fit of the Model to the Data
    D. Explanation of Serial Position Phenomena
    E. Assumptions about Interfering Events
IV. Some Other Short-Term Memory Tasks
    A. Single-Item Recall
    B. Memory for Individual Paired Associates
    C. Continuous Paired-Associate Learning
    D. A Continuous Memory Task
V. Repeated Presentations and Learning
    A. Repeated Presentations in Single-Item Recall
    B. Paired-Associate Learning
VI. Some Evidence for Rehearsal Processes
    A. Short-Term Memory with Children
    B. Payoffs and Serial Position Effects
    C. Incentives in Single-Item Recall
    D. Instructed “Release” from PI
VII. Concluding Remarks
    A. Confidence Ratings and Recognition Memory
    B. Multiprocess Memory Models
References
I. Introduction
This chapter proposes a fairly general theory of simple verbal memory, describing its basic features and giving some of its implications. The theory has been developed primarily to give an account of the processes that generate many of the phenomena observed in experiments in short-term verbal memory, paired-associate learning, and the like. In many respects, the theory is very explicit and detailed; for some simple situations it has been possible to construct mathematical models to fit the data. Other features of the theory are less explicit (or are incomplete)
and their discussion is admittedly abbreviated. They are mentioned to suggest the potential generality of the conceptual system.
To ease the task of describing in detail the basic features of the theory, the sort of data to be discussed will be limited. The experiments considered will involve the storing and remembering of unrelated “items.” An item will be defined simply as enough information about something the experimenter has presented, such that the experimenter will be satisfied that the subject is “correct” if he retains that information until the time of test. Thus, an item may be a nonsense syllable in a short-term memory task, the associative link and response for one item of a paired-associate list, and so on.
By restricting consideration to “unrelated” items, I mean not only to rule out the effects of structure (e.g., memory for sentences), but also to make a rather strong assumption. It will be assumed that the learning and remembering of each item in a list may be considered as an independent experiment. Thus, the theory will make probability statements (about items) which are independent of the momentary states of other items in memory. This assumed independence of performance may be faulty for items that are not in memory at the time of test. For example, if all items in a list except one are remembered, the response ensemble for the last remaining item will be severely limited, with some corresponding effect on guessing. As a first approximation, however, guessing behavior for nonremembered items will be assumed to be random at a chance level.
Within these boundary conditions, the basic features of the theory and its implications in a variety of memory tasks may now be discussed.
II. Basic Properties of the Theory
First, we shall consider the basic assumptions made by the theory about the storage of items in memory, retrieval, and the nature of forgetting.
A. STORAGE
Whenever an item is presented for a long enough period of time to be processed through the perceptual system, an internal representation, to be called a replica, is stored in memory. Following this initial storage, additional replicas of the same item are stored by further presentations or rehearsal of the item. The rehearsal may be overt, covert, or below the level of awareness, although it should be affected by factors relating to the relative importance of rehearsing the item being considered or other material to be remembered. The question of how much any particular item is rehearsed will be discussed later in considering serial position phenomena.
The precise nature of the stored replica can be quite complex. The information stored about an item may well be more than the minimum amount necessary to satisfy the criterion for an item being in memory, i.e., that the subject can make a “correct” response. The initial replica may contain incidental information such as the color of the background or other cues present when an item is presented. Indeed, something of this sort must be true in order that incidental learning occur. Furthermore, even the rehearsed replicas may not be identical. The simplest version of the theory will be presented in this chapter, however; thus, replicas will be treated as simply as items have, that is, as single units.
The result of storage and rehearsal of an item is a stack of replicas of this item. For reasons that are discussed below, this stack is assumed to be a push-down stack rather than simply a pile. That is, the subject can be aware at any time only of whether or not he has at least one replica, but not of how many replicas he has. This is analogous to the stack of plates behind a luncheonette counter, which pushes down as plates are added so that its visible level does not depend on the number of plates in the stack, but only on whether or not the stack is empty.
One important implication of the push-down stack assumption is that rehearsal of an item is independent of its number of replicas. Thus, an item will be rehearsed according to factors related to the entire set of items to be remembered, but will be neither favored nor discriminated against for having a particularly large or small number of replicas. The one exception to this rule, of course, is that a forgotten item (i.e., one for which there are no replicas) cannot be rehearsed.
B. RETRIEVAL
The primary reason for the push-down stack assumption is that it conveniently leads to the further assumption that the probability of a correct response is independent of the number of replicas. That is, it will be assumed that the response probability is one as long as there is at least one replica present at the time of test. If there are no replicas in memory, we will say that the item is forgotten and that the probability of a correct response to it will be at the chance level.
These assumptions imply that the replica theory is an all-or-none theory in one important sense. At the time of test, an item either has at least one replica (is in memory) or has no replicas (is not in memory). When considering retrieval in short-term memory experiments, then, we can effectively consider the item to be in one of only two states, remembered or forgotten. The implication of this assumption for confidence ratings and recognition memory will be discussed later, following the
arguments of Bernbach (1967a). Basically, however, the push-down stack assumption means that the subject can make no response dependent on his knowing how many replicas he has of a remembered item. This all-or-none assumption can be correct only as a first approximation, unless the item actually contains only one bit of information. For example, subjects can often report the first letter of a nonsense syllable correctly, although they are scored incorrect on the entire item. The problem stems from the earlier assumption that the item could be treated as a unit. Recognizing this as a first approximation, however, we shall pursue this discussion on that basis until it leads us into serious trouble.
C. FORGETTING
Although the number of replicas is assumed to have no effect on performance, it will affect memory. Thus, the number of replicas of an item affects its likelihood of being forgotten. The assumption is that forgetting consists of the loss of replicas because of storage interference when other material is stored in memory. It follows that forgetting (loss of all the replicas of an item) is less likely to occur in a given period if the number of replicas of that item in store at the beginning of the period is high. Unlike a strength theory, however, the effect is assumed to be on the probability of forgetting only, rather than on both retention and recall probability. (It is more accurate to assume that storage interference is an increasing function of the similarity of the item presented and the item it interferes with. Such an assumption is required to handle data on the effect of similarity of interfering material upon forgetting. However, this topic will not be reviewed in this chapter, and the assumption has been stated for completeness.)
There remains the question of the precise effect of storage interference on the stack of replicas. If the question of just what conditions lead to storage interference is sidestepped, there are two extreme points on a continuum of assumptions about the effect of storage interference. First, it could be assumed that each replica is lost with equal probability when an interfering event occurs. That is, the loss probability for each replica in a stack is independent of whether other replicas in the stack are lost or not. At the other extreme, one might assume that this loss probability is totally dependent on the loss of other replicas such that an interfering event can cause the loss of at most one replica in the stack.
The independent replica loss assumption implies a particular form for a forgetting curve.
If there are a series of interfering events, each of which causes the loss of any replica with some equal probability, the
function relating the probability of forgetting to the number of interfering events, called the forgetting function, will be a simple exponential decay function. This follows regardless of the number of replicas in the stack at the start. There is considerable experimental evidence against this forgetting function, however, from a variety of experimental paradigms. It appears that the short-term forgetting function is not exponential, but instead exhibits an S shape, with a plateau effect at the beginning of the curve. (This basic shape is shown in Fig. 2.)
The assumption that a single replica is lost per forgetting event can lead to this S-shaped curve as long as the number of replicas at the start is probabilistic. That is, it must be assumed that the number of replicas is not the same for every item in an experiment. This can be demonstrated graphically. Suppose the forgetting period started with just one replica in the stack. In this case, the forgetting function would be exponential as shown in Fig. 1, which assumes a probability of .75 of losing a replica on each interfering event. Suppose, however, that there were two replicas at the start of the forgetting period. Since only one can be lost at a time, the probability of forgetting after only one interfering event is zero. The forgetting curve then starts to build up as shown in Fig. 1, also with a probability of .75 of losing one replica after each interfering event. Similarly, an extended plateau will result from a larger number of starting replicas.
FIG. 1. Theoretical forgetting curves for various values of N, the number of starting replicas. (Abscissa: number of interfering events.)
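The curves described for a fixed number of starting replicas can be reproduced with a short calculation. The sketch below is illustrative only: the function name `p_forgotten` is hypothetical, and it assumes that each interfering event removes one replica with probability .75 and never more than one, so that an item with N replicas is forgotten once at least N of the events have caused a loss.

```python
from math import comb


def p_forgotten(n_events: int, n_replicas: int, p_loss: float = 0.75) -> float:
    """Probability that all replicas are gone after n_events interfering events.

    Assumes each event removes at most one replica, with probability p_loss.
    """
    # The item survives if fewer than n_replicas of the events caused a loss,
    # i.e. if a Binomial(n_events, p_loss) count stays below n_replicas.
    p_survive = sum(
        comb(n_events, losses)
        * p_loss**losses
        * (1.0 - p_loss) ** (n_events - losses)
        for losses in range(min(n_replicas, n_events + 1))
    )
    return 1.0 - p_survive


if __name__ == "__main__":
    # One row per number of starting replicas, one column per event count.
    for n_replicas in (1, 2, 3, 4):
        row = [round(p_forgotten(n, n_replicas), 3) for n in range(0, 11)]
        print(n_replicas, row)
```

With one starting replica the values rise immediately; with more replicas the early entries are exactly zero, which is the plateau described in the text.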
The effect of a probabilistic distribution of starting quantities is shown in Fig. 2. Here, a loss probability of .75 and a Poisson distribution with a mean of 3 have been used. Thus, the curve represents simply a weighted average of functions similar to those in Fig. 1, but it displays the S shape that is characteristic of real data in short-term forgetting. An intermediate assumption, in which the loss of a replica is neither totally independent nor totally dependent on the loss of others, could yield a similar S-shaped curve. Bowing to parsimony, in most of the following discussion the assumption is made that only one replica is lost at a time. A mathematical model incorporating this assumption does a creditable job in accounting for data in short-term memory.
FIG. 2. Theoretical forgetting curve assuming a Poisson distribution of the number of starting replicas. (Abscissa: number of interfering events, 0 to 10.)
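The weighted average behind Fig. 2 can be sketched in a few lines. This is a minimal illustration, not the author's computation: the function names are hypothetical, the Poisson variable (mean 3) is taken to be the starting replica count itself, so an item drawn with zero replicas counts as forgotten from the outset, and the sum over starting counts is truncated at an arbitrary upper limit.

```python
from math import comb, exp, factorial


def p_forgotten_given_n(n_events: int, n_replicas: int, p_loss: float) -> float:
    """Forgetting probability when each event removes at most one replica."""
    p_survive = sum(
        comb(n_events, losses)
        * p_loss**losses
        * (1.0 - p_loss) ** (n_events - losses)
        for losses in range(min(n_replicas, n_events + 1))
    )
    return 1.0 - p_survive


def poisson_pmf(k: int, mean: float) -> float:
    """Poisson probability of exactly k occurrences."""
    return mean**k * exp(-mean) / factorial(k)


def p_forgotten_mixture(
    n_events: int,
    mean_replicas: float = 3.0,
    p_loss: float = 0.75,
    max_replicas: int = 60,  # truncation point; the Poisson tail beyond it is negligible
) -> float:
    """Average the fixed-N forgetting curves over a Poisson starting count."""
    return sum(
        poisson_pmf(n, mean_replicas) * p_forgotten_given_n(n_events, n, p_loss)
        for n in range(max_replicas)
    )


if __name__ == "__main__":
    for n_events in range(0, 11):
        print(n_events, round(p_forgotten_mixture(n_events), 3))
```

Under these assumptions the successive increments of the curve first grow and then shrink, which is the S shape referred to in the text.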
With the assumptions so stated, we may now develop an explicit mathematical model for application to data of short-term memory experiments. This will not only show the theory's ability to account for data, but should help to clarify the meaning and function of the basic assumptions.
III. Serial-Position Effects in Short-Term Memory
One of the simplest memory tasks to which the replica theory can be applied is one in which a short list of items is presented, and then just one of them is tested for recall. For such a task, it is simple to develop an explicit quantitative model and to apply it to the data. Further, such
data generally exhibit some reliable effects of serial position on memory performance. In this section, a model is applied to such data, and phenomena such as primacy and the effects of list length are explained in terms of the replica theory.
A. THE EXPERIMENTAL PARADIGM
The experiment to which the mathematical model is applied was reported by Phillips, Shiffrin, and Atkinson (1967).
FIG. 3. Probability of a correct response as a function of serial position for several list lengths L (from Phillips et al., 1967). (Abscissa: serial position of test item; separate curves for L = 3, 4, 5, 6, 7, 8, 11, and 14.)
On each trial, the subject was shown, one at a time, a set of cards, each bearing one of four color patches (blue, green, black, or white). After its presentation, each card was placed face down in a row in front of the subject. The size of the set of cards (called “list length” here) varied across trials, with list lengths of 3, 4, 5, 6, 7, 8, 11, and 14 being used. The subject’s task was to name the color on one of the cards pointed to by the experimenter after presentation of an entire set. Since the subject did not know which
card would be tested, she had to try to “remember” the entire list in order to perform the required task. The basic data from this experiment are shown in Fig. 3, in which is plotted the proportion of correct responses as a function of serial position. These positions are numbered in such a way that the abscissa reflects the number of other items intervening between presentation and test. These data show several characteristics that are typical of experiments on short-term memory for simple materials. The most obvious of such features, of course, is the relatively smooth forgetting function over short time intervals (each item was presented for 2 seconds). Of greater interest are the detailed shapes of these curves and the interrelationships the data show for various list lengths. Notice first that the forgetting functions all exhibit the characteristic S shape discussed in the previous section. (Actually, these curves show a reverse S shape, as the functions reflect the probability of remembering rather than that of forgetting.) Also very noticeable is the strong primacy effect characteristically found in memory and learning tasks involving ordered lists. The primacy effect appears to exist even at the shortest list lengths, with the earliest items in each list (those with the most intervening items between presentation and test) showing better performance than those in the middle of the list. A final property of these data is one that has been labeled intralist proactive inhibition (PI). This is a phenomenon that was studied extensively by Murdock (1961, 1963b). Consider for example, the points for each list length at an abscissa value of 3. For each of them, there were exactly two other items intervening between presentation and test, and the time from presentation to test was the same. The poorer performance observed in longer lists must be attributed to items preceding that at serial position 3. This PI effect is generally observed at all serial positions. 
Murdock (1961, 1963b) found the same phenomenon in short lists of paired associates when only one item was tested after a single presentation of each list. This PI effect appears to be quite general in short-term memory, and must be explained by any adequate theory or model.
B. THE MATHEMATICAL MODEL
To develop a mathematical model from the assumptions of the replica theory, it is necessary to specify a probability distribution of the number of rehearsals occurring in any particular time interval. A distribution that has been adopted by the writer with some success is the Poisson. During any time interval in which rehearsal is occurring, the number of rehearsals is a random variable assumed to have a Poisson distribution. Thus, if K is the number of rehearsals, the probability that K
equals some particular value k is specified by the equation

$$P(K = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}. \tag{1}$$

The Poisson parameter $\lambda$ is the mean number of rehearsals to be expected in the time interval.
The rationale for using a Poisson distribution here is fairly straightforward. Suppose, for example, that a rehearsal period were 3 seconds long. This could be broken up into $n$ small time intervals, each $\Delta t$ seconds long. For purposes of exposition, let $n = 10$, so that $\Delta t = .3$ seconds. Now, let us assume that during each interval $\Delta t$ there is an equal probability $p$ that a rehearsal will occur. With these assumptions, the number of rehearsals occurring in the entire 3-second interval would have a binomial distribution with parameters $n$ and $p$, and a mean equal to the product $np$. For values of $np$ near 2 or 3, and a value of $n$ of about 10, the Poisson distribution with $\lambda = np$ is an excellent approximation to the binomial, producing minor differences only in the third decimal place. The Poisson has the advantage that it has only one parameter, and sidesteps the question of how many small $\Delta t$'s to arbitrarily fit into the larger time interval. For such reasons, the Poisson distribution will be adopted as representative of rehearsal.
The time course of events in storage is assumed to be as follows. Immediately upon presentation of the item by the experimenter, a single replica is stored in memory. The remainder of the presentation period is used for two distinct rehearsal activities. The first is repetition of the presented item. Exactly $k_1$ additional replicas of the item are stored (so that the item now has $1 + k_1$ replicas), where $k_1$ is a random variable having a Poisson distribution with parameter $\lambda_1$. The second type of rehearsal activity is general rehearsal of any item in the list that has been presented but not yet forgotten.
To simplify the quantitative model, it will be assumed that during general rehearsal items are chosen to be rehearsed by a random draw, with replacement, from the set of all items already presented but not yet forgotten. This set includes the item currently being presented. The assumption of a random draw may not be as outlandish as it first appears. To be sure, subjects are probably engaging in more systematic rehearsal strategies on any particular trial, such as starting at the beginning, starting at the end, rehearsing only the middle items, or the like. The model is not designed to be applied to data from individual subjects on single trials, however. Rather, it is intended for data grouped across both subjects and trials. Given the diversity of rehearsal strategies available on any subject-trial, it is perhaps not too unreasonable to suppose that any item in memory is as likely to be rehearsed as any other when some item is rehearsed.
The effect of this assumption is that the repetition of the presented item is followed by $k_2$ rehearsal events, where $k_2$ has a Poisson distribution with parameter $\lambda_2$. Each rehearsal event results in the addition of a replica of one of the rehearsable items, and each such item has an equal probability of receiving that replication.
Implicit in the foregoing is the assumption that only one item can be rehearsed at a time. This assumption is a fundamental difference between this rehearsal model and a consolidation theory. The latter supposes that items are rehearsed simultaneously until they are “consolidated” into long-term memory. The reason for the current assumption about one-at-a-time rehearsal will become clearer in discussing the effect of rehearsal on primacy.
Finally, it is necessary to state precisely what constitutes the “interfering events” mentioned above. Since storage interference is assumed to be the basic cause of forgetting, an interfering event for a particular item should occur when a replica of some other item is stored. A question remains concerning the relative amount of interference from a presentation and storage of a replica of a new item as compared with the amount from storage of replicas from rehearsal of other previously stored items. At first, it will be assumed that only presentation of a new item acts as an interfering event and that replications are not interfering. Later, some alternative assumptions will be discussed. For now, the forgetting rule will be as follows: When an item is presented, each other item in memory may independently lose one replica with probability $\delta$.
It is important to note that this is not a finite-store or fixed-slot model, with items pushed out to make room for others as they come in, but a storage interference model in which any or all of the items in memory might lose a replica when a new item is stored. The value of $\delta$ presumably depends on item similarity. In principle, a different value of $\delta$ could be required for each pairing of the new interfering item with those already in memory. For simplicity, the list will be assumed to be homogeneous, however, and just one value of this forgetting parameter estimated.
The model may be summarized as follows:
(1) On presentation, a single replica of the presented item is stored in memory.
(2) This is followed by repetition of the presented item, such that $k_1$ additional replicas of that item are stored, where $k_1$ has a Poisson distribution with parameter $\lambda_1$.
(3) Repetition is followed by $k_2$ general rehearsal events, where $k_2$ has a Poisson distribution with parameter $\lambda_2$. On each rehearsal event, one additional replica of an item is stored, and each item currently in memory is equally likely to receive the additional replica.
(4) When a new item is presented, any or all of the other items in memory may lose exactly one replica with probability $\delta$.
(5) If there is at least one replica of an item present at the time of its test, the probability of a correct response for that item is 1. If it has no replicas, the probability of a correct response to it is chance. The chance rate is assumed to equal 1/n, where n is the number of response alternatives.
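The five summary statements can be turned into a small Monte Carlo sketch. This is an illustration under stated assumptions, not the estimation program used by the author: all function and parameter names are hypothetical, items are indexed by order of presentation (the figures instead number positions by the count of intervening items), and the default values 2.65, 2.83, and 0.98 are taken from the estimates reported in Section C, assuming the two rehearsal rates are listed in the order in which they were introduced.

```python
import math
import random


def poisson_draw(rng: random.Random, mean: float) -> int:
    """Sample a Poisson variate by Knuth's multiplication method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_trial(list_length, presentation_index, lam_rep, lam_gen, delta,
                   n_alternatives, rng):
    """Simulate one list and return True if the tested item is answered correctly."""
    replicas = []  # replicas[i] is the stack size of the (i + 1)-th presented item
    for _ in range(list_length):
        # Step 4: presenting a new item may cost each remembered item one replica.
        replicas = [r - 1 if r > 0 and rng.random() < delta else r for r in replicas]
        # Steps 1 and 2: initial replica plus repetition of the presented item.
        replicas.append(1 + poisson_draw(rng, lam_rep))
        # Step 3: general rehearsal, one replica per event to a random remembered item.
        remembered = [i for i, r in enumerate(replicas) if r > 0]
        for _ in range(poisson_draw(rng, lam_gen)):
            replicas[rng.choice(remembered)] += 1
    # Step 5: certain recall with at least one replica, otherwise a chance guess.
    if replicas[presentation_index - 1] > 0:
        return True
    return rng.random() < 1.0 / n_alternatives


def proportion_correct(list_length, presentation_index, n_trials=4000, seed=0,
                       lam_rep=2.65, lam_gen=2.83, delta=0.98, n_alternatives=4):
    """Estimate the probability of a correct response by repeated simulation."""
    rng = random.Random(seed)
    hits = sum(
        simulate_trial(list_length, presentation_index, lam_rep, lam_gen, delta,
                       n_alternatives, rng)
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    for index in range(1, 9):
        print(index, round(proportion_correct(8, index), 3))
```

Because the last item presented suffers no interfering event before test, the sketch always scores it correct; earlier items trade off accumulated rehearsal against accumulated replica loss.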
C. FIT OF THE MODEL TO THE DATA
A chi-square minimization procedure was used to estimate values of the three parameters, $\lambda_1$, $\lambda_2$, and $\delta$, so that a best fit of the model to the data would be realized. This procedure is a common one, and its properties have been discussed in detail by Atkinson, Bower, and Crothers (1965). The minimization process gives not only estimates of the appropriate parameter values, but also an index of the statistical goodness-of-fit of the model to the data. Unfortunately, analytic expressions for the predictions of recall performance are not readily derived from the model. Thus, fairly complicated techniques using a high-speed computer were required for the parameter estimation. These techniques, and a validation of them, have been described in detail elsewhere (Bernbach, 1969).
Values of the parameters were selected so as to minimize a chi-square measure of the difference between predicted and observed performance, simultaneously using the data for list lengths of 5, 6, 7, 8, 11, and 14. The best-fitting predicted curves were compared with the data in Fig. 4. The estimated values of the parameters were as follows: $\delta = 0.98$; $\lambda_1 = 2.65$; $\lambda_2 = 2.83$. The goodness-of-fit measure was $\chi^2 = 46.9$ with 48 df. Since the value of the $\chi^2$ statistic was no greater than its expected value, which is equal to the number of degrees of freedom, this is an excellent fit. The fit is particularly impressive in view of the fact that six different forgetting functions were fit simultaneously with a single set of three parameters.
The parameter values are of some interest. The probability $\delta$ of a replica being lost because of an interfering event is almost 1. This means that almost every time a new item was presented in this experiment, every other item in memory lost one replica. The values of $\lambda_1$ and $\lambda_2$ indicate that an average of slightly more than five rehearsals (counting both item repetition and general rehearsal) occurred during the 2-second presentation interval, along with the initial coding and storage of a replica of the item. Particularly if rehearsal is covert, or if it is a replication process below the level of awareness, this does not seem to be excessive. Further, it is instructive to note that, in the absence of instructions to the contrary, the subjects appeared to divide their time about equally between the two kinds of rehearsal.
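The bookkeeping behind these figures can be sketched as follows. The fragment is illustrative only: the function name is hypothetical, the statistic shown is one common form of the Pearson chi-square for proportions (the text does not spell out the exact form that was minimized), and the count of 51 data points is an inference from the six list lengths rather than a number stated in the text. Cells whose predicted probability is exactly 0 or 1 are skipped, which is a simplifying assumption of this sketch.

```python
def pearson_chi_square(observed, predicted, n_trials):
    """Pearson chi-square for observed vs. predicted proportions correct.

    Each data point contributes n * (obs - pred)**2 / (pred * (1 - pred)),
    which combines the 'correct' and 'incorrect' cells of that point.
    """
    total = 0.0
    for obs, pred, n in zip(observed, predicted, n_trials):
        if 0.0 < pred < 1.0:  # degenerate predictions are skipped (assumption)
            total += n * (obs - pred) ** 2 / (pred * (1.0 - pred))
    return total


# One data point per serial position in each fitted list length.
list_lengths = [5, 6, 7, 8, 11, 14]
n_points = sum(list_lengths)
n_parameters = 3
degrees_of_freedom = n_points - n_parameters

# Mean rehearsals per presentation interval implied by the two estimates.
mean_rehearsals = 2.65 + 2.83

if __name__ == "__main__":
    print(n_points, degrees_of_freedom, round(mean_rehearsals, 2))
    # Made-up proportions, purely to show the call; not data from the study.
    print(pearson_chi_square([0.60, 0.80], [0.50, 0.75], [100, 100]))
```

On this reading, the 48 degrees of freedom correspond to 51 serial positions less the three estimated parameters, and the two rate estimates sum to about 5.5 rehearsals per 2-second interval.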
D. EXPLANATION OF SERIAL POSITION PHENOMENA
While it is gratifying that the mathematical model fits the data, the power of the replica theory lies in its ability to explain memory phenomena in terms of the processes involved. Therefore, this section will discuss the relationships between the Phillips et al. data and the processes assumed by the theory. The quantitative model makes it fairly simple to pinpoint the “causes” of various features of these data.
FIG. 4. Observed and theoretical serial position curves for Phillips et al. (1967) study.
The first feature of the data to be discussed is one that, in a sense, is their most obvious feature. This is the fact that it was possible to obtain systematic and relatively smooth forgetting functions for these
list lengths and to obtain them within the time periods involved (i.e., less than 30 seconds retention interval for any item). Of course, the replica theory predicts smooth forgetting functions in this type of experiment. Its assumptions about probabilistic interference with stored replicas as new items are presented guarantee this, as shown earlier in Fig. 2. Further, the theory predicts that, in general, forgetting of an item is directly related to the number of other items intervening between its presentation and test. This empirical finding is generally taken for granted, and virtually all current theories of verbal learning and memory expect and explain it.
This was not always the case. Only within the past decade has short-term memory (as studied with immediate memory or memory span techniques) ceased to be treated as a phenomenon completely unrelated to more general memory tasks such as paired-associate learning. The current trend toward considering short-term memory in more general theories of verbal learning owes a great deal to the important discussion by Melton (1963) of the implications of short-term memory research for more general theories of memory. The reason this is mentioned here is that the replica theory makes no distinction at all between memory and learning. Learning is simply considered to be the effect of repetition or practice on memory. Thus, the details of the processes involved in short-term forgetting provide the basis for the more general replica theory of memory. The effects of repeated trials on memory will be discussed in some depth later in this chapter.
The details of the experimental results follow from the details of the assumed memory processes. For example, the observed S shape in the forgetting function is the shape predicted by the model. This is because of the assumption that, for each item in memory, one replica at most is lost when an interfering item is presented. Thus, this feature of the data is explained by the model as resulting from the nature of replica loss from stacks of replicas in memory. The general shape was shown in Fig. 2 in connection with the presentation of the assumptions about replica loss.
The primacy effect is a direct result of the theory’s assumptions about general rehearsal. The first item presented in each list had uninterrupted use of the entire general rehearsal period to store additional replicas. Items presented later in the list have to share this time period with rehearsal of other items. This follows from the assumption that general rehearsal involves a random selection from the pool of to-be-remembered items still in memory. Thus, early items have an opportunity to build up greater resistance to forgetting than do later ones. This resistance to forgetting for early items must be great enough to
counteract the opposing effect of the number of intervening items. That is, since more other items intervene before test of early items in the list, the early items have greater opportunity to lose replicas. With appropriate weighting, opposing effects such as these lead to a U-shaped function, with a primacy effect evident along with a recency effect. The weighting of these effects provided by the assumptions of the replica model is an appropriate one in this regard, as is clear from the shape of the predicted forgetting functions in Fig. 4. On the other hand, by assuming that items were rehearsed simultaneously rather than one at a time, parameter values could not be found such that that model would predict as large a primacy effect as evidenced by the data. This should not be taken as evidence against simultaneous rehearsal, however; the outcome is strongly dependent on the other assumptions of the replica theory. Within the context of the theory, however, these data bolster the assumption that only one thing at a time is occurring in the memory system.
The final experimental finding remaining to be explained is the PI phenomenon. At any serial position on the graph, performance is poorer the longer the list, despite the fact that the number of intervening items, and therefore the number of interfering events, remains the same. An effect such as this is generally described as PI, i.e., other items presented prior to an item are “causing” greater forgetting of that item. The explanation of this PI effect given by the replica theory is not that more replicas of an item are lost because of the presentation of other earlier items. In fact, the probability of losing any number of replicas is independent of list length as long as the number of intervening items remains constant. However, forgetting, defined as losing all replicas of an item, depends not only on replica loss but also on the number of replicas at the start of the retention interval.
As pointed out in the discussion of primacy, the earlier an item occurs in a list (or equivalently, the fewer the number of other items preceding its presentation), the longer the time available for general rehearsal and the greater the number of replicas stored for that item. Thus, the item in the shorter list, being an earlier item, exhibits greater resistance to forgetting. Similarly, an item in a longer list starts its retention interval with fewer replicas and gives every appearance of PI from other items presented earlier. It is clear that, according to the replica theory, intralist PI and primacy are precisely the same phenomenon. Both are caused by differential amounts of general rehearsal of any item as a function of the number of to-be-remembered items in memory. This view contrasts with theories that minimize the role of rehearsal, supposing that primacy is an anchoring effect or the like while PI is allegedly caused by the effect of prior
Replication Processes in Human Memory and Learning
associations formed on the learning of each item. However, theories that stress rehearsal or postpresentation processing in this type of memory task, e.g., the buffer model of Atkinson and Shiffrin (1968), claim the same equivalence of primacy and PI as does the replica theory.
E. ASSUMPTIONS ABOUT INTERFERING EVENTS
In the original discussion of forgetting in the replica theory, loss of replicas was related to a purely theoretical construct, the interfering event. When formalizing the theory for the mathematical model, the additional assumption was made that an interfering event occurred only on presentation of a new item to be stored in memory. This assumption was not arbitrary, but was required by the model’s analysis of the data. To show this, let us consider some alternative assumptions about interfering events. In the first place, one might assume that an interfering event occurs whenever a replica of another item is stored, whether on first presentation or on rehearsal of the other item. This assumption can be incorporated in the model, and it seems as reasonable as the assumption that was finally adopted. However, the model did not fit the data when the forgetting rule was modified to read that each item in memory loses one replica with probability γ whenever another item is presented or rehearsed. This constraint made it impossible to find parameter values such that the predictions of the model closely approximated the data. It appeared that an attractive intermediate position on this issue would be to assume presentation of a new item to cause loss of a replica for each item in memory with probability δ, while rehearsal of an item causes the other items to lose a replica with probability γ. The model just rejected is a special case of this general model, one in which it is assumed that γ = δ. The model that was finally adopted is also a special case of the general model, one in which γ is assumed to equal zero. The χ² minimization procedure for the above general model yielded parameter estimates of δ = .98 and γ = .03. Further, the value of the χ² goodness-of-fit statistic was 46.2, hardly any better than the value of 46.9 obtained for the model assuming γ = 0.
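The comparison between the two fits can be made concrete with a chi-square difference. The following is a minimal sketch and not a computation reported in the text: it assumes that the two models are nested and differ by exactly one free parameter (the rehearsal loss probability), so that the drop in the minimum chi-square is referred to a chi-square distribution with 1 df. The function name is hypothetical.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 df.

    With 1 df the variable is the square of a standard normal deviate,
    so P(X > x) = erfc(sqrt(x / 2)).
    """
    if x < 0:
        raise ValueError("chi-square values are non-negative")
    return math.erfc(math.sqrt(x / 2.0))


# Minimum chi-square values quoted in the text.
chi2_restricted = 46.9  # rehearsal loss probability fixed at zero (adopted model)
chi2_general = 46.2     # rehearsal loss probability free (estimated at .03)

# Improvement bought by the one extra parameter (assumed 1 df).
improvement = chi2_restricted - chi2_general
p_value = chi2_sf_1df(improvement)

print(f"improvement = {improvement:.1f}, p = {p_value:.2f}")
```

Under these assumptions the improvement of 0.7 falls far short of the conventional .05 criterion of 3.84 for 1 df, which agrees with the decision to keep the simpler model.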
That is, the estimates were virtually unaffected when different replica loss probabilities were permitted for repetition of the current item and general rehearsal from the list. Under the circumstances, it seemed parsimonious to adopt the simpler assumption that only presentation of a new item results in an interfering event in this type of experiment.
IV. Some Other Short-Term Memory Tasks
In the previous section, some details of the replica theory were described in the context of a single empirical paradigm, that of the Phillips et al. (1967) experiment. It is now possible to show how the replica theory
explains data from other short-term memory experiments and to point out some further implications of the theory.
A. SINGLE-ITEM RECALL
One of the most common procedures used to study short-term memory is the one first reported by Peterson and Peterson (1959). Basically, this procedure is as follows. A single item, say a nonsense syllable, is presented to the subject for a time long enough to be certain that he has perceived the item. This presentation period is followed by a retention interval containing a filler task that is designed to prevent or at least minimize rehearsal. An example of such a filler task is to have the subject count backward by threes from an arbitrary three-digit number that has been presented immediately following the nonsense syllable. The length of the retention interval is a primary independent variable and might be anywhere from 0 to 30 seconds or longer. The basic findings in such experiments are that recall performance is nearly perfect when the test follows the presentation immediately, but this performance declines with longer retention intervals. This results in a forgetting function (plot of recall probability versus length of retention interval) showing no discontinuities and decreasing monotonically from a starting point at probability one. Further, these forgetting functions often show the reverse S shape found by Phillips et al. (1967), or at least they do not drop quickly enough at the start to be fit well by an exponential decay function. The replica theory treats the presentation and storage of an item the same way here as in the simple list experiment. A single replica of an item is assumed to be stored when the item is presented. The remaining time before the onset of the retention interval, cued usually by the presentation of the number from which to start counting backward, is spent rehearsing the item, building up additional replicas. Again, this rehearsal is assumed to be probabilistic, and the number of replicas added is a random variable with a Poisson distribution.
This produces the appropriate form for the early part of the forgetting function regardless of what is assumed about the nature of interfering events during the retention interval. One of the interesting implications of these assumptions about presentation and rehearsal concerns the results that would be predicted in an experiment in which two items were presented, consecutively, before the retention interval began. The presentation of the second item could cause the loss of at most one replica of the first item. On the other hand, more general rehearsal time would be available to the first item than to the second during their presentation periods. Hence, one can well expect the first item to start the retention interval with more replicas
than the second, and therefore with more resistance to forgetting. Peterson and Peterson (1962) reported such an experiment and found better recall performance for the first of the two items presented. During the retention interval, it is assumed that the subject divides his time between the filler task (e.g., counting backward) and rehearsal of the to-be-remembered item. Forgetting, then, is caused by storage interference that results from the storage of filler material such as numbers or groups of numbers. Since these interfering “items” are not as similar to the to-be-remembered item as other members of a relatively homogeneous list would be, they should be less disruptive than the other items in such a list. That is, since the value of δ depends on similarity, one would expect a considerably smaller value of δ in a single-item recall task than in the experiment discussed in Section III. In the only single-item recall experiment to which a quantitative version of the replica theory has been applied, a δ equal to .25 was obtained, as compared to .98 for the prior list experiment. The single-item recall model will be developed later when the effect of repeated presentations on short-term memory is considered. The assumption that the subject spends some of his time rehearsing can be used to interpret a finding about this type of memory task first reported by Posner and Rossman (1965). They used filler tasks which varied in the amount of information (measured in terms of information theory) per unit of time that the subject had to process during the retention interval. If it takes more time to process a greater amount of information, it follows that there would be less time available for rehearsal as the filler task contains more information.
Since rehearsal is assumed to build up more replicas of the to-be-remembered item and, therefore, greater resistance to forgetting, the replica theory would predict recall performance to be inversely related to the information content of the filler task. This was Posner and Rossman’s (1965) finding, and one that has been replicated by Crowder (1967).
B. MEMORY FOR INDIVIDUAL PAIRED ASSOCIATES
By stretching slightly the boundary condition on this discussion (i.e., to treat to-be-remembered items as units), it is possible to consider memory for paired associates. In fact, the Phillips et al. (1967) experiment discussed previously involved a paired-associate task. That is, the positions of the items were the stimuli, the four colors were the responses, and the information to be remembered was the association between them. This association was the “item” considered by the replica theory as applied to those data. Obviously, the view that a paired-associate item is unitary will not suffice for a large class of experiments (e.g., those involving substantial response learning). Nevertheless, it happens that
many of the fundamental empirical findings in the area of memory for paired associates can be explained by the model while treating the paired associate as a unitary item. One of the common methods used to study short-term memory for paired associates is the probe technique introduced by Murdock (1963b). In his experiments, lists of paired associates (typically word-word pairs) were presented once each for study, and after each list’s presentation the subject was tested for cued recall of just one pair in the list. The basic paradigm is similar to that of the Phillips et al. (1967) study, so Murdock’s findings of recency effects and of PI have already been accounted for earlier by the replica theory. An additional observation of Murdock’s requires further discussion here. Over a range of list lengths, Murdock (1967) observed in his forgetting functions an asymptote in recall probability greater than zero. Earlier, Murdock (1963a) had attributed this finding to the existence of separate short-term and long-term memory stores, so that the asymptote reflected the proportion of items that had entered long-term memory when presented instead of a short-term store from which they could be forgotten. However, more recently, Murdock (1967) has pointed out that such an asymptote could be predicted from a model incorporating a single memory store as long as some type of postpresentation processing permitted items to be “learned” as well as forgotten during their retention intervals. The replica theory, because of its assumptions about rehearsal, can make the prediction of such an asymptote. For any number of replicas at the start of a retention interval, the forgetting process is, mathematically, a random walk. That is, items may either gain one, lose one, or keep the same number of replicas during each small time interval Δt.
Further, this is a specific kind of random walk process, known as the “gambler’s ruin,” which has received considerable attention in probability theory. As Feller (1957, Chapter 14) demonstrates, such a process may very definitely have an asymptotic loss probability greater than zero, with appropriate values of the process parameters. Murdock’s findings concerning asymptotes are, therefore, completely consistent with the application of the replica theory to memory for individual paired associates.
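The asymptote argument can be illustrated with the standard gambler's ruin result for a walk that is absorbed at zero and has no upper barrier. The sketch below is illustrative only: the function names and the per-step probabilities `p_gain` and `p_loss` are hypothetical, and how they map onto the theory's rehearsal and interference parameters is an assumption left to the caller.

```python
def ruin_probability(start: int, p_gain: float, p_loss: float) -> float:
    """Probability that a walk starting with `start` replicas ever reaches zero.

    On each step the count rises by one with probability p_gain, falls by
    one with probability p_loss, and otherwise stays the same. Zero is
    absorbing (an item with no replicas is forgotten for good).
    """
    if start < 0:
        raise ValueError("start must be non-negative")
    if p_gain < 0 or p_loss < 0 or p_gain + p_loss > 1:
        raise ValueError("p_gain and p_loss must be probabilities summing to at most 1")
    if start == 0:
        return 1.0
    if p_loss == 0:
        return 0.0  # replicas can never be lost
    if p_gain <= p_loss:
        return 1.0  # no drift away from zero: eventual loss is certain
    # Classical result: only the ratio of loss to gain matters.
    return (p_loss / p_gain) ** start


def asymptotic_recall(start: int, p_gain: float, p_loss: float) -> float:
    """Long-run probability that at least one replica is still in memory."""
    return 1.0 - ruin_probability(start, p_gain, p_loss)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, asymptotic_recall(n, p_gain=0.4, p_loss=0.2))
```

Whenever gains outpace losses the recall asymptote is above zero and rises with the number of replicas at the start of the retention interval; otherwise it is zero.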
C. CONTINUOUS PAIRED-ASSOCIATE LEARNING
For completeness, we will discuss another of the most common techniques for studying short-term memory and learning, the continuous paired-associate learning paradigm used by Peterson, Saltzman, Hillner, and Land (1962). The subject was presented with a very long series of “trials.” On each of these, he saw either a stimulus and response pair
(study) or a stimulus alone (test). When a stimulus alone was presented, the subject attempted to recall the response that had been paired with that stimulus earlier. By varying the number of other items intervening between the study trial and the test trial, Peterson et al. (1962) were able to obtain very strict control over the conditions of memory and learning. The application of the replica theory to this experimental paradigm is much more complicated than application to the tasks discussed so far. For example, both test trials and study trials are interfering events, but they may differ in the probability that they will cause loss of replicas. Thus, it would be necessary to keep a record of the exact nature of the sequence of intervening items in order to make a precise statement about forgetting during any interval. However, the general finding that recall performance decreases with the number of intervening items would be predicted by the replica theory in any case. Rehearsal of items during subsequent presentation of other items is more complicated in this procedure than in the Phillips et al. (1967) task, for example. With the exception of the first few items in the long series, rehearsal of all to-be-remembered items is impractical, if not impossible. When the 100th item is being presented, it is not clear what the probability is that the 98th or 99th item will be chosen for general rehearsal. Thus, while one would assume items to be rehearsed during presentation or test of subsequent items, the amount of rehearsal cannot reasonably be related to the status of other items in the series and must be left as a free parameter. Such rehearsal will, however, make the forgetting function at least appear to be approaching a non-zero asymptote. This is commonly the appearance of empirical forgetting functions from continuous paired-associate learning. Another prediction of the replica theory is borne out in this type of experiment.
Since specific rehearsal of the presented item is assumed, the forgetting function should have an S shape, or at least drop more slowly than an exponential curve. Peterson and his colleagues have reported a number of experiments indicating this finding.
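The link between extra replicas and the S shape can be shown with a simplified calculation. This is a sketch under stated assumptions, not the fitted model: the item starts with a fixed number of replicas, each intervening item removes one replica with the same probability, and no rehearsal takes place during the retention interval. The function name is hypothetical.

```python
from math import comb


def retention_probability(n_replicas: int, n_items: int, delta: float) -> float:
    """Probability that at least one replica survives n_items interfering events.

    Each event removes one replica with probability delta, so the item is
    retained exactly when fewer than n_replicas of the events cause a loss,
    i.e. P(Binomial(n_items, delta) < n_replicas).
    """
    if n_replicas <= 0:
        return 0.0
    # At most n_items losses can occur, so cap the sum there.
    max_losses = min(n_replicas - 1, n_items)
    return sum(
        comb(n_items, lost) * delta**lost * (1.0 - delta) ** (n_items - lost)
        for lost in range(max_losses + 1)
    )


if __name__ == "__main__":
    for n in (1, 3):
        curve = [round(retention_probability(n, i, 0.5), 4) for i in range(7)]
        print(f"{n} replica(s):", curve)
```

With a single replica the function is a pure geometric (exponential) decline. With several replicas the curve stays at one until enough events have occurred to exhaust the stack and only then falls, which gives the slow initial drop described above.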
D. A CONTINUOUS MEMORY TASK
Atkinson and Shiffrin (1968) based a great deal of the discussion of their “buffer model” on data from a continuous memory task. Their task differed in one very significant way from the continuous paired-associate learning task of Peterson et al. (1962). They used a small set of stimuli (either four, six, or eight in the set) and changed only the responses paired with them to create new items. Thus, once a subject had given a response to a stimulus, he was immediately shown the same stimulus paired with a new response. This new pair was the item to be remembered.
At any given time, the subject had to hold in memory only s items, where s is the size of the set of stimuli.
The basic data from one such experiment (Atkinson, Brelsford, & Shiffrin, 1967) are shown as the observed points in Fig. 5. Note that the shapes of the forgetting functions are markedly different from those found in other short-term memory procedures, particularly with regard to those items having very few intervening items between study and test. The typical recency effect is that shown in Fig. 2; i.e., an S-shaped curve. This was not only observed by Phillips et al. (1967), but has been repeatedly observed by Peterson and his colleagues. The curves in Fig. 5, on the other hand, show a large loss after just one intervening item, and then a rapid turn and leveling off.
[Figure 5 plot: horizontal axis, NUMBER OF INTERVENING ITEMS (0 to 17); vertical axis scaled from 0 to 1.0.]
FIG. 5. Observed and theoretical forgetting curves for stimulus set sizes 4, 6, and 8 (data from Atkinson et al., 1967).
A possible explanation for this discrepancy between the Atkinson et al. (1967) data and the predictions of the replica theory is that an important boundary condition for the applicability of the theory has been violated. The violation is an interesting one, however, and one that can be readily described analytically. Specifically, this task probably violates the condition that structure not be a relevant consideration in the processing involved in the task. Most memory theorists hold that structure is an aid to memory. Items held in a structured form may therefore be assumed to be much more resistant to storage interference than are ordinary replicas. Thus, if a subject can choose between creating structure in memory or rehearsing individual items, the former is the better strategy. (This assumption undoubtedly has many important implications. Lacking even a definition of “structure,” however, I have begged
the issue by proclaiming the effects of structure to be outside the scope of this chapter.) Given this assumption about the effect of structure as well as the replica theory, however, a very simple model for the continuous memory task can be constructed, which can be called the structured register model. First, since there are only s stimuli involved in the experiment, it is assumed that the subject creates in memory a structured register, containing the s stimuli with their current responses. Items in the register are assumed to be less susceptible to loss than are unrelated replicas. The existence of such a register, or ordered list of stimuli, is given some support by unsolicited verbal reports from a number of subjects in these experiments who reported on the strategies they adopted during the experimental sessions (Brelsford, personal communication). The model proceeds with the following assumptions. Upon presentation of an item, the subject stores just one replica of this item in memory. As in normal short-term memory experiments, this replica may be lost, with probability δ, on presentation of each succeeding item. Instead of rehearsing, however, the subject attempts to place the new response into the appropriate slot in the register. He accomplishes this with a probability c_s that depends on the size s of the register. The item in the register can be lost on presentation of succeeding items with probability γ. Finally, a correct response will occur if either the replica or the item in the register is present at the time of test. Figure 5 shows the results of fitting this model to the experiment by Atkinson et al. (1967), which used values of s equal to 4, 6, and 8. As can be seen, the fit is quite good. The estimated value of δ was found to be .91, which is consistent with the value of .98 obtained for the Phillips et al. (1967) study.
As predicted, however, the value of the probability of forgetting from the structured register was very much smaller than δ, γ equaling .02. The values of c_s, .60, .51, and .45 for s = 4, 6, and 8, respectively, are not as instructive, reflecting simply the greater difficulty of placing an item in the correct space in a larger register. One of the particularly interesting aspects of the Atkinson et al. (1967) continuous memory task is that subjects can be forced to rehearse, storing additional replicas of presented items, even though they do not naturally do so. If rehearsal is forced, the S-shaped recency curve of more traditional memory tasks should intrude upon the data from this continuous memory task. In a study by Brelsford and Atkinson (1966) that is discussed in some detail by Atkinson and Shiffrin (1968), this was effectively done. The experiment was a study of the effects of overt versus covert rehearsal in a continuous memory task. The experimental procedure was as follows. The covert rehearsal condition was identical to the Atkinson et al. (1967) task. That is, during
the time that a stimulus-response pair was being presented the subject simply “studied” it quietly. Brelsford and Atkinson assumed that rehearsal occurred covertly under these circumstances. In the overt rehearsal condition, the subject was required to read the item aloud twice during the study portion of the trial. According to the structured register model, no rehearsal at all went on in the “covert rehearsal” condition. Rather, after storing a single replica upon presentation of the item, it is assumed that the subject immediately directed his attention to the register. The important aspect of the “overt rehearsal” condition was that the subject was required to read the item aloud twice. Since the basic assumptions of the replica
[Figure 6 plot: curves labeled OVERT and COVERT; horizontal axis, NUMBER OF INTERVENING ITEMS.]
FIG. 6. Observed and theoretical forgetting curves for “overt and covert rehearsal” (data from Brelsford & Atkinson, 1966).
theory are not being abandoned while the structured register model is being used, the assumption is made that the subject stores two replicas of the item before he attends to the register. It is necessary to add the consideration that both the test and study portions of intervening trials are interfering events, in the original sense that they can cause loss of a replica from the stack. This was irrelevant in the previous discussion of the structured register model. As long as there was only one replica to lose, δ represented the probability that it was lost on either the test or study portion of the intervening trial. In the present case, this must be considered in order to permit some forgetting after one intervening item, given that there were two replicas at the start as shown in Fig. 1.
The results of the application of the model to the overt-covert experiment are shown in Fig. 6, which shows observed and theoretical forgetting functions for both conditions. Again, the fit is quite good, although the only difference between the two theoretical functions is the number of replicas assumed to be stored in addition to response replacement in the register. In particular, the larger number of replicas provides an improvement in performance in the overt over the covert condition, and this difference disappears as the replicas are completely forgotten. More important to the replica theory, the overt condition shows quite clearly the predicted S shape in the forgetting function.
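The structured register model for the standard (covert) condition of this section can be written as a closed-form forgetting function. The sketch below is an illustration rather than the fitting program used for Fig. 5. It assumes that the single replica and the register entry are lost independently, that placement in the register is completed before the first intervening item, and that guessing is ignored. The function and variable names are hypothetical; the parameter values are the estimates quoted in this section.

```python
def p_correct(lag: int, delta: float, gamma: float, c_s: float) -> float:
    """Probability of a correct response after `lag` intervening items.

    The response is correct if the single replica survives (probability
    (1 - delta) per intervening item) or if the item was placed in the
    register (probability c_s) and has survived there (probability
    (1 - gamma) per intervening item).
    """
    replica_present = (1.0 - delta) ** lag
    register_present = c_s * (1.0 - gamma) ** lag
    return 1.0 - (1.0 - replica_present) * (1.0 - register_present)


# Estimates quoted in the text: delta = .91, gamma = .02, and one
# register-entry probability per stimulus set size s.
DELTA, GAMMA = 0.91, 0.02
C_BY_SET_SIZE = {4: 0.60, 6: 0.51, 8: 0.45}

if __name__ == "__main__":
    for s, c_s in C_BY_SET_SIZE.items():
        curve = [round(p_correct(lag, DELTA, GAMMA, c_s), 3) for lag in range(8)]
        print(f"s = {s}:", curve)
```

Under these assumptions the function falls sharply after the first intervening item, when the lone replica is almost certainly lost, and then declines only slowly as items drop out of the register, which is the qualitative pattern described for Fig. 5.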
V. Repeated Presentations and Learning
In general, repeated presentations of an item lead to an improvement in recall performance after any specified length of time. This is as true for individual items in short-term memory as for the members of a list in paired-associate learning. The replica theory, because of its assumptions about the building up of the stack of replicas by repetition and rehearsal, is well suited to account for these phenomena. In this section, this will be demonstrated by two applications of the theory, to single-item recall in short-term memory and to paired-associate learning.
A. REPEATED PRESENTATIONS IN SINGLE-ITEM RECALL
The effect of repeated presentations on short-term memory was demonstrated clearly by Hellyer (1962). In his experiment, consonant trigrams were given either one, two, four, or eight consecutive presentations before retention intervals of 3, 9, 18, and 27 seconds. The data from this study are shown in Fig. 7. The basic result is that there is less forgetting at any retention interval when there have been more presentations. Hellyer’s finding has important implications for the relationship between memory and simple verbal learning, such as paired-associate learning. Such data as these suggest, as was stated earlier, that simple verbal learning is nothing more than the effect of repeated presentations on short-term memory. For example, if the data at a retention interval of 27 seconds in the Hellyer experiment were plotted with the number of repetitions as the abscissa, a typical growth function, or learning curve, would result. It will later be shown that the features of the replica theory that permit it to account for data like Hellyer’s also permit it to give an account of paired-associate learning. The obvious effect of repeated presentations on memory, according to the replica theory, is that more replicas of the repeated item are added to the stack of replicas already in memory.
This in turn leads to an increased resistance to forgetting during the retention interval. It remains to be shown that this leads to the precise changes in the forgetting
function that have been observed empirically. To do this, a quantitative version of the replica theory for single-item recall will be presented, and real data will be compared with its predictions for the effect of repeated presentations. For reasons that need not be discussed here, the nature of the filler tasks during the retention interval in the Hellyer (1962) study precludes ready application of the mathematical model to those data. Fortunately, data were available from an unpublished study by Fuchs and Melton (personal communication) that was similar in method and results to Hellyer’s experiment, but that lends itself more readily to precise quantitative analysis. They presented sets of three four-letter words either one,
[Figure 7 plot: one curve per number of presentations; horizontal axis, RETENTION INTERVAL (SECONDS), marked at 3, 9, 18, and 27.]
FIG. 7. Forgetting curves in single-item recall for 1, 2, 4, and 8 presentations (from Hellyer, 1962).
two, three, or four times and then tested recall at retention intervals of 0, 4, 8, and 16 seconds. Their data are shown as the “observed” points in Fig. 8. (The entire set of three words is treated as the “item” here and Pr(C) indicates the proportion of items for which all three words were correct and in the right order.) The replica model for single-item recall is quite a bit more complicated than the one for the short-list procedure of Phillips et al. (1967). In effect, two different tasks must be considered, one at presentation of the item and the other during the retention interval. This is because the retention interval is filled with a task very different from the learning of items similar to the to-be-remembered item. (In the Fuchs and Melton study, the filler task was counting backward by three.) In particular, the replica model assumes that the subject divides his time between storing numbers in memory and rehearsing the to-be-remembered item.
To show how the model handles this, consider a small time interval, say 4 seconds. This may be divided up into a large number of smaller time intervals of duration Δt seconds. During any Δt-second interval, it is assumed that the subject may rehearse, or that his counting may lead to storage interference, or that neither may happen. By reasoning similar to that used before in justifying use of a Poisson distribution in the replica model, it is assumed that during any time interval there are k events, where k has a Poisson distribution with parameter λ. Each event may lead to addition of a replica from rehearsal with probability α, loss of a replica from storage interference with probability (1 − α)δ, and no change otherwise. The size of the time interval chosen is irrelevant
[Figure 8 plot: one curve per number of presentations; horizontal axis, RETENTION INTERVAL (SECONDS), marked at 0, 4, 8, and 16.]
FIG. 8. Observed and theoretical forgetting curves for 1, 2, 3, and 4 presentations (from Fuchs & Melton, personal communication).
to the fit of the model; for example, the value of λ for a 4-second interval is just twice that for a 2-second interval. This trade-off between time and the Poisson parameter is one of the many useful properties of the Poisson distribution. The model for presentation of an item is the same as the one used previously. That is, the first presentation of an item results in the storage of 1 + k replicas, where k has a Poisson distribution with a rehearsal parameter λ separate from the parameter for the retention interval. This is equivalent to assuming that items are rehearsed the same number of times on any presentation. This assumption yielded a prediction of a much greater effect of repetition, in terms of reducing the amount of forgetting, than was observed. Thus, the mathematical model incorporating this assumption yielded an extremely poor fit to the data. It was clearly necessary to assume less rehearsal on succeeding presentations in order to make the model fit the Fuchs and Melton data. The assumption eventually adopted was only one of many that would have done equally well; thus, the details of this assumption do not appear to be critical. It was assumed that the rehearsal parameter λ underwent
an exponential decrease as a function of the successive presentations. That is, we assume that each presentation leads to the storage of 1 + kⁿ replicas (where the superscript n refers to the number of the presentation), kⁿ has a Poisson distribution with parameter λⁿ, and λⁿ is determined by the equation
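One form consistent with the description in this passage is a geometric decrease. It is offered here as an assumption, not as the author's displayed equation. It makes the first-presentation rehearsal parameter and the decay constant the only free quantities, and setting the decay constant to zero leaves exactly one replica added by each presentation after the first. The superscripts on λ index the presentation; the exponent on δ′ is an ordinary power.

```latex
\lambda^{n} \;=\; \lambda^{1}\,(\delta')^{\,n-1}, \qquad n = 1, 2, 3, \ldots
```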
Both λ¹ and δ′ are free parameters to be estimated from the data. The best-fitting forgetting functions, using a χ² minimization procedure on the four curves simultaneously, are shown in Fig. 8. The considerable variability in the data makes it difficult to see how well these curves fit the data. However, an indication of the goodness-of-fit is given by the obtained minimum value of the χ² statistic, which was 14.7 with 11 df. This is definitely a respectable fit. The estimated parameter values were α = .22, δ = .25, and δ′ = .06; the time-dependent parameters were λ¹ = 7.9 for a subject-paced presentation interval, and λ = 1.7 per 1-second interval. The extremely low value of δ′ suggests that virtually no rehearsal accompanies presentations following the first, and that their only effect is to provide one additional replica per presentation. A model assuming δ′ = 0, however, produces a significantly though very slightly poorer fit. One other striking feature of the forgetting functions generated by the replica model is that there is a more pronounced S shape in the recency curve for more presentations. This is to be expected, since greater replica buildup at the start, whether from presentations or rehearsal, leads to a longer lasting transient based on stepping down the stack of replicas, one at a time. This relationship between presentations and the shape of the recency curve is especially clear in the curves of Fig. 7, from the study by Hellyer (1962).
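The single-item recall model of this section can be explored with a small Monte Carlo sketch. It is not the procedure used to produce Fig. 8 and has not been checked against those curves. Several points are assumptions made only for illustration: the rehearsal parameter is taken to shrink geometrically with each presentation, consecutive presentations are taken to cause no loss among themselves, and an item that has lost all its replicas is treated as forgotten for good. All function and argument names are hypothetical; the default values are the estimates quoted above.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson variate by Knuth's multiplication method (small lam only)."""
    if lam <= 0:
        return 0
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def simulate_recall(presentations: int, seconds: float, *, alpha: float = 0.22,
                    delta: float = 0.25, delta_prime: float = 0.06,
                    lam_first: float = 7.9, lam_per_second: float = 1.7,
                    n_trials: int = 5000, seed: int = 0) -> float:
    """Estimate the probability that at least one replica survives the interval."""
    rng = random.Random(seed)
    recalled = 0
    for _ in range(n_trials):
        # Storage: each presentation adds one replica plus Poisson rehearsal.
        # Assumed decay of the rehearsal parameter: lam_first * delta_prime**(n - 1).
        replicas = 0
        for n in range(1, presentations + 1):
            replicas += 1 + poisson(rng, lam_first * delta_prime ** (n - 1))
        # Retention: a Poisson number of events, proportional to elapsed time.
        for _ in range(poisson(rng, lam_per_second * seconds)):
            if replicas == 0:
                break  # assumed absorbing: nothing is left to rehearse
            u = rng.random()
            if u < alpha:
                replicas += 1  # rehearsal adds a replica
            elif u < alpha + (1.0 - alpha) * delta:
                replicas -= 1  # storage interference removes a replica
        recalled += replicas > 0
    return recalled / n_trials


if __name__ == "__main__":
    for n_pres in (1, 2, 3, 4):
        row = [round(simulate_recall(n_pres, t, n_trials=2000), 3) for t in (0, 4, 8, 16)]
        print(f"{n_pres} presentation(s):", row)
```

Because the number of events is Poisson with a mean proportional to elapsed time, doubling the interval simply doubles the Poisson parameter, which is the trade-off between time and the Poisson parameter noted in the text.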
B. PAIRED-ASSOCIATE LEARNING
In a typical paired-associate learning experiment, subjects learn a list of paired associates, say 10 items long, by repeated presentations of the entire list. In the interests of brevity, this discussion will be limited to experiments involving the anticipation procedure. In the anticipation method, a single stimulus is shown to the subject, and he must anticipate the correct response. The correct stimulus-response pair is then shown to give information about the correct response. This is followed by test of the next stimulus in the list, and so on until the list is completed. When all items have been presented the list is usually shuffled and the entire process repeated. A complete run through the list, with each item having a test and study interval, is referred to as a trial.
As expected, the proportion of correct responses increases as a function of the number of trials. The subject begins on Trial 1 with only chance performance, but after several trials eventually learns to make the correct response to each stimulus. Early theories of paired-associate learning referred to this process as “acquisition” of the list, but most modern theories treat this as a case of the effect of repeated presentations on memory for the individual items in the list. The replica theory falls into the latter class. Since the theory deals with individual items, it is first necessary to redefine a trial and clarify some terminology. For each item, a trial begins with presentation of the stimulus-response pair, continues with the test and study of other intervening items, and ends with a test of the original item. The function relating the probability of a correct response to the number of tests is the learning curve. Since the first test precedes the first trial, the learning curve must start at chance performance. A model that is stated in terms of individual items in this way can be as readily applied to repetition in continuous paired-associate learning as to paired-associate list learning. For ease in discussing the theory here, the shuffling between presentations of the list will be ignored. Thus, for a 10-item list, it will be assumed that a trial (for a specific item) always contains 9 other intervening items between study and test. The same model is applicable to repetition of an item in a continuous paired-associate series when that item always has 9 intervening items between presentations, and an anticipation method is used for each item. According to the replica theory, a trial proceeds as follows. On presentation of the item, 1 + k replicas are stored, where k is a Poisson-distributed random variable with parameter λ. Each intervening item is assumed to cause the loss of one of these replicas with probability δ.
Further, during presentation of each intervening item, general rehearsal is assumed to add a Poisson-distributed number of replicas of the original item, governed by a separate general rehearsal parameter λ. The next presentation of the item is assumed to add an additional 1 + k replicas, starting the process again. On the test, the probability of a correct response is 1 if there are any replicas in memory, and chance otherwise. To demonstrate the performance of the model, we will assume arbitrary values for the parameters. These are λ = 1.0 for presentation, λ = .5 for general rehearsal, and δ = .5. Further, for simplicity in some of the calculations to follow, it is assumed that the probability of a correct response by chance, when there are no replicas of the item in memory, is equal to zero. The learning curve generated by the model under these circumstances is shown in Fig. 9. This learning curve is very much like that typically found in simple paired-associate learning experiments. A rather indirect way will be chosen to show how the replica model’s
predictions compare with real paired-associate data. This method is used primarily because the replica model for paired-associate learning is extremely complex mathematically, and generating predictions from it is a time-consuming task even on a high-speed computer. The grid-searching procedures used for estimating the best-fitting values of the three parameters generally require calculation of the predicted data literally hundreds of times. Thus, fitting the model to the data is a prohibitively laborious process. What will be done instead is to show that Monte Carlo “data” generated by the replica model, with arbitrarily chosen parameters, compare very well with the predictions of a well-known model for paired-associate learning which evidence suggests is adequate. This model is a three-state Markov model proposed by Atkinson and Crothers (1964). These investigators assumed that an item is in one of three states
FIG. 9. Theoretical learning curve for paired-associate learning. (Abscissa: trial number.)
at the time of test: state G, state S, or state L. State G is the guessing state and is the state of all items not in memory. The probability of a correct response in this state is chance, which in the current example is equal to zero. States S and L represent short- and long-term memory, respectively. An item in state S is in memory, so that a correct response will occur, but it is susceptible to forgetting, with probability f, during the subsequent trial. State L, on the other hand, is an absorbing state; an item in state L is “learned” and cannot be forgotten during the course of the experiment. On any trial, it is assumed that an item not yet in state L may be learned, moving to state L, with probability a. This model is represented mathematically by the following transition matrix and response probability vector
$$
\begin{array}{c|ccc|c}
 & L & S & G & \Pr(\text{correct}) \\
\hline
L & 1 & 0 & 0 & 1 \\
S & a & (1-a)(1-f) & (1-a)f & 1 \\
G & a & (1-a)(1-f) & (1-a)f & 0
\end{array}
\qquad
\Pr(\text{correct}) = 1 - f(1-a)^{n-1}
\tag{3}
$$
where n is the number of trials on an item. Values of a and f were estimated to minimize the sum of the squared differences between the points on the learning curve generated by the replica model and by the three-state Markov model. The values obtained were a = .312 and f = .857. The learning curve generated by the replica model is almost identical to the best-fitting (by the least-squares criterion) prediction of the three-state model, as shown in Table I. Table I also shows the best-fitting exponential learning curve, which is not as good an approximation of the replica model data as is the curve predicted by the three-state model. The close correspondence between data generated by these two models extends beyond the simple learning curve, including sequential statistics as well. One way to show this is to compare the predictions of the two models for the probability of each exact response sequence (e.g., correct-error-correct) on any particular response triple. In this case, since the probability of an error is assumed equal to one on the first test, there is no information gained by including Test 1. On the other hand, the prediction becomes considerably more complex for the replica theory on later tests. Thus, the response triple on Tests 2, 3, and 4 was chosen.
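As a numerical check on Eq. (3), the sketch below iterates the three-state transition matrix from an all-guessing start and compares the result with the closed-form curve. It is an illustration only. The function names are not the author's, chance is taken to be zero, and n is read as the test number, so that n - 1 study trials precede test n; that reading is an assumption suggested by the tabled values.

```python
# Sketch: three-state (L, S, G) Markov model of Atkinson and Crothers (1964).
# Assumptions (not from the original text): the function names, chance = 0 in
# state G, and the reading that test n is preceded by n - 1 study trials.

def transition_matrix(a, f):
    """Rows and columns are ordered L, S, G."""
    row = [a, (1 - a) * (1 - f), (1 - a) * f]   # same row for S and for G
    return [[1.0, 0.0, 0.0], row, list(row)]

def learning_curve(a, f, n_tests):
    """Pr(correct) on tests 1..n_tests, obtained by iterating the chain."""
    P = transition_matrix(a, f)
    correct = [1.0, 1.0, 0.0]      # response vector: L and S respond correctly
    state = [0.0, 0.0, 1.0]        # every item starts in G
    curve = []
    for _ in range(n_tests):
        curve.append(sum(p * c for p, c in zip(state, correct)))
        # one study trial: state <- state * P
        state = [sum(state[i] * P[i][j] for i in range(3)) for j in range(3)]
    return curve

def closed_form(a, f, n):
    """Eq. (3) for test n >= 2; test 1 is at chance (zero here)."""
    return 0.0 if n == 1 else 1 - f * (1 - a) ** (n - 1)

curve = learning_curve(0.312, 0.857, 10)
for n, p in enumerate(curve, start=1):
    print(n, round(p, 3), round(closed_form(0.312, 0.857, n), 3))
```

With a = .312 and f = .857 the two computations agree with each other, and the values fall within about .002 of the three-state column of Table I, a discrepancy consistent with rounding of the reported parameters.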
TABLE I
PREDICTED VALUES OF THE LEARNING CURVE

            Probability of a correct response
Test    Replica model    Three-state model    Exponential
  1         .000               .000              .000
  2         .402               .411              .354
  3         .607               .595              .583
  4         .731               .722              .731
  5         .812               .809              .826
  6         .866               .868              .888
  7         .904               .909              .928
  8         .931               .938              .953
  9         .949               .957              .970
 10         .963               .971              .981
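The least-squares fit of the three-state curve to the replica-model values can be approximated directly from the replica column of Table I. The sketch below is an illustration, not the author's procedure. It assumes the fit uses Tests 2 through 10, solves for f in closed form at each trial value of a, and searches a on a grid of step .001; all names are hypothetical.

```python
# Sketch: least-squares fit of the three-state curve 1 - f(1-a)^(n-1)
# to the replica-model column of Table I (Tests 2-10).
# Assumptions: the grid step, the closed-form solution for f at each a,
# and all names are illustrative and do not come from the original text.

REPLICA = [.402, .607, .731, .812, .866, .904, .931, .949, .963]  # Tests 2..10

def best_f(a, errors):
    """For fixed a, the predicted error f*q^m is linear in f,
    so the least-squares f has a closed form."""
    q = 1 - a
    powers = [q ** m for m in range(1, len(errors) + 1)]
    return sum(e * p for e, p in zip(errors, powers)) / sum(p * p for p in powers)

def sse(a, f, errors):
    """Sum of squared differences between observed and predicted error rates."""
    q = 1 - a
    return sum((e - f * q ** m) ** 2 for m, e in enumerate(errors, start=1))

def fit(correct):
    errors = [1 - c for c in correct]            # Pr(error) on each test
    grid = [i / 1000 for i in range(1, 1000)]    # candidate values of a
    a_hat = min(grid, key=lambda a: sse(a, best_f(a, errors), errors))
    return a_hat, best_f(a_hat, errors)

a_hat, f_hat = fit(REPLICA)
print(round(a_hat, 3), round(f_hat, 3))
```

Under these assumptions the estimates land close to the reported a = .312 and f = .857; exact agreement is not expected, since the tabled values are rounded to three places.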
The comparison of probabilities for each of the eight possible sequences is shown in Table II. Again, parameter values of $\lambda_1 = 1.0$, $\lambda_2 = .5$, and $\delta = .5$ were adopted for the replica model, and the values of a and f obtained from the learning curve fit were used for the three-state Markov model. As before, the comparison between the two models is extremely close.

TABLE II
PROPORTIONS OF RESPONSE SEQUENCES PREDICTED BY THE REPLICA MODEL AND THE THREE-STATE MODEL

        Sequence^a
Test 2   Test 3   Test 4    Replica model    Three-state model
  C        C        C           .374               .347
  C        C        E           .013               .006
  C        E        C           .014               .024
  C        E        E           .020               .034
  E        C        C           .224               .208
  E        C        E           .020               .034
  E        E        C           .141               .143
  E        E        E           .195               .204

^a C = correct response; E = error response.

Since the three-state Markov model has often been shown to be quite adequate as a representation of empirical paired-associate learning data, the close correspondence between the predictions of the two models appears to indicate that the replica model is also an adequate model for paired-associate learning. It might be argued that parsimony alone should lead to rejection of the replica model since its predictions are so close to those of the simpler three-state model. The three-state model does not, however, give a good account of the details of short-term memory and forgetting, such as the precise shape of the forgetting function in the experiment by Phillips et al. (1967). On the contrary, for simple paired-associate learning, it is more appropriate to consider the three-state model as an approximation to the replica model. That is, an item would be in state G if there were no replicas in memory, state L if there were enough replicas to make intertrial forgetting extremely unlikely, and state S if there were an intermediate number of replicas. The parameters a and f, then, would be average values adequate for gross predictions in paired-associate learning, but not adequate for predicting the details of short-term memory experiments.
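The three-state entries of Table II follow from the transition matrix alone. A minimal sketch, assuming chance performance of zero, the fitted values a = .312 and f = .857, and function names chosen here for illustration, enumerates the state paths on Tests 2, 3, and 4 and sums the probability of each correct/error pattern:

```python
from itertools import product

# Sketch: probabilities of response sequences on Tests 2-4 under the
# three-state model. Names are illustrative; chance is assumed to be zero,
# so states L and S always give C (correct) and state G always gives E (error).

def sequence_probabilities(a, f, n_tests=3):
    # Transition probabilities out of S or G (both rows of the matrix are equal).
    step = {"L": a, "S": (1 - a) * (1 - f), "G": (1 - a) * f}
    probs = {}
    for path in product("LSG", repeat=n_tests):
        p = 1.0
        prev = "G"                      # state on Test 1: nothing in memory
        for state in path:
            if prev == "L":             # L is absorbing
                p *= 1.0 if state == "L" else 0.0
            else:
                p *= step[state]
            prev = state
        pattern = "".join("E" if s == "G" else "C" for s in path)
        probs[pattern] = probs.get(pattern, 0.0) + p
    return probs

table = sequence_probabilities(0.312, 0.857)
for pattern in sorted(table):
    print(pattern, round(table[pattern], 3))
```

Because only state G produces an error when chance is zero, an error on any test pins the item to G at that point, which is why sequences such as C-E-E and E-C-E receive the same probability under this model.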
VI. Some Evidence for Rehearsal Processes

So far, the existence of rehearsal processes has simply been assumed, and some of the implications of that assumption have been discussed. For example, the primacy effect was explained as resulting from differential rehearsal. In this section, studies giving independent evidence for rehearsal, or some similar form of postacquisition processing, will be discussed. No single one of these studies provides very convincing evidence by itself, since each could probably be explained in some other manner. However, the total picture definitely favors the view that rehearsal processes are fundamental to short-term memory.

A. SHORT-TERM MEMORY WITH CHILDREN

A few years ago, a study of short-term memory with young children was reported (Atkinson, Hansen, & Bernbach, 1964) that used a procedure very similar to that used by Phillips et al. (1967). The subjects were 4- and 5-year old children. On each “trial,” eight different cards bearing pictures of animals were shown to the child, one at a time, and then placed face down on the table in front of him. After presentation of the eighth card, a duplicate of one of the eight cards was shown to the child, and his task was to point to the card on the table that matched the cue card. The forgetting function, obtained by plotting the probability of a correct response against the serial position of the card in the list, showed neither of the serial position characteristics that have been attributed to rehearsal in our previous discussion. That is, the recency portion of the curve did not exhibit the S shape that the replica
theory would predict assuming repetition of the item being presented, and there was no primacy effect, as would have been predicted from general rehearsal of items in the set. To explain these differences between short-term memory by children and adults in very similar tasks, it was hypothesized that the children did not label and rehearse the items, but rather remembered them in some more direct, image-like manner. This hypothesis was tested experimentally (Bernbach, 1967b), using the procedure of Atkinson et al. (1964) with 5-year olds, except that four difficult-to-label colors were substituted for the animal pictures.
FIG. 10. Serial position curves with and without labels in short-term memory with young children. (Abscissa: position in list.)
The primary variable in the experiment was whether or not the children were given labels for the colors. In one condition, the cards were simply presented visually to the subject and then laid face-down on the table in front of him. For the test, the experimenter pointed to the back of one of the cards and asked the subject to recall by selecting the color that matched that of the test card from a wheel containing the four colors. In the other condition, the children were taught labels for the colors (the names of similar colors) as pretraining. On presentation, the subject had to name the color on each card before it was laid down. Similarly, on the test he had to name the color on the card pointed to and point it out on the color wheel as well. The results are shown in Fig. 10. As predicted, in the condition with labels both primacy and the S-shaped recency curve are evident. In the
condition with no labels, there is no S shape and no statistically significant primacy effect. These data can be explained as follows. If 5-year olds do not naturally label these colors when memorizing and thus do not have the type of coded internal representation used for rehearsal processes, neither of these rehearsal-associated phenomena would be observed. On the other hand, these subjects were able to label the colors, as evidenced by the reliability with which they named them on presentation. Once labels were available, rehearsal mechanisms could come into play, and the rehearsal-associated phenomena were observed. One need only assume that adults naturally label items like colors to explain why adults exhibit rehearsal effects naturally.
B. PAYOFFS AND SERIAL POSITION EFFECTS

Mechanisms like rehearsal should be at least partly under the subject’s control, particularly with regard to his choice of what to rehearse at a given time. If this is true, it should be possible for an experimenter to exercise some control over rehearsal by an appropriate application of payoffs. If serial position effects are caused by rehearsal mechanisms, as claimed by the replica theory, then payoffs should have an observable effect on them. An unpublished study testing this hypothesis was carried out in 1966 at Cornell University by Patricia Kupchak. She duplicated the Phillips et al. (1967) procedure, with the exceptions that the digits 0-9 replaced the color patches on the stimulus cards and list lengths of two through nine were used. Further, only the longer lists (lengths seven, eight, and nine) had items tested at all their serial positions, and the data considered are from these lists only. The shorter lists were included so that the subject could not predict in advance the length of any list.

Three payoff conditions were used. In the symmetric condition, the subjects (college students) were awarded six points for a correct response, regardless of the item’s serial position. To encourage rehearsal of the item being presented, in the repetition condition a correct response was worth 15 points when the item tested was the most recent one presented, and 14 points when it was an earlier one. Conversely, in the rehearsal condition, general rehearsal was encouraged by rewarding correct responses to the most recent item with only 1 point, while earlier items were worth 10 points. The point assignment was designed to equalize the expected winnings across conditions. Each point was worth one-fifth of a cent, and a subject could win as much as $1.50 above the $1.00 base pay for a 45-minute experimental session.

The results are shown in Fig. 11, which groups the data from list lengths seven, eight, and nine, and shows only the average across all serial positions with the exception of the first and last. If a fairly constant
amount of total rehearsal (the sum $\lambda_1 + \lambda_2$) is assumed, the rehearsal condition should not only increase the primacy effect, but also decrease performance on the last item in the list. Just the opposite results should be found in the repetition condition, with the last item showing better recall and the first item showing poorer recall. Performance in the symmetric condition should fall in between at both ends of the lists. These are the findings shown in Fig. 11, in which performance on the last item is ranked from repetition to symmetric to rehearsal, with just the opposite ordering for the first item.
FIG. 11. Serial position curves for three payoff conditions (repetition, rehearsal, symmetric). (Abscissa: position in list, marked last, middle, and first.)
C. INCENTIVES IN SINGLE-ITEM RECALL

Experiments by Weiner and Walker (1966) and Weiner (1966) showed an effect of incentives on single-item recall that appeared attributable to the processing of the to-be-remembered item during the retention interval. High- and low-incentive conditions were mixed in a series of trials during an experimental session, and the temporal placement of the incentive was a critical variable. In the first experiment (Weiner & Walker, 1966), the incentive cue accompanied presentation of consonant trigrams for short-term recall after either 4.67 or 15 seconds. The incentive cue, signaling the payoff for recall of a given item, was the background color associated with visual presentation of the items. It was found that there was virtually no effect of incentive at the shorter (4.67-second) retention interval, but that recall performance was significantly better under high-incentive than low-incentive conditions at the longer (15-second) interval.
This result is readily explained by the replica theory if it is assumed that the promised payoff affects the amount of rehearsal during the retention interval only. It is somewhat surprising that incentives did not also appear to affect rehearsal at the time of presentation (i.e., no effect at the short retention interval), since such an effect was observed in the serial position study just discussed. Nevertheless, an incentive-caused increase in rehearsal during the retention interval would lead to greater resistance to forgetting at high incentives by increasing the number of replicas in memory, and this would produce the observed effect. The possibility that the incentives might be affecting retrieval rather than processing during the retention interval is effectively ruled out by the results of a series of experiments reported by Weiner (1966). He varied the Weiner and Walker (1966) procedure by placing the incentive cue just before or consecutive with the cue for recall. In four such studies, Weiner found no effect on recall of giving incentive information only at the time of recall. It is clear that the incentive effect had its locus during the retention interval. This is consistent with the replica theory’s previous assumption that the subject divides such a time period between attending to the filler task and rehearsing the to-be-remembered item.
D. INSTRUCTED “RELEASE” FROM PI

Bjork (personal communication) has conducted a series of unpublished studies whose results fit well with the rehearsal notions of the replica theory. He used the technique developed by Murdock (1961, 1963b) to study short-term retention of individual paired associates. In Murdock’s experiments, as in the Phillips et al. (1967) study, there were marked PI effects. That is, an item with a given number of other items intervening between its presentation and test was recalled better the fewer the number of items preceding it in the list. As noted before, the replica theory explains such PI effects as resulting from rehearsal of those preceding items (Section III, D). If this is the case, a “release” from the PI effect should occur for any item if subjects do not rehearse the items preceding it. In Bjork’s experiments, subjects were shown short paired-associate lists. Each list might or might not have contained a cue somewhere after the start of the list. As an example, in one study the background color on visual presentations of items was changed from white to red at some point, and the cue was the change in color. If the cue did appear, only those items following the cue were tested. That is, subjects were instructed that the cue indicated that they could “forget about” the items they had just seen, as the item tested would be from the set to
follow. In our example, only an item from the set with red backgrounds would be tested. Bjork consistently found a release from PI following appearance of the cue. For example, suppose a subject were shown a 10-item list with the cue preceding the fifth item. His performance on the last six items of this list was the same as his performance on a simple six-item list. When no cue was shown in the 10-item list, however, performance was clearly better on the 6-item list than on the last 6 items of the longer list. Again, these results would be predicted by the rehearsal processes of the replica theory.
VII. Concluding Remarks
A. CONFIDENCE RATINGS AND RECOGNITION MEMORY

I have previously proposed a decision theory of retrieval from memory that is applicable to recognition memory and to confidence ratings in recall (Bernbach, 1967a). According to this theory, subjects match either the presented item (recognition) or their own response (recall) with those items in memory and base their confidence ratings on how good a match they are able to obtain. The theory also assumes that, at least with respect to the matching process, an item can be in one of only two states: state R if it is remembered, and state N if not. Thus, there will be only two distributions of the goodness of the match, one for each state. Using signal detection theory, it was shown that this retrieval model was at least as good in accounting for recognition memory data as models with more than two memory states (e.g., Wickelgren & Norman, 1966). Further, it appeared that the analysis of confidence rating data from recall experiments gave some unique support to the two-state assumption. My purpose here is to point out the compatibility of the replica theory and the two-state assumption. In the original discussion of the stacking of replicas, a push-down stack was assumed. That is, it was assumed that the subject could “see” only the top of the stack and, therefore, base his response only on whether or not at least one replica was present. Thus, for purposes of the subject making a confidence rating, each item would be effectively in just one of two states: state N if there were no replicas present, and state R if there were one or more replicas in memory. The number of replicas in excess of one would thus have no effect on confidence ratings that would necessitate adding additional states.
Because of this push-down assumption about the stack of replicas, the replica theory is compatible with my decision theory for recognition memory and confidence ratings (Bernbach, 1967a), which has proved capable of accounting for recognition data.
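The push-down assumption can be made concrete with a small sketch. The class and method names below are hypothetical, introduced only for illustration; the point is that the observable memory state depends on whether the stack is empty, not on how many replicas it holds.

```python
# Sketch: a push-down stack of replicas for a single item. All names are
# hypothetical; only the top of the stack is assumed visible at retrieval.

class ReplicaStack:
    def __init__(self):
        self._replicas = []

    def store(self, count=1):
        """Push `count` replicas (e.g., on presentation or rehearsal)."""
        self._replicas.extend(["replica"] * count)

    def lose_one(self):
        """Pop the top replica, if any (e.g., loss due to an intervening item)."""
        if self._replicas:
            self._replicas.pop()

    def state(self):
        """'R' (remembered) if at least one replica is present, otherwise 'N'."""
        return "R" if self._replicas else "N"

stack = ReplicaStack()
stack.store(3)
print(stack.state())   # three replicas stored
stack.lose_one()
print(stack.state())   # two replicas remain; the reported state is unchanged
```

Under this sketch, the number of replicas beyond the first matters only for resistance to later loss, which is the sense in which the two-state description and the replica count can coexist.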
B. MULTIPROCESS MEMORY MODELS

Most models of human memory currently being developed make the assumption that memory is a multiprocess system (Norman, 1969). There is little doubt that there exists some very-short-term sensory memory (lasting less than 1 second) as part of the perceptual processing system (Sperling, 1960; Neisser, 1967). However, the replica theory is not addressed to the processes involved in the input and coding of sensory information, in short, with the creation of the replica; it is rather a theory of postperceptual memory. Similarly, the retrieval of information from memory after long periods of time is probably more complex than the simple retrieval processes considered here. One example is the “tip of the tongue” phenomenon studied by Brown and McNeill (1966). The existence of such complex retrieval processes should not be taken as evidence for any fundamental discontinuity between short- and long-term memory with regard to the properties of the memory system. This chapter has been limited to discussion of experimental situations in which retrieval from memory is extremely easy. However, in the discussion of confidence ratings, the replica theory was seen to be perfectly compatible with a workable theory for a more complex retrieval problem. The prime issue concerns the assumption of a separate short-term memory store, whether called primary memory (Waugh & Norman, 1965) or a buffer (Atkinson & Shiffrin, 1968), having different properties from a longer-term memory system, with both systems involved in performance in short-term memory or simple verbal learning tasks. The important question is simply whether or not it is necessary to postulate more than one such memory system in order to account for data from experiments in this area, or indeed for any data at all. It has been shown in this chapter that the single-store replica theory is more than adequate in handling short-term memory and simple verbal learning data.
Elsewhere (Bernbach, 1969), it has been shown that the replica theory is adequate to handle some data that have been claimed to be direct evidence for separate short- and long-term memory systems. As long as the replica theory continues to explain data such as these, one may consider the widespread assumption that memory is multiprocess to be a premature retreat from parsimony.

REFERENCES

Atkinson, R. C., Bower, G. H., & Crothers, E. J. Introduction to mathematical learning theory. New York: Academic Press, 1965.
Atkinson, R. C., Brelsford, J. W., Jr., & Shiffrin, R. M. Multiprocess models for memory with applications to a continuous presentation task. Journal of Mathematical Psychology, 1967, 4, 277-300.
Atkinson, R. C., & Crothers, E. J. A comparison of paired-associate learning models having different acquisition and retention axioms. Journal of Mathematical Psychology, 1964, 1, 285-315.
Atkinson, R. C., Hansen, D. N., & Bernbach, H. A. Short-term memory with young children. Psychonomic Science, 1964, 1, 255-256.
Atkinson, R. C., & Shiffrin, R. M. Human memory: A proposed system and its control processes. In K. W. Spence & J. T. Spence (Eds.), The psychology of learning and motivation: Advances in research and theory. Vol. 2. New York: Academic Press, 1968. Pp. 89-195.
Bernbach, H. A. A forgetting model for paired-associate learning. Journal of Mathematical Psychology, 1965, 2, 128-144.
Bernbach, H. A. Decision processes in memory. Psychological Review, 1967, 74, 462-480. (a)
Bernbach, H. A. The effect of labels on short-term memory for colors with nursery school children. Psychonomic Science, 1967, 7, 149-150. (b)
Bernbach, H. A. A single-store model for post-perceptual memory. In D. A. Norman (Ed.), Models of human memory. New York: Academic Press, 1969.
Brelsford, J. W., Jr., & Atkinson, R. C. Recall of paired-associates as a function of overt and covert rehearsal procedures. Paper presented at the meeting of the Psychonomic Society, St. Louis, October 1966.
Brown, R., & McNeill, D. The “tip of the tongue” phenomenon. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 325-337.
Crowder, R. G. Short-term memory for words with a perceptual-motor interpolated activity. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 753-761.
Feller, W. An introduction to probability theory and its applications. Vol. 1. New York: Wiley, 1957.
Greeno, J. G. Paired-associate learning with short-term retention: Mathematical analysis and data regarding identification of parameters. Journal of Mathematical Psychology, 1967, 4, 430-472.
Hellyer, S. Supplementary report: Frequency of stimulus presentation and short-term decrement in recall. Journal of Experimental Psychology, 1962, 64, 650.
Melton, A. W. Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 1-21.
Murdock, B. B., Jr. The retention of individual items. Journal of Experimental Psychology, 1961, 62, 618-625.
Murdock, B. B., Jr. Short-term memory and paired-associate learning. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 320-328. (a)
Murdock, B. B., Jr. Short-term retention of single paired-associates. Journal of Experimental Psychology, 1963, 65, 433-443. (b)
Murdock, B. B., Jr. Theoretical note: A fixed-point model for short-term memory. Journal of Mathematical Psychology, 1967, 4, 501-506.
Neisser, U. Cognitive psychology. New York: Appleton-Century-Crofts, 1967.
Norman, D. A. (Ed.) Models of human memory. New York: Academic Press, 1969.
Peterson, L. R., & Peterson, M. J. Short-term retention of individual verbal items. Journal of Experimental Psychology, 1959, 58, 193-198.
Peterson, L. R., & Peterson, M. J. Minimal paired-associate learning. Journal of Experimental Psychology, 1962, 63, 521-527.
Peterson, L. R., Saltzman, D., Hillner, K., & Land, V. Recency and frequency in paired-associate learning. Journal of Experimental Psychology, 1962, 63, 396-403.
Phillips, J. L., Shiffrin, R. M., & Atkinson, R. C. The effects of list length on short-term memory. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 303-311.
Posner, M. I., & Rossman, E. Effect of size and location of informational transforms upon short-term retention. Journal of Experimental Psychology, 1965, 70, 496-505.
Sperling, G. The information available in brief visual presentations. Psychological Monographs, 1960, 74, No. 11 (Whole No. 498).
Waugh, N. C., & Norman, D. A. Primary memory. Psychological Review, 1965, 72, 89-104.
Weiner, B. Effects of motivation on the availability and retrieval of memory traces. Psychological Bulletin, 1966, 65, 24-37.
Weiner, B., & Walker, E. L. Motivational factors in short-term retention. Journal of Experimental Psychology, 1966, 71, 190-193.
Wickelgren, W. A., & Norman, D. A. Strength models and serial position in short-term recognition memory. Journal of Mathematical Psychology, 1966, 3, 316-347.
EXPERIMENTAL ANALYSIS OF LEARNING TO LEARN¹

Leo Postman
UNIVERSITY OF CALIFORNIA
BERKELEY, CALIFORNIA
I. Introduction
II. The Role of Warm-up
    A. Warm-up versus Learning to Learn
    B. Warm-up and Retention
    C. An Experimental Investigation of the Role of Warm-up in Recall
III. Two-Stage Analysis of Nonspecific Transfer
    A. Response-Integration Skills
    B. Transfer as a Function of Method of Practice and Class of Materials
IV. Whole versus Part Learning
    A. The Experiment
    B. Implications
V. Acquisition of Transfer Skills
    A. Rule-Governed Behavior in Specific Transfer Paradigms
    B. Changes in Transfer as a Function of Practice
VI. The Effects of Practice on Recall
    A. Retroactive Inhibition as a Function of Stage of Practice
    B. Learning to Learn and Proactive Inhibition
VII. Conclusions
References
I. Introduction

In experimental studies of human learning it is conventional to divide sources of transfer into two major classes: specific and nonspecific. Specific transfer effects depend on manipulated similarity relations between the components of successive tasks, as in the acquisition of new responses to old stimuli. Nonspecific transfer effects are assumed to be independent of such similarity relations and hence are attributed to the development of higher-order habits or learning skills. The classification of transfer as nonspecific constitutes, however, a definition by default. While the changes in performance subsumed under this heading may not represent the carrying over of specific discriminations or responses from one task to the next, they must, nevertheless, reflect circumscribed habits and skills which are subject to experimental manipulation and analysis. The guiding rationale of the present studies is that there is no fundamental discontinuity between the principles of specific and nonspecific transfer. Each learning task calls for a hierarchy of habits and skills. These habits and skills vary with respect to the range of situations to which they can be applied once they have been acquired. It is only in this sense that components of transfer fall on a dimension of specificity-nonspecificity: They must in principle be assumed to be equally specifiable and, in the absence of evidence to the contrary, governed by the same laws.

¹ This chapter represents a substantially expanded version of the writer's presidential address to the Western Psychological Association in 1968. The preparation of the chapter was facilitated by Grant MH 12006 from the National Institutes of Mental Health. The writer thanks Janat Parker, Marian Schwartz, and Rose Zacks for experimental assistance.
II. The Role of Warm-up

A. WARM-UP VERSUS LEARNING TO LEARN

The usual point of departure in the analysis of nonspecific transfer has been the distinction between warm-up and learning to learn, both of which are assumed to contribute to the improvement in performance as the learner moves from one task to the next. The concept of warm-up is rooted in the analysis of motor learning and refers to the development of the postural set required for the efficient performance of a given task (Ammons, 1947). The set is built up and maintained as long as practice continues but dissipates during rest intervals. The application of the concept to verbal learning carried over an emphasis on perceptual-motor set as a determinant of performance (Irion, 1948). Thus, in the rote-learning situation warm-up denotes the establishment of appropriate adjustments for the reception of stimuli and for an optimal rhythm of responding. These attentional and postural adjustments not only facilitate the performance of the prescribed responses under a given experimental arrangement, but also provide a feedback of stimulation which becomes part of the distinctive context of the learning activity. As in the case of motor performance, the set dissipates when the individual leaves the situation and must be reestablished when he returns to practice a new task. By contrast, learning to learn has been taken to denote the acquisition of higher-order habits and skills which produce more or less permanent changes in the subject’s mode of attack on the task (Thune, 1951). The ability to generate effective mediators and other useful mnemonic devices falls under this heading. The distinction between warm-up and learning to learn is thus based on two major criteria: (1) the nature of the dispositions that serve to
facilitate performance, and (2) the degree of temporal persistence after the termination of practice. The difference between perceptual-motor adjustments and learning skills represents a classification that may or may not be analytically useful. Given such a classification, however, the question of how transitory the transfer effects of the two classes of dispositions are must be answered empirically rather than on a priori grounds. The fact that improvements in performance are greater within than between experimental sessions (Thune, 1951) does not necessarily validate the criterion of temporal persistence. What is carried over from one session to the next may include perceptual-motor adjustments that have become conditioned to the experimental situation. On the other hand, the instrumental habits that constitute learning to learn are undoubtedly subject to forgetting. There is empirical support for the latter supposition (Newton & Wickens, 1956, Experiment II). It is not possible, therefore, to differentiate between the two components of nonspecific transfer on the basis of temporal persistence; instead, their contributions must be determined by systematic manipulation of the conditions of training. Either a learning or a nonlearning task can function as a warm-up activity as long as it serves to establish the perceptual-motor set appropriate to the test task. A relevant nonlearning task constitutes a pure warm-up activity; the performance of a learning task provides an opportunity for both warm-up and learning to learn to occur. The difference between the amounts of nonspecific transfer produced by the two types of activity provides an estimate of the contribution to the observed improvement of learning to learn over and above that of warm-up.
The results of a classical study by Thune (1950), in which such a comparison was made, led to the conclusion that warm-up has much greater weight than learning to learn in increasing efficiency of performance during a given experimental session. Specifically, it was found that speed of paired-associate learning benefited as much from a warm-up task that involved no learning, viz., color guessing, as it did from prior practice on another paired-associate list. However, all the subjects in the experiment had had experience in paired-associate learning on the previous day. It is possible, therefore, that the experimental treatments interacted with the effects of the earlier learning experience. For example, the warm-up task may have been as effective as it was because it served to reinstate habits acquired during the previous verbal practice. To assess the relative contribution of the two components of nonspecific transfer in the initial stages of improvement, it is necessary to compare the effectiveness of a pure warm-up activity and of a learning task with naive subjects. The results of such a study by Schwenn and Postman (1967) suggested a rather drastic change in the evaluation of the role of warm-up.
The transfer task was the acquisition of a list of paired associates, with familiar adjectives as both stimuli and responses. There were four experimental treatments which represented the factorial combination of two kinds and two levels of prior training. The preliminary activity consisted of either a learning task or a nonlearning task. The learning task (LT) was the acquisition of a list of paired associates unrelated to those in the transfer list. The nonlearning task (NLT) was a pure warm-up activity, viz., number guessing, which simulated as closely as possible the experimental arrangements of paired-associate practice but did not provide any opportunity for learning. The two levels of prior practice were 4 and 10 trials. The four experimental treatments will be referred to as LT-4, LT-10, NLT-4, and NLT-10, respectively. A control group received no prior training. Figure 1 shows the performance of the various groups on the common transfer task.
[Figure 1. Performance of Groups LT-4, LT-10, NLT-4, NLT-10, and Control on the common transfer task.]
The main findings are as follows: (1) The warm-up activity produces no facilitation whatsoever at the lower level of practice but does have a small positive effect at the higher level. (2) Facilitation by a prior learning activity is clearly apparent at both levels of practice and builds up far more rapidly than do the pure warm-up effects. Nonspecific transfer effects appear to grow at a negatively accelerated rate as a function of learning experience, and at a positively accelerated rate as a function of warm-up. Some of the characteristics of the changes in performance under the two conditions of training will now be examined. In Condition LT the negatively accelerated increase in performance as a function of degree of prior practice indicates a rapid approach to the limit of improvement. This conclusion is supported by an analysis of the relation between the initial level of performance and the amount of gain produced by the training experience. For purposes of this analysis, the correlation was determined between the number of correct responses on the first three anticipation trials during the acquisition of the training list (this score was available for both Groups LT-4 and LT-10) and the amount of improvement shown on the corresponding trials during test-list learning. Thus, if S’s score was 10 for the first list, and 15 for the second list, the measures used in this calculation were 10 and +5. The values of r were -.57 (p < .05) and -.70 (p < .01) for Groups LT-4 and LT-10, respectively. It should be noted that the large majority of Ss showed an improvement, but the gains were greater for initially slow than fast learners. This constraint on the degree of improvement does not represent a ceiling effect in the strict sense of the term; on the test-list trials in question, the scores of the fastest Ss were of the order of 50%. Rather, there appears to be a limit on the nonspecific practice effects under the present experimental arrangement. It would not be correct to conclude, however, that fast Ss derive less benefit from prior practice than do slow Ss; rather, the opportunity to manifest improvement is more restricted for the former than for the latter (cf. Duncan, 1960). Given the negative correlation between initial level of performance and amount of gain, practice should serve to reduce within-group variability. Table I shows the standard deviations as well as the means of the numbers of correct responses given during the 10 trials of test-list learning.
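The gain-score calculation described above can be illustrated with a minimal sketch. The scores below are hypothetical and are not data from the study; the function names are likewise assumptions introduced only for illustration. Each S contributes an initial score and a gain (test-list score minus initial score), and a Pearson r is computed between the two.

```python
from math import sqrt

def gain_scores(initial, test):
    """Gain per subject: test-list score minus training-list score."""
    return [t - i for i, t in zip(initial, test)]

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five subjects (NOT data from the study).
initial = [4, 6, 8, 10, 12]      # correct responses, first three training trials
test = [13, 12, 15, 15, 15]      # correct responses, corresponding test trials

gains = gain_scores(initial, test)   # the S scoring 10 then 15 contributes (10, +5)
r = pearson_r(initial, gains)
```

With these illustrative values every S improves, yet r is negative because the initially slow Ss gain the most, which is the pattern reported for Groups LT-4 and LT-10.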
In spite of the increases in mean scores, the standard deviation is equal to that of the control for Group LT-4 and falls below it for Group LT-10. By contrast, the standard deviations of both NLT groups exceed those of the control group. At the higher level of practice, the variances for the two experimental treatments differ significantly, F (15,15) = 2.83, p < .05. The greater variability of the test-list scores under the NLT than under the control condition suggests that the effect of the warm-up activity was beneficial for some Ss and detrimental for others. Such a pattern of changes would be expected if the training activity were a source of both facilitation and interference, with the relative weight of the two components varying from one S to the next. The fact that the mean gains in performance are a positively accelerated function of the number of training trials is consistent with this interpretation, if it is assumed that the positive component builds up at a faster rate than the
negative one. The change in task characteristics, i.e., the shift from a guessing to a learning activity, is a probable source of interference which is gradually outweighed by the development of an appropriate set. It is reasonable to suppose that these components carry differential weight for individual Ss; the more initial difficulty S experiences in adjusting to the physical features and the temporal constraints of the experimental procedure, the greater should be the perceptual-motor component of nonspecific transfer. The changes in overt error rate produced by the two types of training tasks are summarized in Table I. The mean percentages of misplaced responses to the base of opportunities (total presentations minus correct responses) are shown.2 The within-group correlations between the numbers of correct responses and the percentages of correct responses are also listed. The mean error rates are highest in Condition NLT and lowest in Condition LT, with the control group occupying an intermediate position. The variation among conditions is significant, F (2,75) = 4.32, p < .02. The within-group correlation is essentially zero for the naive Ss in the control group. The correlation remains near zero after warm-up training, but becomes increasingly negative after practice on a learning task.

TABLE I
PERFORMANCE MEASURES IN TEST-LIST LEARNING (a, b)

Condition   Mean correct,   SD      Percent   SD      Correlation
            trials 1-10             errors
Control     41.7            15.4    21.6      12.5    -.09
NLT-4       41.0            21.8    26.6      15.0    -.09
NLT-10      51.8            20.0    25.2      13.2     .02
LT-4        57.9            15.8    15.9      11.1    -.37
LT-10       65.8            11.9    17.2      10.4    -.63

(a) Schwenn and Postman, 1967.
(b) In this analysis and subsequent ones, percentages were transformed to arcsines.

2 Percentages to the base of total emissions were presented in the original report of this study. Percentages to the base of opportunities, which can vary independently of the number of correct responses, were used here for purposes of obtaining correlations with level of performance.

In considering the implications of these findings, we must recognize that the error rate is influenced both by the amount of competition
between correct and incorrect associations and S’s criterion for overt responding. The concept of criterion here refers essentially to the degree of subjective certainty about the correctness of an association which must be present before S is prepared to respond overtly (cf. Murdock, 1966). Since the repertoire of required responses usually becomes available to the learner well before the correct associations are established (Underwood & Schulz, 1960), the level of overt responding may vary widely as a function of S’s criterion. Both factors may contribute to the reduction in error rate in Condition LT: The development of efficient modes of attack on the learning task entails a reduction in intralist interferences and also disposes Ss to raise their criterion so as to minimize the occurrence of errors. The increase in error rate in Condition NLT suggests that the warm-up activity served to lower the criterion level. It is not likely that prior warm-up leads to an increase in associative interference. However, the guessing task used in the training phase required a high rate of responding which was apparently carried over to the test list. Maintenance of a high level of responding implies the adoption of a relatively lax criterion. Consider now the trends in within-group correlations. The lack of correlation between error rate and level of performance for naive Ss indicates that the frequency of misplaced responses reflects primarily variations in the criterion rather than in susceptibility to interference. The emergence of a pronounced negative correlation in the test phase for Condition LT suggests that experienced Ss who find the test task easy are disposed to adopt a stringent criterion in the interest of avoiding errors. Subjects who were fast to begin with as well as initially slow ones whose performance shows a major improvement fall into this category. 
On the other hand, the Ss who find the test task difficult, primarily slow Ss who fail to improve, adopt a lax criterion because they are motivated to attempt as many responses as possible in order to achieve an acceptable level of performance. The argument then is that in the absence of prior experience with the task requirements the choice of criterion is unrelated to ability; given such experience, fast learners are more likely than slow ones to adopt a stringent criterion so as to minimize errors. The performance of the warm-up task does not provide relevant experience. Hence, the correlation between error rate and performance remains at zero. As a result of the guessing requirement during the training phase, there is a general tendency toward relaxation of the criterion for both fast and slow learners. The lack of correlation between level of performance and error rate in Condition NLT is consistent with the finding of Underwood (1952) that color naming during intertrial intervals raised the error rate but that there was zero correlation between the latter and speed of learning. Thus, shifts in the criterion appear to be jointly
determined by S’s ability level and the nature of his prior experience in the experimental situation. The results are contrary to the assumption that warm-up represents a major component of nonspecific transfer, at least in the early stages of improvement. Rather, the gains in performance of naive Ss appear to depend primarily on the skills relevant to the verbal characteristics of the task, e.g., the readiness to make representational or implicit associative responses to the items in the list, the construction of mediational links, the optimal utilization of the available learning time for rehearsal, and the discrimination between correct and incorrect alternative responses. Such habits appear to be acquired or, if they are already in S’s repertoire, reactivated rapidly in the context of a learning situation. On the other hand, a task permitting only the development of perceptual-motor adjustments to the physical arrangements of the experimental situation results in a very limited amount of facilitation. Our conclusion is clearly at variance with that of Thune (1950), who found equal amounts of nonspecific transfer from a learning and a nonlearning activity. A basic difference between the two studies is that Thune’s Ss had prior experience with a learning task, whereas the Ss in the present experiment were naive. For the naive S, the opportunity to develop an effective mode of attack on the learning task is an essential condition of improvement; unless he is able to do so, warm-up per se has little functional value. As the learner becomes more experienced, warm-up activities may gain in effectiveness because they serve to reinstate the total complex of habits relevant to the mastery of a new task. Thus, the relative effectiveness of different training procedures in producing nonspecific transfer may depend critically on the amount of S’s prior experience with learning tasks.
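The comparison underlying this conclusion can be made concrete with a small descriptive sketch based on the group means and standard deviations of Table I. It is offered only as an illustration of the subtraction logic (transfer from a learning task minus transfer from a pure warm-up task) and of the variance ratio quoted earlier; it is not the statistical analysis of the original report, and the function names are assumptions introduced here.

```python
# (mean correct on test-list trials 1-10, SD), copied from Table I
TABLE_I = {
    "Control": (41.7, 15.4),
    "NLT-4": (41.0, 21.8),
    "NLT-10": (51.8, 20.0),
    "LT-4": (57.9, 15.8),
    "LT-10": (65.8, 11.9),
}

def gain_over_control(group):
    """Nonspecific transfer, expressed as mean gain relative to the control group."""
    return round(TABLE_I[group][0] - TABLE_I["Control"][0], 1)

def learning_to_learn_estimate(trials):
    """Gain from a learning task over and above the gain from pure warm-up."""
    return round(TABLE_I[f"LT-{trials}"][0] - TABLE_I[f"NLT-{trials}"][0], 1)

def variance_ratio(group_a, group_b):
    """Ratio of variances recomputed from the rounded SDs in Table I."""
    return (TABLE_I[group_a][1] / TABLE_I[group_b][1]) ** 2

gains = {g: gain_over_control(g) for g in TABLE_I if g != "Control"}
ltl_4 = learning_to_learn_estimate(4)
ltl_10 = learning_to_learn_estimate(10)
f_ratio = variance_ratio("NLT-10", "LT-10")
```

On these means the warm-up groups gain about -0.7 and 10.1 items over the control, the learning groups about 16.2 and 24.1, and the variance ratio recomputed from the rounded standard deviations comes to roughly 2.82, close to the reported value of 2.83.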
B. WARM-UP AND RETENTION
One of the defining characteristics of warm-up is rapid dissipation over time once S leaves the experimental situation. The consequent loss of set may contribute to the decline in performance typically observed on delayed tests of recall. It follows that reinstatement of the set by means of a warm-up activity should reduce the amount of forgetting. This hypothesis was tested in a well-known experiment by Irion (1949). A list of 15 paired adjectives was learned for 10 trials and relearned either immediately or after an interval of 24 hours. When the delayed test was preceded by a trial of color naming designed to reinstate the original performance set, recall was significantly higher than in the absence of such a preliminary warm-up activity. In fact, Ss given a delayed test after warm-up did not differ from those tested immediately. These results
implied that much if not all of the retention loss for materials learned in the laboratory is attributable to loss of set. Such a conclusion would, of course, call for a drastic change in theoretical interpretations of forgetting. However, attempts to replicate Irion’s empirical findings have proved unsuccessful. In spite of careful attempts to duplicate the conditions of the original study, Rockway and Duncan (1952) failed to obtain a difference in delayed recall as a function of prior warm-up. Furthermore, neither the amount of warm-up nor the degree of similarity between the response requirements in the learning task and in the warm-up activity influenced the level of retention loss. In a subsequent study, Dinner and Duncan (1959) did observe a positive effect of warm-up on recall, but only when the degree of original learning was higher than in Irion’s experiment. The conditions of warm-up prior to recall were varied in a number of ways in a recent series of experiments by Lazar (1967), but the results remained almost entirely negative. In light of the empirical failures, a reexamination of the rationale of Irion’s experiment is called for. The prediction of a beneficial effect of warm-up on recall stems from the assumption that the dissipation of the appropriate perceptual-motor set contributes significantly to the losses normally observed on delayed tests of retention. There is little independent support for this assumption; in fact, there is some indirect evidence against it. If warm-up decrement were a major determinant of forgetting, there should be a clear divergence of the retention functions for paced and for unpaced tests since the former would presumably depend more heavily than the latter on the integrity of S’s perceptual-motor set. Actually the scores obtained on the two types of test do not differ greatly and follow the same temporal course (Houston, 1966).
The differences that are found are not necessarily attributable to loss of warm-up since more responses with long latencies will be given on an unpaced than on a paced test. In any event, the failure of the functions to diverge substantially indicates that loss of warm-up is not a critical factor in long-term forgetting, even under conditions of paced testing. There is another way in which a loss of set may be conducive to forgetting, viz., a decline in S’s disposition to limit himself to the repertoire of required responses. During acquisition, such a response set is established rapidly. Exposure to the learning materials appears to activate a selector mechanism (Underwood & Schulz, 1960) which leads S to restrict himself to the prescribed responses. It appears that response terms are activated selectively by virtue of the recency of their occurrence in the experimental situation. Once the appropriate repertoire has been established, it continues to be dominant as long as S remains in the experimental situation. Thus, a selective response set becomes associated with the experimental context.3 This set will dissipate after the termination of practice but will presumably be reinstated when S returns for a delayed test of recall. The more complete the reinstatement of the original set, the more fully will the availability of the repertoire of required responses be restored. The possibility of a beneficial effect of warm-up on recall can now be viewed in a new perspective: The question becomes whether or not an activity that serves to reinstate the appropriate selective response set will serve to raise the level of recall. This question shifts the emphasis from the perceptual-motor characteristics of performance to the strictly verbal requirements of the experimental task. An activity such as color naming that is introduced for the first time just prior to the retention test cannot be expected to reinstate the appropriate selective set. However, if such a procedure were made a part of the original learning procedure and then were reintroduced before the delayed test, it may contribute to the reinstatement of an effective set. This possibility was examined in an experiment which will be described in the next section.
3 A selective response set as defined here should be distinguished clearly from the perceptual-motor set implied by the concept of warm-up.

C. AN EXPERIMENTAL INVESTIGATION OF THE ROLE OF WARM-UP IN RECALL
1. Design
The temporal locus of a warm-up activity was manipulated in a 2 x 2 factorial design. Such an activity was either present or absent at the time of original learning, and the same was true at the time of recall 48 hours later. With N designating the absence, and W the presence, of warm-up, the four treatment combinations were N-N, N-W, W-N, and W-W. The conditions of learning and of testing are indicated by the first and second letter, respectively. Reinstatement of the response set at the time of recall should give an advantage to Condition W-W over Condition N-N. The inclusion of the remaining two conditions makes it possible to evaluate separately the effect of the experimental treatment at each temporal locus as well as the interaction.
2. Learning Task
The learning task consisted of a list of 12 paired associates, with single letters as stimuli and two-syllable adjectives as responses. There were three lists, and two pairings of the stimulus and response terms per list. The six different lists were used equally often. Learning was by the anticipation method at a 1:1-second rate. Performance at this fast rate was expected to be highly sensitive to manipulations of set. The lists were presented in four different random orders. Original learning comprised a study trial and six anticipation trials.
3. Warm-up Activity
The warm-up activity was the same as that used in the study by Schwenn and Postman (1967). The ostensive purpose of the procedure was to determine how closely S’s pattern of guesses would correspond to a purely random arrangement of items, the items being the numbers 1 through 4 inclusive. The S was instructed to guess a number from 1 to 4 each time a cue symbol (#) appeared in the window of the drum. He was informed that the shutter would lift shortly thereafter so that he would be able to compare his guesses with the sequence on the tape. The instructions made it clear that the order of numbers was strictly random, so that numbers could be repeated within a trial. A different random sequence was used on each trial. In order to integrate the two activities as closely as possible, number guessing preceded performance of the paired-associate task on each trial, with the rate of presentation kept constant throughout. At the start of a trial the word NUMBERS appeared on the tape, indicating that the guessing phase was about to begin. After six guesses had been completed, the appearance of the instruction WORDS signaled the shift to the paired-associate phase. (The latter instruction also appeared at the beginning of each trial in Condition N. Under both conditions the appropriate instruction was presented during the 2-second intertrial interval.) This procedure was followed in the warm-up conditions both in original learning and in the test of recall. It will be noted that owing to the interpolation of the number-guessing activity the interval between successive paired-associate trials proper was 12 seconds longer in Conditions W than in Conditions N. Distribution per se is necessarily confounded with the nature of the activity used to fill the intertrial intervals; the net effect on performance at a single temporal locus can be assessed in Conditions W-N and N-W.
4. Recall Test
Retention of the paired-associate list was tested 48 hours after the end of original learning. There were five paced-recall trials. The responses were not shown during the test, i.e., when the shutter lifted at the end of the anticipation interval the window of the memory drum remained blank.
5. Subjects
The Ss were undergraduate students at the University of California who were naive to paired-associate learning. There were 24 Ss in each of the four groups. Assignment to conditions was in blocks of four, with
one S per group in each block. The running orders within blocks were determined by means of a table of random numbers, as were the assignments to the different lists, subject to the restrictions of balancing.
6. Acquisition
The mean numbers of correct responses given during the six trials of original learning are presented in Table II. There were no reliable differences between the groups treated alike in acquisition. The mean scores are higher in Condition N than in Condition W, but not significantly so, F (1,94) = 2.70, p > .05. Successive probability analyses were carried out to determine the expected recall scores following the two kinds of practice. As Table II shows, Condition N again exceeds Condition W. The difference between the mean expected values is significant, F (1,94) = 8.22, p < .01.

TABLE II
EFFECTS OF WARM-UP: MEAN MEASURES OF ACQUISITION

Condition   Total correct   Expected recall   Percent misplaced responses
N-N         19.92           4.07               6.42
N-W         20.67           4.60              12.89
W-N         15.96           3.31              19.47
W-W         15.83           3.32              17.71

It can be concluded that the interspersed warm-up activity had a detrimental effect on acquisition. That finding is noteworthy in view of the presumed importance of an appropriate perceptual-motor set under a very rapid exposure rate. It appears that the interference produced by the requirement to shift periodically from one task to another outweighed whatever advantage was derived from the set-inducing activity. The possibility of negative effects exists whenever the shift from the warm-up to the formal learning activity entails a change in task characteristics. The present experimental arrangements make such effects clearly apparent. The possibility cannot be ruled out, however, that it was the increase in the duration of the intertrial interval rather than the shift from one task to another that was responsible for the depression of performance in Condition W. If so, the conclusion regarding the limited effectiveness of warm-up per se would still hold. As in the earlier experiment, the warm-up activity served to increase the rate of overt errors in learning. The mean percentages of misplaced responses (Table II) are significantly higher in Condition W than in Condition N, F (1,94) = 13.68, p < .01. Again, the high rate of responding
established during the warm-up phase is carried over to the learning task. The within-group correlations between error rate and the number of correct responses are not significant and are of the same order of magnitude under both conditions: .25 and .16 for Conditions N and W, respectively. Thus, the results of the error analysis parallel those of the previous study in all essential respects.
7. Recall
The results of the recall test are summarized in Table III. To take account of variations in degree of original learning, loss scores were determined for each S by subtracting the obtained from the expected score.

TABLE III
EFFECTS OF WARM-UP: MEAN MEASURES OF RECALL PERFORMANCE

Condition   Number recalled   Loss score   Percent misplaced responses, trials 1-5
N-N         2.45              1.62          8.06
N-W         3.16              1.44         13.67
W-N         2.37               .94         12.32
W-W         2.31               .95         11.07
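The definition of the loss score (expected minus obtained recall) can be checked against the group means with a brief sketch. In the study the subtraction was carried out for each S; the version below applies it to the tabled means of Tables II and III only, so it is an illustration rather than a reanalysis, and the names used are assumptions.

```python
# Group means copied from Table II (expected recall) and Table III (number recalled).
EXPECTED = {"N-N": 4.07, "N-W": 4.60, "W-N": 3.31, "W-W": 3.32}
RECALLED = {"N-N": 2.45, "N-W": 3.16, "W-N": 2.37, "W-W": 2.31}
TABLED_LOSS = {"N-N": 1.62, "N-W": 1.44, "W-N": 0.94, "W-W": 0.95}

def loss_score(expected, obtained):
    """Retention loss: expected recall minus obtained recall."""
    return expected - obtained

# Loss recomputed from the group means, rounded to two decimals.
mean_loss = {g: round(loss_score(EXPECTED[g], RECALLED[g]), 2) for g in EXPECTED}

# Groups whose recomputed loss matches the tabled loss score exactly.
matching = [g for g in EXPECTED if mean_loss[g] == TABLED_LOSS[g]]
```

Three of the four differences reproduce the tabled loss scores. For Condition W-W the difference of the tabled means is 1.01 against a tabled loss of .95, a discrepancy that the text as given here does not explain.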
The amounts lost are somewhat lower when original learning included warm-up practice than when it did not, but this difference is not significant, F (1,92) = 2.97, p > .05. The introduction of the warm-up activity at the time of recall has no influence on performance, nor is there any evidence for an interaction of the conditions of training and testing, both Fs < 1.00. Since Conditions N-N and N-W do not differ, Irion’s finding again fails to be duplicated. The expected superiority of Condition W-W over the remaining groups has failed to materialize. A systematic trend emerges, however, when performance on successive recall trials is considered. Figure 2 presents the relevant data. The retention losses decline progressively for all groups. The improvement is clearly greater for Condition W-W than for the other three groups that show closely comparable changes. The overall linear trend is significant, F (1,92) = 41.86, p < .01. While the interaction of the conditions of training and testing in the trend analysis falls short of significance, orthogonal comparisons confirm the superiority of Condition W-W over the remaining groups, F (1,92) = 6.16, p < .02, and the absence of reliable differences among the latter, F < 1. Thus, the prediction that Condition W-W will have an advantage on the retention test is supported,
but only with respect to the amount of gain on successive recall trials. Condition W-W was expected to have an advantage at recall on the assumption that the performance of the warm-up activity would facilitate the reinstatement of S’s selective response set. The delayed emergence of the predicted difference indicates that such facilitative effects developed gradually, i.e., it required more than a single warm-up trial to reestablish fully the context that had obtained during original learning.
[Figure 2. Retention losses on successive recall trials for Conditions N-N, N-W, W-N, and W-W.]
On the terminal test trial, the amount recalled in Condition W-W slightly exceeds the expected value. Thus, the limit of possible improvement appears to have been reached. As Table I11 shows, there is no systematic variation in the percentages of misplaced responses given during the recall trials, all Fs < 1.00. Comparison of the percentages in recall and in original learning shows a decline in the error rates for the groups given warm-up practice during acquisition, but little change for those trained without warm-up. The treatment at the time of recall did not influence the error rate. The difference between the percentage of errors in original learning and in recall
was determined for each S; the distributions of difference scores were then subjected to an analysis of variance. The amount of change was significantly greater for Conditions W-N and W-W than for Conditions N-N and N-W, F (1,92) = 7.22, p < .01. None of the other sources of variation was significant. Thus, the pronounced effect of the warm-up activity on error rate evident during acquisition is no longer present during recall. Apparently, all Ss tend to adopt a relatively high criterion on a test of recall (cf. Murdock, 1966); moreover, on a delayed test without feedback, responses are less available than during acquisition so that it is difficult to maintain a high rate of responding. As in original learning, the correlations between error rate and number of correct responses are not significant in recall. One additional result of the error analysis is of interest. As usual, the importation of responses from outside the list was extremely rare in original learning. The total numbers of such errors given during the six acquisition trials were 1, 1, 3, and 5 for Conditions N-N, N-W, W-W, and W-N, respectively. The corresponding frequencies for the five recall trials were 1, 42, 33, and 43. Thus, all groups that engaged in the warm-up activity at least once show a striking increase in the number of extralist intrusions at recall. Such a high level of importations is rarely observed in the recall of a single list. The fast presentation rate limits S’s opportunity to edit his responses but cannot by itself be responsible since there are virtually no importations in Condition N-N. It is more likely that the alternation between the warm-up activity and the learning task, in conjunction with the fast rate, weakens S’s set to limit himself strictly to the responses within the list. This effect becomes first apparent on the test of recall when the correct responses no longer have the advantage of recency of presentation.
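The change in error rate between acquisition and recall discussed above can be tabulated from the group means of Tables II and III. The sketch below is descriptive only: the reported analysis of variance was carried out on difference scores for individual Ss, whereas this version subtracts raw group means, and the names used are assumptions introduced for illustration.

```python
# Mean percent misplaced responses, copied from Table II (acquisition)
# and Table III (recall trials 1-5).
ACQUISITION = {"N-N": 6.42, "N-W": 12.89, "W-N": 19.47, "W-W": 17.71}
RECALL = {"N-N": 8.06, "N-W": 13.67, "W-N": 12.32, "W-W": 11.07}

# Positive values indicate a decline in error rate from acquisition to recall.
change = {g: round(ACQUISITION[g] - RECALL[g], 2) for g in ACQUISITION}

def mean_change(acquisition_condition):
    """Average change for the two groups sharing an acquisition condition (N or W)."""
    values = [v for g, v in change.items() if g.startswith(acquisition_condition)]
    return sum(values) / len(values)

decline_w = mean_change("W")   # groups trained with warm-up
decline_n = mean_change("N")   # groups trained without warm-up
```

On these means the two groups trained with warm-up show a drop of roughly 7 percentage points, while the two groups trained without it show a slight rise, in line with the direction of the reported effect.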
The main conclusions from the results of this experiment may now be summarized. (1) A warm-up activity that requires alternation between tasks and is conducive to a low criterion of overt responding has an adverse effect on speed of acquisition. (2) The performance of such a warm-up task at the time of recall does not in itself reduce the amount of retention loss. (3) The reintroduction during a delayed test of retention of a warm-up activity practiced in the course of acquisition favors progressive improvements in recall as the test procedure is continued. The recurrence of an activity that had been part of the context of acquisition is believed to facilitate the reinstatement of S’s set to limit himself to the repertoire of required responses. There is nothing in the pattern of results pointing to a significant influence of perceptual-motor adjustments as such. In general, the available evidence casts doubt on the assumption that the establishment of an appropriate perceptual-motor set is a major
component of nonspecific transfer, or that the dissipation of such a set contributes appreciably to retention losses on delayed tests of recall. At this point, it may be useful to raise the question of whether or not it is reasonable to postulate the development of an effective perceptual-motor set that is independent of the learning requirements of a given task. For example, as the trends in error rates indicate, the optimal level of responding in the learning situation may be different from that established by the warm-up activity. The appropriate rhythm of responding may, therefore, have to be established in the context of the learning situation itself and in addition is likely to change as practice progresses. The same considerations apply to attentional adjustments. During the acquisition of a list, S must not only conform to the explicit response requirements but must also give time and attention to such implicit activities as rehearsal, searching for mnemonic devices, and editing of responses. Typical warm-up tasks, e.g., color naming or number guessing, call for quite different forms of information processing. Thus, the pattern of implicit activities relevant to the mastery of a learning task is not likely to be established during a prior warm-up period. It is not suggested that perceptual-motor adjustments do not contribute to learning proficiency, especially under conditions of rapid pacing; what is being called into question is the efficacy of a nonlearning activity in developing such adjustments. Finally, in the assessment of the role of warm-up in recall, it is important to distinguish between the reactivation of a perceptual-motor set and the reinstatement of the context that obtained during original learning. The experiment reported above yielded evidence of only the latter source of facilitation.

III. Two-Stage Analysis of Nonspecific Transfer
We turn now to the main body of our experiments which is concerned with the analysis of learning to learn proper. The focus of these studies is on the identification of the component skills that are responsible for the increases in the proficiency with which successive learning tasks are performed. Our approach has been to devise conditions of transfer in which particular assumptions about the components of learning to learn could be subjected to experimental test. For purposes of identifying potential sources of nonspecific transfer, let us consider the habits and skills that are likely to come into play during the acquisition of a single list of paired associates. It has proved useful to think of the process of acquisition as comprising two stages, viz., response learning and associative learning (Underwood & Schulz, 1960). If the required responses are not already in S’s repertoire, they are integrated during the first of these stages and become units that can
enter into new associations. If the individual responses are familiar, the S must learn to restrict himself to the items in the list. Thus, the repertoire of appropriate responses becomes available during the first stage; during the second stage the prescribed associations between stimulus and response terms are established. The stage analysis of the process of acquisition at once suggests a corresponding division of the components of learning to learn into two broad classes related to response learning and associative learning, respectively.
A. RESPONSE-INTEGRATION SKILLS
The speed with which response integration is accomplished may be expected to increase as a function of practice. The task of integrating new response units such as trigrams may be regarded as akin to serial learning. That is, the prescribed sequence of elements in a trigram constitutes a miniature serial list. As a result of prior experience with such materials, the S should develop effective means of learning such sequences, for example associating the successive letters in a forward order, making use of positional cues, using coding devices which permit ready decoding, and so on. When interest centers on the assessment of these particular skills, it is necessary to hold as constant as possible the improvement attributable to other components, e.g., increased efficiency in the associative stage. With this purpose in mind, the following design was used in an experiment by Postman, Keppel, and Zacks (1968). The learning materials were lists of six paired associates, with numbers as stimuli and trigrams as responses. The trigrams were of either high (H) or low (L) meaningfulness as defined by the probability of the letter sequences in written English. Each experimental S learned in succession two lists that conformed to the A-B, C-D paradigm. That is, the stimuli in the two lists were different, and there was minimal overlap of the elements in the response terms. The experimental treatments consisted of the four possible combinations of lists: HH, HL, LH, and LL. It was expected that lists of low meaningfulness would provide more effective training in response integration and also would be more sensitive to the effects of such training. To each experimental group there corresponded a control group whose first list consisted of six paired associates with numbers as stimuli and highly available responses, viz., names of the days of the week.
The Ss in the experimental and control groups were yoked: A control S received exactly the same number of trials on the first list as the experimental S with whom he was yoked. The control and the experimental Ss then learned the same second list. The control groups provide an empirical baseline for the amounts of gain resulting from factors other than practice in response integration. Learning on List 1 under the experimental treatments was to a criterion of one perfect
recitation plus one additional trial; List 2 was learned to the same criterion or for 10 trials, whichever took longer. Figure 3 shows the acquisition curves for the test lists. In each condition the experimental group surpasses the control group. It will be noted that the differential transfer effects emerge gradually. It is likely that under the experimental treatments specific interferences from the
FIG. 3. Test-list performance for experimental groups learning various combinations of first and second lists and for their yoked control groups. Panels: H-H, L-H, H-L, and L-L; abscissa: trial; curves: experimental and control. From Postman, Keppel, & Zacks (1968).
prior lists had to be overcome before the positive effects of practice could manifest themselves fully. Owing to the limited number of letters in the alphabet, such specific interferences cannot be avoided completely when successive lists of trigrams are learned. When the total performance in the test stage is considered, neither first-list nor second-list meaningfulness is found to have a reliable influence on the amount of transfer. The expected trends are present, i.e., there is an apparent increase in transfer effects when a list of low rather than high meaningfulness is used in either the training or the test phase. However, the differences are small and fall short of significance. A reasonable inference from this finding is that response-integration skills are readily generalizable over a wide range of meaningfulness. The higher-order habits developed through practice are those relevant to the establishment of any novel response units from outside S’s repertoire. Alternatively, the meaningfulness of the response terms may influence
the pattern of habits developed during the training phase, but under each condition of practice the net transfer effects may be equivalent for the two types of test task. Such an interpretation gains some plausibility when it is recognized that skills pertaining to response integration and to the associative learning of newly integrated units are likely to develop concurrently. Gains in the associative component should be greater after practice on a list of high than of low meaningfulness; the reverse should be true for the component of response integration proper. To the extent that both sources of facilitation are effective in the test stage, there would be only small differences in the net amounts of nonspecific transfer. These considerations make it clear that it is difficult to delimit sharply the contribution of increased skill in response integration per se whenever the training and the test tasks require associative learning. However, since the amount of prior practice was strictly equated for the experimental and the control groups, the net advantage of the former shows at the very least that the ability to integrate and associate novel response units increases as a function of relevant experience. This conclusion is strengthened by the fact that such nonspecific transfer apparently outweighed the opposing effects of specific interference. Finally, the use of lists of high and of low meaningfulness made it possible to examine the influence of task difficulty on the distribution of gain scores. As noted previously, there is typically a negative correlation between initial speed of learning and amount of subsequent gain which reflects the operation of a ceiling effect. The opportunity for fast Ss to show improvement should increase as the test task becomes more difficult; moreover, the effectiveness of practice may become increasingly dependent on S's ability.
In accordance with this expectation, the correlation between speed of first-list learning to criterion and the amount of improvement indexed by the reduction in the number of trials to criterion on the second list drops from -.63 (p < .01) for Condition H-H to -.33 (p < .05) for Condition L-L.⁴ The greater the number of Ss whose initial scores are near the ceiling of performance, the higher is the correlation. The constraint on the possible amount of improvement serves to reduce the variability of the scores in the test stage, and this fact will in turn be reflected in the correlation between first-list and second-list criterion scores. This correlation is .37 for Condition H-H and .66 (p < .01) for Condition L-L. Such a difference between the two conditions was to be expected on the basis of the degree of relation between initial scores and amount of improvement. More generally, it is worth noting that low correlations between scores on successive lists need not imply a lack of communality between tasks; rather, they may be a consequence of differential gains by fast and by slow Ss.

⁴ Note that the correlation is between initial speed of learning and amount of improvement, i.e., the lower the number of trials to criterion on the first list the less reduction there is in the number of trials on the second list.
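The ceiling effect discussed above can be illustrated with a small computation. The sketch below is not taken from the original study: the scores are hypothetical, and `pearson_r` is an illustrative helper written for this example. It shows that when slow learners (many trials on List 1) have more room to improve, trials to criterion on the first list correlate positively with the reduction in trials on the second list, which is equivalent to a negative correlation between initial speed and gain.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical trials to criterion for four Ss (not data from the chapter).
list1_trials = [5, 8, 12, 20]
list2_trials = [5, 6, 8, 10]

# Improvement = reduction in trials to criterion from List 1 to List 2.
gain = [a - b for a, b in zip(list1_trials, list2_trials)]

# Fast Ss (few trials on List 1) are near the ceiling and gain little.
r_trials_gain = pearson_r(list1_trials, gain)
```

With real data the same computation would be run separately for each condition, since the argument in the text turns on how the coefficient changes with the difficulty of the lists.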
B. TRANSFER AS A FUNCTION OF METHOD OF PRACTICE AND CLASS OF MATERIALS
The remainder of our studies will focus on the development of associative skills. Here the opportunities for improvement are potentially great, for there is a wide variety of mediating and coding devices on which the individual can call in order to build up the prescribed associative linkages. In the experiments that follow, the materials in the test phase were always familiar English words, so that the component of response learning was minimized and whatever improvement was observed could be attributed to practice in the formation of associations. We asked first to what extent associative skills are specific to the method of practice and the type of material used in training. In an experiment directed at this question (Postman & Schwartz, 1964), there were four conditions of training which represented the factorial combination of two methods of practice, paired-associate (PA) and serial (S) learning, and two classes of materials, viz., adjectives (A) and trigrams (T). These tasks will be designated as PA(A), PA(T), S(A), and S(T), respectively. There were two transfer tasks: a serial list or a paired-associate list, with adjectives as the materials in both, i.e., PA(A) and S(A). Thus, for a given transfer task there were two conditions in which the method of practice remained the same in the training and the test phase, and two conditions in which it changed. Similarly, there were two conditions in which the class of materials remained the same and two in which it changed. There was no prior training in the control condition. The results for the paired-associate transfer task are shown in Fig. 4. The scores plotted are trials to successive criteria on the test list. The major findings are as follows: (1) As measured against the control baseline, all conditions of training produced increases in speed of acquisition.
(2) The improvement in performance was greater when the method of practice remained the same than when it changed; in this case, the groups with prior experience in paired-associate learning had an advantage. (3) Practice also tended to be more effective when the class of materials was kept constant, although this factor had less weight than the method of practice. The results here supplement those described earlier for response integration. (4) The differential effects of the conditions of training emerged relatively late in transfer and thus appear to have influenced primarily the associative stage. The picture was rather different when the transfer test was on serial learning. The results will not be reviewed in detail, but the two main findings may be summarized briefly. First, while all conditions of training
produced substantial improvements in performance, the differences among the experimental treatments were small and appeared very late in learning. Second, internal analysis of the data, and in particular of serial-position effects, showed that the type of pretraining did influence the Ss' mode of attack on the serial task if not the overall speed of acquisition. Subjects with paired-associate training gave evidence of subdividing the difficult central part of the list into functional stimulus-response pairs, thus continuing to apply the method of associative learning they had been practicing. Subjects with serial pretraining distributed their efforts more evenly over all items. These divergencies in mode of attack on the serial task were, however, uncorrelated with the efficiency of performance. The latter finding exemplifies a general principle that may constrain the manifestation of differential transfer effects: The more varied the modes of attack that can be used in the performance
FIG. 4. Mean trials to successive criteria in the acquisition of a paired-associate test list after various types of prior training. Curves: Control, S(T), S(A), PA(T), and PA(A); abscissa: successive criteria (1 to 10). From Postman and Schwartz (1964).
of the transfer task, the less likely we are to detect systematic effects on speed of learning produced by different conditions of prior practice. That is, if a given transfer task such as serial learning can be mastered with approximately equal efficiency in several different ways, then there will be several conditions of training that will result in comparable levels of performance. An analysis of individual differences in practice effects supports some of the general conclusions suggested earlier. The limits on improvement imposed by the ceiling of performance are again clearly in evidence. The
correlation, determined as before, between the initial speed of learning and the amount of improvement shown in the acquisition of the test list is -.92 for the group learning two successive lists of paired adjectives and -.82 for the group learning two serial lists of adjectives. As expected, the correlations between the first-list and second-list criterion scores are not high: .44 and .30, respectively, p < .05 for both. It was observed previously that the correlation between error rate and speed of learning is initially near zero but becomes negative for experienced Ss. The data obtained in the present experiment provided an opportunity to check on the generality of this finding, and also to determine whether this shift in correlation depends on prior performance of the actual test task or will result from other learning experiences as well. The correlations between the percentages of misplaced responses and speed of learning for the test list are listed in Table IV. Consider the

TABLE IV
MEAN PERCENTAGES OF MISPLACED RESPONSES IN TEST-LIST LEARNING AND CORRELATIONS BETWEEN ERROR RATE AND SPEED OF LEARNINGᵃ

                   Test list PA(A)        Test list S(A)
Training list      Percent      r         Percent      r
Control             17.5      -.02         29.4       .05
PA(A)                7.8      -.61ᵇ        31.3       .34
PA(T)               12.9      -.40         24.2       .48ᶜ
S(A)                10.3      -.22         23.4      -.44
S(T)                16.6      -.32         24.4       .16

ᵃ Postman and Schwartz, 1964.
ᵇ p < .01.
ᶜ p < .05.

paired-associate test list first. Under the control condition, i.e., without prior training, the correlation is essentially zero. The correlation is negative for all experimental groups with prior learning experience. However, the highest coefficient and the only significant one ( p < .01) is obtained when the conditions of training and testing remain the same. That is also the treatment that shows the greatest reduction in the percentage of misplaced responses. It was suggested above that the inverse relation between speed of learning and error rate reflects the adoption of a relatively high criterion by fast learners. For the serial test task, speed of learning and error rate are again unrelated under the control condition. The expected inverse relation is
present when the training and the test task are the same, but barely misses significance at the .05 level. Rather surprisingly, the correlations are consistently positive in the other conditions, i.e., the higher the error rate the greater the speed of acquisition. It is possible only to speculate about the difference in the pattern of correlations for the two test lists. As was suggested earlier, the serial task permits the learner greater latitude in mode of attack than does the paired-associate procedure. In the face of a change in task requirements, fast Ss may be more prone than slow ones to explore new methods of practice during the test stage and, in the course of doing so, adopt a relatively lax criterion. We now return to the general implications of this experiment. The fact that for both test lists all conditions of pretraining produced significant amounts of improvement indicates that there are important communalities among the various tasks which probably comprise such fundamental associative mechanisms as the establishment of stable mediating links between successive items. However, the transfer effects are also specific to the method of practice, and to a somewhat lesser extent, to the class of materials. These differential transfer effects reflect the development of circumscribed habits which are carried over from one task to another.
IV. Whole versus Part Learning

The next experiment, to be reported here in full, investigated the influence of practice on learning by the part and the whole methods. An examination of the defining characteristics of the two methods will serve to bring out the systematic implications of the problem. The classical question of interest has been whether it is more efficient to practice a task as a whole, or to divide the materials into parts and to combine the parts after each of them has been learned individually. The relative efficiency of the two methods will depend on the balance of positive and negative factors inherent in the part as compared to the whole procedure: (1) Since the difficulty of a task increases with length, the sum of the times spent in the acquisition of the individual parts will be less than the time required for the mastery of the intact whole. Thus, the length-difficulty relationship favors the part method. (2) This advantage will be reduced by the negative effects of interpart interferences. Specifically, there will be retroactive inhibition of the early parts during the acquisition of the later ones, and interferences among the parts may be expected to develop during the combination stage. There is no a priori basis for predicting whether in any given case the net outcome of these opposing influences will be favorable or unfavorable to the part method. A general prediction is generated, however, by an extension to the
whole-part problem of the total-time hypothesis (cf. Cooper & Pantle, 1967). This hypothesis asserts that for a given task the amount learned remains invariant with the total practice time and independent of the manner in which the time is partitioned. Conversely, a fixed amount of learning time is required for the attainment of a given criterion, again regardless of the manner in which the time is partitioned. It follows that there should be no difference in speed of acquisition between the whole and the part method. With respect to the sources of facilitation and interference in part learning, the implication is that the deficit to be removed in the combination stage will be proportional to the saving in time during the acquisition of the individual parts and will remain so regardless of the overall difficulty of the materials. Put somewhat differently, the total-time hypothesis suggests that the amounts of response learning and associative learning required for mastery of a list of given length are essentially fixed by the characteristics of the materials and remain unchanged for different subdivisions of the total task into units of practice. It may, indeed, be argued that the limited capacity of the learner makes part learning inevitable even under the whole procedure and that the mastery of the total task always requires the elimination of interpart interferences. Thus, the experimentally manipulated differences between whole and part learning may to some extent be nominal rather than functional. As for empirically observed differences between the two methods in verbal rote learning, much of the early work yielded contradictory or inconclusive results (McGeoch, 1931), which may probably be attributed to deficiencies of method, and the question received relatively little attention for a number of years.
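The implication for part learning can be written out explicitly. The following is only a sketch, using notation of the kind the chapter adopts later for its measures of performance: $T_w$ is the learning time by the whole method, $T_{p1}$ and $T_{p2}$ are the learning times for the two parts, $T_c$ is the duration of the combination stage, $T_p$ is the total learning time by the part method, and $D$ is the time saved through the reduction in list length.

```latex
\begin{align*}
T_p       &= T_{p1} + T_{p2} + T_c, \\
D         &= T_w - (T_{p1} + T_{p2}), \\
T_w - T_p &= D - T_c .
\end{align*}
```

If total learning time is strictly invariant, then $T_w = T_p$ and the third line gives $T_c = D$: the time spent in the combination stage cancels the saving achieved on the individual parts. A positive value of $D - T_c$ therefore indicates an advantage of the part method, and a negative value an advantage of the whole method.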
In a recent series of experiments, we have reexamined the differences between the whole and the part method with a view to testing the generality of the total-time principle and exploring some of the conditions under which it may be expected to break down. The main findings are as follows: (1) With materials ranging over a wide range of difficulty, the part method has a small but consistent advantage in serial anticipation learning (Postman & Goggin, 1964). The difference in favor of the part procedure appears to be largely a function of the associative strength which accrues to the initial and terminal items of the individual parts; as a consequence, the relative difficulty of the central part of the list is less in the combination stage than in whole learning. (2) In paired-associate learning, with materials again ranging over a wide range of difficulty, the difference between the two methods remains essentially invariant at or near zero (Postman & Goggin, 1966). (3) The study just cited also shows, however, that the whole method is at a substantial disadvantage when it is contrasted with the repetitive-part rather than the pure-part method. Under the pure-part procedure,
the individual parts are learned separately and then combined; under the repetitive-part procedure, parts learned earlier continue to be practiced as new ones are added. The superiority of the latter is attributable to the reduction in interpart interferences: Retroactive inhibition of the earlier parts by the interpolated learning of the subsequent ones is minimized, and differentiation among the component parts on the basis of their relative frequency of occurrence in the situation is facilitated. In general, then, the results conform to the total-time hypothesis inasmuch as gains derived from the length-difficulty relationship are counteracted by interpart interferences. Deviations from the principle of total-time invariance occur to the extent that the experimental arrangements favor the reduction of the interferences that are characteristic of the part procedure in its pure form. The localized effects specific to serial learning and the contrast between the pure-part and repetitive-part procedures are both cases in point. In light of this analysis, what influence should relevant prior experience have on part and on whole learning? In part learning, the major potential source of improvement is a reduction in the duration of the combination stage since the acquisition of the individual parts is likely to be relatively rapid to begin with. The critical skills to be acquired, then, are those pertinent to the combination stage. Thus, S may adopt the useful habit of rehearsing the earlier parts during the acquisition of subsequent ones (thereby functionally approximating the repetitive-part procedure), and he may mitigate the difficulty of the combination stage by shifting whenever necessary from coding devices which were useful in the acquisition of individual parts to ones more appropriate to the total task.
The probable importance of such reorganization of the materials in the final phase is indicated by Tulving's (1966) demonstration that there is negative transfer in free-recall learning from prior practice trials on a part of the test list. It must also be recognized that if there are increases in the degree of learning of the individual parts, resistance to interpart interference will be enhanced. Improvements in part learning may then be expected as a function of practice and should stem primarily from a reduction in the relative difficulty of the combination stage. It does not follow, of course, that the efficiency of part relative to whole learning is likely to increase with the stage of practice. Whole learning will improve as well; the task is the same as in the combination stage of part learning and the opportunities for progressive gains are at least as great. In fact, as Ss become more experienced, whatever limited differences are present initially may become less pronounced. Thus, the total-time hypothesis may be expected to apply more consistently to experienced than to naive Ss because the
former make more effective use than the latter of the available study time under a given experimental arrangement.
A. THE EXPERIMENT
In the present experiment, the effects of practice on the relative efficiency of the whole and the part method were evaluated for serial anticipation learning. The serial task was chosen because on the basis of previous findings the part method was expected to be superior to the whole method in the initial stages of practice. Given a replication of this result there would, therefore, be an opportunity to determine whether or not the deviation from the total-time principle becomes minimized with practice.

1. Design
Each of the experimental treatments comprised a training phase and a test phase. The training phase consisted of the acquisition of three successive serial lists. Half the Ss learned the three training lists by the whole (W) method, and the other half of the Ss by the part (P) method. The test phase consisted of the acquisition of a final fourth list. The method of learning in the test phase remained the same as in training for half the Ss and was changed to the alternative procedure for the other half of the Ss. Thus, there were four conditions defined by all the possible combinations of methods in the successive stages of the experiment. These will be designated as W-W, W-P, P-P, and P-W. The first letter indicates the method for the training lists, and the second letter that for the test list. The subdivision of the groups in the test phase was made with a view to determining the extent to which the learning skills developed through practice were specific to the method used in the training phase.

2. Materials
The learning materials were four serial lists of 10 CVC trigrams each, with mean association values ranging from 89.6 to 93.3 according to Krueger's (1934) norms. Thus, the meaningfulness of the items was uniformly high. Intralist similarity was minimized. Within a given list there were no duplications of either first or third letters.
There were two consonants that occurred twice within the list, once as a first letter and once as a third letter. Five vowels were used twice each in the middle positions. Interlist similarity was kept as low as possible. In the construction of the four lists, each of 20 consonants was used twice as a first letter, and twice as a third letter. No sequence of two letters was used more than once, and no two trigrams had both consonants in common.
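The intralist constraints just described lend themselves to a mechanical check. The sketch below is an illustration only: the function name `check_cvc_list` and the trigrams in `HYPOTHETICAL_LIST` are invented for this example and are not the materials used in the experiment. It tests the three within-list rules stated in the text (no repeated first or third letters, each of five vowels used twice in the middle position, and exactly two consonants appearing once as a first and once as a third letter); association values and interlist restrictions are not modeled.

```python
from collections import Counter

VOWELS = set("aeiou")


def check_cvc_list(trigrams):
    """Return True if a list of CVC trigrams meets the intralist constraints."""
    firsts = [t[0] for t in trigrams]
    middles = [t[1] for t in trigrams]
    thirds = [t[2] for t in trigrams]

    # No duplication of first letters or of third letters within the list.
    if len(set(firsts)) != len(firsts) or len(set(thirds)) != len(thirds):
        return False
    # The outer letters must all be consonants.
    if any(ch in VOWELS for ch in firsts + thirds):
        return False
    # Each of the five vowels occupies the middle position exactly twice.
    if Counter(middles) != Counter({v: 2 for v in VOWELS}):
        return False
    # Exactly two consonants occur once as a first and once as a third letter.
    return len(set(firsts) & set(thirds)) == 2


# Hypothetical 10-item list constructed to satisfy the rules above.
HYPOTHETICAL_LIST = ["ban", "cap", "der", "fes", "git",
                     "hiv", "jow", "koz", "lub", "muc"]
list_is_valid = check_cvc_list(HYPOTHETICAL_LIST)
```

A comparable check across all four lists would be needed for the interlist restrictions (no repeated two-letter sequence, no two trigrams sharing both consonants).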
For purposes of part learning each list was divided into two halves ($P_1$ and $P_2$). The two five-item parts of a list had no consonants in common; each vowel was used once in $P_1$ and once in $P_2$. The two halves were used as the part learned first ($P_1$) and as the part learned second ($P_2$) equally often. The sequence of the parts in the combination stage was $P_1P_2$ for half the Ss, and $P_2P_1$ for the other half of the Ss. The order in which the lists were presented was balanced so that all lists occurred equally often in each of the four ordinal positions. There were four different list orders each of which occurred four times per condition. In the selection of these orders among the possible ones, an attempt was made to minimize duplication of letters between adjacent lists.

3. Procedure
Both whole and part learning were by serial anticipation at a 2-second rate, with an 8-second intertrial interval. Learning was always to a criterion of one perfect anticipation of either the part or the whole list. All Ss were given conventional instructions for anticipation learning. The Ss in the part groups were in addition given full information about the order of events in their condition. The second part was presented 30 seconds after the attainment of the criterion on the first part, and the combination list 1 minute, 15 seconds after the end of practice on the second part. A study trial was given on each part but not in the combination stage so that recall of the two parts could be measured on the first anticipation trial. For all conditions, the interval between lists in the training phase was 1 minute, 30 seconds, and that between the third training list and the test list was 2 minutes, 30 seconds. The test phase was, of course, introduced by appropriate instructions. To facilitate differentiation between the parts two different starting symbols (* and #) were used, one for $P_1$ and one for $P_2$. In the combination stage, the starting symbol was that of the part occurring first in the list. The two starting symbols were used equally often in the W conditions.

4. Subjects
There were 16 Ss in each of the four groups. Assignment to conditions was in blocks of four, with 1 S from each condition per block. The running orders within blocks were determined by means of a table of random numbers, subject to balancing restrictions. The Ss were undergraduate students at the University of California fulfilling a course requirement. Subjects who had participated during the current semester in experiments using nonsense materials or the method of serial anticipation were excluded.
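The blocked assignment described above can be sketched in a few lines. This is an illustration under stated assumptions, not a reconstruction of the original procedure: the experimenter used a table of random numbers and unspecified balancing restrictions, whereas the hypothetical function `assign_in_blocks` below uses a seeded pseudorandom generator and applies no restriction beyond one S per condition per block.

```python
import random

CONDITIONS = ["W-W", "W-P", "P-P", "P-W"]


def assign_in_blocks(n_blocks, seed=0):
    """Assign Ss to conditions in blocks of four, one S per condition per block."""
    rng = random.Random(seed)  # seeded only so that the sketch is reproducible
    schedule = []
    for block in range(n_blocks):
        order = CONDITIONS[:]
        rng.shuffle(order)  # random running order within the block
        schedule.extend((block, condition) for condition in order)
    return schedule


# 16 blocks of four yield 16 Ss in each of the four groups (64 Ss in all).
schedule = assign_in_blocks(n_blocks=16)
```

Blocking of this kind keeps the four groups equal in size at every point in the running of the experiment, so that any drift over time in the subject population is spread evenly across conditions.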
5. Measures of Performance
Since trials do not have comparable meanings in the acquisition of parts and of a whole list, speed of learning was measured in terms of the total number of presentations (number of items times number of trials). Thus, learning time is used as the dependent variable throughout. The following specific measures will appear in the analysis: T_w = learning time by the whole method; T_p1 and T_p2 = learning times for the first and second part, respectively, under the part method; D = difference between T_w and T_p1 + T_p2, indexing the advantage derived from the reduction in list length during the acquisition of the successive parts; T_c = duration of the combination stage; T_p = T_p1 + T_p2 + T_c. T_w - T_p thus measures the difference in speed of learning by the two methods.
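The relations among these measures can be checked with a short calculation. The following is a minimal sketch, not part of the original analysis: it takes the four directly measured means for each training list as printed in Table V and recomputes D, T_p, and the gain T_w - T_p. The dictionary layout and the function name `derived_measures` are illustrative assumptions.

```python
# Mean learning times (total presentations) for the training lists,
# transcribed from Table V. Keys are list numbers.
table_v = {
    1: {"T_w": 156.88, "T_p1": 21.25, "T_p2": 24.22, "T_c": 77.50},
    2: {"T_w": 108.75, "T_p1": 20.00, "T_p2": 23.44, "T_c": 52.50},
    3: {"T_w": 101.56, "T_p1": 19.69, "T_p2": 24.38, "T_c": 50.94},
}


def derived_measures(row):
    """Recompute D, T_p, and the part-method gain from the measured means."""
    parts = row["T_p1"] + row["T_p2"]   # time spent on the two parts
    d = row["T_w"] - parts              # D: advantage from shorter lists
    t_p = parts + row["T_c"]            # T_p: total time under the part method
    gain = row["T_w"] - t_p             # T_w - T_p: whole minus part
    return {"D": round(d, 2), "T_p": round(t_p, 2), "gain": round(gain, 2)}


results = {lst: derived_measures(row) for lst, row in table_v.items()}

for lst, r in results.items():
    print(lst, r)
```

Because the tabled entries are themselves rounded means, the recomputed values may differ from the printed ones by a unit in the last decimal place.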
6. First-List Learning
The mean measures of performance for the three successive lists learned in the training phase are presented in Table V.

TABLE V
MEAN MEASURES OF LEARNING TIME FOR LISTS LEARNED IN THE TRAINING PHASE

        Whole    Part                                                     Gain
List    T_w      T_p1     T_p2     T_p1 + T_p2   D        T_c     T_p      (T_w - T_p)
1       156.88   21.25    24.22    45.47         111.41   77.50   122.97   33.91
2       108.75   20.00    23.44    43.44          65.31   52.50    95.94   12.81
3       101.56   19.69    24.38    44.06          57.50   50.94    95.00    6.56

The data for the groups treated alike during this phase (P-P and P-W; W-W and W-P) were closely comparable and have been combined. The results for the first list confirm the earlier finding of greater overall speed of acquisition under the part than under the whole method (Postman & Goggin, 1964). The value of D is sufficiently large not to be offset entirely by the amount of time required for completion of the combination stage. The difference in total learning time (T_w - T_p) is significant, F (1, 62) = 5.72, p < .02. The mean percentages of correct responses to the base of total emissions are given in Table VI. These percentages reflect the probabilities of Ss' being correct if they attempted to make an overt response. This measure again shows Condition P to be superior to Condition W in first-list learning, F (1, 62) = 19.03, p < .01. Errors of commission owing to failures of response differentiation and associative interference become more likely as the length of the task increases, and they also occur with
greater relative frequency on the early than on the late trials of acquisition. Since the total list is practiced from the very beginning in Condition W and only in the combination stage in Condition P, a higher overall error rate is to be expected for the former than for the latter. It is not possible to say to what extent the use of different criteria for overt responding contributes to the inequality of the error rates. It would be plausible to assume, however, that Ss in Condition P who perform an easy task in the initial stage of practice adopt a laxer criterion than Ss in Condition W. On this assumption, the difference in error rate would be even more pronounced if the same criterion were used by both groups.

TABLE VI
MEAN PERCENTAGES OF CORRECT RESPONSES TO THE BASE OF TOTAL EMISSIONS (TRAINING PHASE)

List    Whole    Part
1       83.5     89.8
2       86.7     92.6
3       89.8     92.5
Further analysis supports the previous conclusion that the advantage of the part condition is attributable in large measure to a reduction in the difficulty of the central part of the list during the combination stage. The results will not be reported in detail, but it may be noted that the percentage of all errors (including failures to respond) made at Positions 5 and 6 is 17.4 for the combination stage of part learning and 24.0 for whole learning. The deviation from the principle of total-time invariance is significant but appears to represent a localized effect specific to serial learning.
7. Practice Gains in the Training Phase
As Table V shows, performance improved under both conditions of practice during the training phase. In Condition P, the values of T_p1 + T_p2 show little change during the acquisition of the three successive lists, F < 1, whereas there is a pronounced decline in T_c. For the latter measure, F (2, 62) = 6.01, p < .01. Orthogonal comparisons show that the difference between List 1 and Lists 2 + 3 is significant at the .01 level, whereas the latter do not differ reliably. It is apparent that the initial speed of acquisition of the parts is sufficiently great to minimize the possibility of practice gains. Thus, the reduction in learning time is limited entirely to the combination stage. For Condition W, the major
improvement again occurs between the first and the second list, with a much smaller reduction in learning time thereafter. For T_w, the value of F (2, 62) is 15.67, p < .01. As in the other condition, the difference between List 1 and Lists 2 + 3 is significant at the .01 level, whereas that between the latter two is not reliable. The amount of improvement during the training phase is greater for Condition W than for Condition P. As may be seen in Table V, the two conditions converge because D declines more rapidly than T_c. That is, the length-difficulty relationship as indexed by D becomes less pronounced as a function of practice, and the reduction in the duration of the combination stage is not sufficiently great to maintain the initial difference between the two methods. The interaction of stage of practice (List 1 versus Lists 2 + 3) with the condition of learning is significant at the .05 level, F (1, 124) = 3.92. The convergence of the two conditions is also apparent when the percentages of correct responses to the base of total emissions are considered (Table VI). Note, however, that for this measure the differential practice effects do not become evident until third-list learning. The orthogonal comparison that is significant (p < .05) is for the interaction of the condition of learning with the difference between List 2 and List 3. Thus, the difference in error rate is reduced more slowly than that in overall speed of acquisition. While the difference in total learning time becomes smaller as a function of stage of practice, Condition P apparently retains its task-specific advantage, viz., a reduction in the difficulty of the central part of the list. The percentages of errors at Positions 5 and 6 remain lower in the combination stage of the part condition than during whole learning. The values for List 2 and List 3, respectively, are 20.6 and 19.5 in the combination stage, 25.7 and 25.0 in whole learning.
Thus, there is no substantial change in the magnitude of difference (17.4 versus 24.0) observed for List 1. However, as rate of acquisition increases, this localized difference carries less and less weight in determining total learning time. There are indications that the practice gains occurred in spite of the accumulation of some interlist interferences. This conclusion is suggested by the trend in scores for the first trial of the combination stage in Condition P. Since that stage was not preceded by a study trial, the scores reflect the amount retained after acquisition of the individual parts. The mean scores for the three successive training lists were as follows, with the values for P1 and P2 given in parentheses in that order: List 1, 5.43 (1.84, 3.59); List 2, 6.28 (2.03, 4.25); List 3, 5.53 (1.91, 3.62). The large and significant difference between P2 and P1 reflects the retroactive effect of the interpolated learning of the second part on the retention
of the first. The overall level of recall increases between List 1 and List 2 and then declines to the initial level for List 3. A studentized range test shows that stage of practice has a significant influence on recall, q (2, 62) = 3.83, p < .05. It is reasonable to suppose that the changes in the recall scores represent the net balance of practice gains and cumulative proactive effects. The trend in the scores indicates that the former develop more rapidly than the latter but that they essentially cancel each other as training continues. The occurrence of overt interlist intrusions, while infrequent as usual (2 for List 2 and 5 for List 3), provides direct evidence of the buildup of interlist interference. These intrusions were all from corresponding parts of the total list. The interlist interference may be taken to be quite transitory since the duration of the combination stage for List 3 remains substantially shorter than that for List 1. At least some specific interference is likely to develop whenever the same class of materials is used in successive lists. For trigrams and other nonsense materials, formal interitem similarity is unavoidable; for words, the meaningful similarities among items become increasingly probable as new lists are added. To the extent that specific interitem similarity is conducive to interference, the positive effects of practice per se are likely to be systematically underestimated.
8. Test Phase
The measures of performance for the final test list are presented in Table VII.

TABLE VII
MEAN MEASURES OF LEARNING TIME FOR THE TEST LIST

            Test
            Whole    Part                                                    Gain
Training    T_w      T_p1     T_p2     T_p1 + T_p2   D       T_c     T_p     (T_w - T_p)
W           95.00    19.69    21.25    40.94         54.06   55.62   96.56   -1.56
P           98.75    17.19    20.94    38.12         60.63   42.50   80.62   18.13

The convergence between the two methods of practice remains clearly in evidence. In contrast to the results obtained for List 1,
there is no longer a significant difference between the whole and the part conditions, F < 1. There is a suggestion that training in whole learning is generally more effective, and especially so when the test list is learned by the whole method as well. However, the method of training is not a
significant source of variance nor does it interact with the condition of testing, both Fs < 1. The absence of method-specific practice effects is readily understandable in view of the fact that the improvement in Condition P during the training phase was found to derive entirely from a reduction in the length of the combination stage. The task in this stage is exactly the same as in Condition W; hence, the habits and skills developed during the training phase under one of the procedures should be readily generalizable to the alternative method. During the acquisition of the test list the mean percentages of correct responses to the base of total emissions were as follows: W-W, 89.3; P-W, 85.7; P-P, 92.9; W-P, 92.8. The difference between the two test procedures is reliable, F (1, 60) = 10.04, p < .01. The Fs are less than 1.0 for both the condition of training and the interaction. It should be noted, however, that the percentage for W-W is closely comparable to that obtained for the third training list in Condition W, whereas P-W falls slightly below the second training list. Thus, the measure appears to show some sensitivity to the shift in method. The main finding of interest is, however, that there is a persistent difference between the two methods in error rate but not in speed of acquisition. Another difference which remains in evidence during test-phase learning is that between the serial position curves in whole learning and in the combination stage of the part condition. The percentages of errors at Positions 5 and 6 are 26.7, 22.6, 19.0, and 18.2 for groups W-W, P-W, P-P, and W-P, respectively. As in the case of the error rates, the difference between the first two percentages indicates a systematic influence of the condition of prior training: Ss with prior experience in Condition P may have been disposed to subdivide the total list into functional parts.
B. IMPLICATIONS
Some general implications of the results of this experiment deserve emphasis.
(1) The principle of total-time invariance may be more generally valid for experienced than for naive Ss. The practiced S can make optimal use of the learning time available under a given experimental schedule. As a consequence, sources of differential facilitation and interference inherent in particular methods come to carry only limited weight in determining the overall efficiency of performance. (2) Practice may serve to equalize the amounts of learning time required under alternative procedures without eliminating the distinctive characteristics of performance associated with each. It follows that even in the absence of differences in speed of learning at an advanced stage of practice such alternative methods cannot be considered as functionally equivalent in all respects. There may, for example, remain differences between them
in susceptibility to subsequent interference and forgetting. (3) Practice effects generalize readily between procedures that require similar skills for efficient performance. Moreover, differences in the conditions of training may be reflected in specific features of test performance without influencing the overall speed of acquisition. The same conclusion was reached earlier with respect to differential practice effects on serial anticipation learning.
V. Acquisition of Transfer Skills
We turn now to an examination of those higher-order skills that come into play when efficiency of performance depends on the learner's ability to respond appropriately to relationships between successive tasks. Such situations are exemplified by the paradigms of specific transfer where facilitation and interference depend on the similarity relations between the stimuli and responses in the successive lists. The general question asked in our experiments was whether or not repeated experience with transfer tasks would enhance S's effectiveness in responding to such similarity relations. Would he learn to recognize and use the defining rules of the experimental paradigms so as to maximize positive transfer and to minimize negative transfer?
A. RULE-GOVERNED BEHAVIOR IN SPECIFIC TRANSFER PARADIGMS
The conventional paradigms of specific transfer provide substantial opportunities for rule-governed behavior. Each paradigm entails a rule for responding which can in principle be used as a guide to performance, although Ss performing a single transfer task appear to show little disposition to use such rules consistently (cf. Postman, 1966; Twedt & Underwood, 1959). There are rules of inclusion and of exclusion that delimit the class of correct and of incorrect responses, respectively. The rules may be stimulus-specific or apply only to the list as a whole. When the stimuli in the successive lists are identical or similar, the rule of restriction is stimulus-specific and the response that is appropriate or that is to be withheld is uniquely identified for each individual stimulus. When the successive stimulus terms are unrelated, there is only a list rule which specifies the class of responses to be used or to be eliminated. The rules for the most commonly used experimental paradigms of specific transfer may be summarized as follows: In A-B, C-B the stimuli in the successive lists are unrelated but the responses remain the same.
The list rule of inclusion calls for the continued use of the old responses in transfer. In A-B, A-B' the stimuli are identical and the responses are synonymous or associatively related. The stimulus-specific rule of inclusion is to give to each A a synonym or associate of B.
A-B, C-D is the control paradigm in which both the stimulus and the response terms are unrelated. There is a list rule of exclusion, viz., to avoid the class of old responses. A-B, A-C is the classical paradigm of negative transfer in which the stimuli remain the same but the responses are unrelated. The operating rule of exclusion is to avoid the old response to each of the stimuli. It may be noted here that a stimulus-specific rule always implies a list rule, but the converse obviously does not hold. Finally, in A-B, A-Br old stimuli and old responses are re-paired. There is a list rule of inclusion (the old class of responses must continue to be used) and a stimulus-specific rule of exclusion, viz., the old stimulus-response pairings must be avoided. A rule of inclusion, which tells the learner what to do, should be more beneficial than a rule of exclusion, which identifies potential errors but does not specify correct responses. Furthermore, a stimulus-specific rule provides more information, and hence should be more useful, than a mere list rule. Thus, the experienced learner should derive his greatest advantage from whatever ability he acquires to recognize and apply stimulus-specific rules of inclusion.
B. CHANGES IN TRANSFER AS A FUNCTION OF PRACTICE
1. Gains in Transfer Performance
In our initial study investigating the acquisition of transfer skills (Postman, 1964), each S was taken through three successive cycles of transfer. Each cycle consisted of the acquisition of a set of two lists which conformed to a paradigm of specific transfer. Four paradigms were used: A-B, C-D; A-B, A-C; A-B, A-B'; and A-B, A-Br. For a given S, the paradigm remained the same over the three sets, but different items were, of course, used in each set. The materials were paired adjectives. The first list was always learned to a criterion of 7/8, and there were five trials on the second list.
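The rules of inclusion and exclusion that define these paradigms, together with the A-B, C-B case, can be set out as a small lookup structure. This is only an illustrative sketch: the class name `TransferParadigm`, its field names, and the helper `with_stimulus_rule` are hypothetical, and a field is left as `None` wherever the text does not state a rule of that kind explicitly for the paradigm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferParadigm:
    stimuli: str                  # relation of second-list to first-list stimuli
    responses: str                # relation of second-list to first-list responses
    list_rule: Optional[str]      # list rule named in the text, if any
    stimulus_rule: Optional[str]  # stimulus-specific rule named in the text, if any


# Entries follow the verbal summary of the paradigms; None means the text
# does not name a rule of that kind for the paradigm (an encoding choice).
PARADIGMS = {
    "A-B, C-B": TransferParadigm("unrelated", "same", "inclusion", None),
    "A-B, A-B'": TransferParadigm("identical", "synonymous or associated", None, "inclusion"),
    "A-B, C-D": TransferParadigm("unrelated", "unrelated", "exclusion", None),
    "A-B, A-C": TransferParadigm("identical", "unrelated", None, "exclusion"),
    "A-B, A-Br": TransferParadigm("identical", "same, re-paired", "inclusion", "exclusion"),
}


def with_stimulus_rule(kind):
    """Return the paradigms whose stimulus-specific rule is of the given kind."""
    return [name for name, p in PARADIGMS.items() if p.stimulus_rule == kind]


print(with_stimulus_rule("inclusion"))
print(with_stimulus_rule("exclusion"))
```

On this encoding, A-B, A-B' is the only paradigm carrying a stimulus-specific rule of inclusion, the kind of rule from which the experienced learner is expected to benefit most.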
The speed of first-list learning increased sharply between the first and second sets and showed little further change in the third set. The amounts of improvement in first-list learning were independent of the experimental treatment. Figure 5 shows the total numbers correct on the five trials of the transfer task for the three sets. All paradigms show greater improvement with practice than does the control condition. As a result, there is a decrease in the amount of negative transfer for A-C, and to a lesser extent for A-Br. There is also a progressive shift from a small amount of negative transfer to positive transfer for A-B'. The differences in rate of improvement are most obvious on the first transfer trial (Fig. 6). The changes in performance under the A-B' paradigm make apparent Ss' increased ability to implement a stimulus-specific rule of inclusion.
FIG. 5. Mean numbers of correct responses in five trials of transfer learning as a function of stage of practice and paradigm. From Postman (1964).
FIG. 6. Performance on the first transfer trial as a function of stage of practice and paradigm.
As can be seen from a comparison of Figs. 5 and 6, the shift toward positive transfer in A-B' occurs earlier for the first-trial than for the total scores. The greatest advantage of A-B' over C-D is characteristically on the early trials of test-list acquisition; as learning continues, the two
paradigms converge and eventually the relation between them is likely to be reversed. On the assumption that the positive transfer effects reflect mediational chaining (A-B-B’), their relative decline in the course of acquisition may be attributed to increasingly frequent failures to distinguish between the mediator and the mediated term as the two associates approach each other in strength. It is only at the highest level of practice (Set 3) that the initial positive effects are sufficiently great to ensure a net advantage in the total scores. There are some indications that partially successful attempts at mediational chaining were made in the A-B, A-C condition of negative transfer. That is, if S were able to establish a link between B and C, he could use the old response to mediate the new one. Two findings in particular suggest such a possibility: (1) There was a small but regular increase over sets in the frequency of paired interlist intrusions, i.e., occurrences of first-list responses to the appropriate stimuli during second-list learning; (2) a test of recall for all lists, given at the end of the experimental session, showed a trend toward an increasing positive relation between the probabilities of recall for the first-list and the second-list response to the same stimulus (B and C associated with the same A). Both these features of performance are characteristic of paradigms of mediational chaining. The evidence for the use of mediational strategies in the A-B, A-C condition is, to be sure, slight and circumstantial ; it must be recognized, however, that the materials used in this condition are selected so as to make such a procedure difficult. To the extent that such strategies were, indeed, attempted, Ss were trying to convert the nominal paradigm of negative transfer into a functionally positive one (cf. Postman, 1963). 
As for the fact that there was the least improvement for A-Br, this result falls into place when we realize that this paradigm entails two rules which are difficult to apply simultaneously, viz., to continue using the old class of responses but to eliminate all old A-B associations. The basic findings of this experiment, thus, are that (1) Ss develop skills, presumably the ability to recognize and to apply rules of response restriction, which facilitate the performance of transfer tasks; and (2) the gains in performance vary with the nature of the experimental paradigm such that there are greater increases in positive transfer than there are decreases in negative transfer. These findings left open, however, an important question of interpretation. The design of the study, in which each S was exposed to only one paradigm, did not permit the conclusion that the various experimental treatments resulted in the development of paradigm-specific skills, i.e., methods of practice peculiarly appropriate to a given transfer task. It was equally possible that all groups had developed the same highly generalizable skills, such as the ability to recognize and to apply rules of response restriction, but were
able to implement them with varying degrees of success depending on the specific paradigm. To decide between these alternatives, it became necessary to investigate the transfer of learning skills from one paradigm to another. A study by Keppel and Postman (1966) was directed at this objective. The rationale of this experiment was to give to separate groups of Ss experience with different paradigms and then to test them all under the same conditions of transfer. If the skills acquired during practice are paradigm-specific, performance in the final common task should be higher when the conditions of transfer remain the same in training and test than when they change.
FIG. 7. Performance on the transfer trials in the test phase as a function of the paradigms used in training. From Keppel and Postman (1966).
No such differences in the common test situation should be found if highly generalizable skills are developed during
practice. The training phase comprised two cycles (sets of two lists each) during which a given S gained experience with one particular paradigm. The same four paradigms were used in the training phase as in the previous experiment. There were two final test paradigms, viz., A-B, A-B' and A-B, A-C. Under each of the final conditions, therefore, there was one group for which the training and the test paradigms remained the same, and three groups for which it changed. The materials were those used in the earlier experiment. Figure 7 shows the acquisition curves for the final transfer lists. The results give an unequivocal answer to the experimental question. There were no reliable differences in speed of learning in the final common tests as a function of the conditions of prior training. It follows that the transfer skills that were acquired during the training phase were nonspecific,
or, what amounts to the same thing, they could be generalized from one
paradigm to another without any loss. Closely similar results have been reported by Martin, Simon, and Ditrichs (1966). While the performance in the terminal transfer test clearly fails to show any differential effects of the conditions of prior training, some paradigm-specific influences were detected on a test of recall for the two final lists. In this test, which was administered immediately after the completion of the transfer task, the MMFR procedure was used: S was provided with a sheet showing the common stimulus terms (A) and was required to write down the responses that had been associated with each stimulus (B and C; or B and B'). First-list recall was uniformly high for the A-B' test paradigm. For the A-C test paradigm, on the other hand, the first-list scores not only showed a substantial amount of forgetting but also a systematic influence of the conditions of prior training. Specifically, those trained under paradigms of positive or zero transfer (A-B' and C-D) recalled more than those trained under paradigms of negative transfer (A-C and A-Br). It is reasonable to suppose that Ss trained under the latter conditions developed a tendency to withhold or suppress first-list responses, which were potentially a persistent source of interference, during second-list learning. As a consequence, the availability of the first-list responses on an immediate test of retention was reduced (cf. Postman, Stark & Fraser, 1968).
2. Individual Differences in the Development of Transfer Skills
Since exactly the same materials and procedures were used in the first two cycles of the two experiments discussed above (Postman, 1964; Keppel & Postman, 1966), the available data provided an opportunity to assess individual differences in the development of transfer skills with a substantial sample of Ss. The question of interest is whether the beneficial effects of practice on the performance of the transfer task are related to learning ability.
Since the gains between cycles are attributed to increases in Ss' ability to recognize and to apply paradigmatic rules of response restriction, greater improvement may be expected for fast than for slow Ss. With respect to the development of such higher-order skills, then, the usual inverse relation between initial level of performance and amount of subsequent gain may not hold. In the two experiments taken together, there were 54 Ss in each of the four conditions of training defined by the paradigms used in the first two cycles: C-D, A-C, A-B', and A-Br. For purposes of the present analysis, each of the four combined groups was divided equally into categories of fast and slow learners on the basis of the criterion scores for the first list of the first cycle. The resulting subgroups were closely comparable for the four conditions. Figure 8 shows for the two categories
of learners the total numbers of correct responses in the acquisition of the successive transfer lists. The amount of change between the first and the second test is summarized in Table VIII. The table also shows the amounts of change in the scores on the first test trial to permit consideration of early transfer effects as well as of the overall measures of test performance.
FIG. 8. Mean numbers of correct responses in five trials of transfer learning for fast and for slow Ss as a function of stage of practice and paradigm.
The pattern of changes, as measured by the total scores, differs substantially for the two groups of learners. For fast Ss, improvement is greater than under the control (C-D) condition in both the A-C and A-B' paradigms but not in A-Br. (The gain for the latter
TABLE VIII
MEAN CHANGE IN TEST-LIST SCORES BETWEEN CYCLES 1 AND 2 FOR FAST AND SLOW LEARNERS

             Trial 1               Trials 1-5
Paradigm     Fast Ss    Slow Ss    Fast Ss    Slow Ss
C-D           -.15        .15       2.81       4.56
A-C            .67        .18       7.04       1.70
A-Br           .48        .56       2.67       4.78
A-B'          1.33       1.44       5.56       5.59
paradigm was not reliable in the original study and was not duplicated in the second experiment.) By contrast, slow Ss show an apparent net loss relative to the control condition for A-C, a small gain for A-B', and a null effect for A-Br. The interaction of ability grouping with paradigm is significant, F (3, 208) = 3.82, p < .01. Further comparisons using Dunnett's test show that for fast learners the amount of improvement is significantly greater than under the control condition for A-C (p < .01), but not for A-B'. For slow learners, none of the experimental treatments differs significantly from the control. Although some of the same trends are apparent in the first-trial scores (Table VIII), the Ability x Paradigm interaction is not significant in this case. It is important to note, however, that for the first-trial measures the only experimental treatment that differs reliably from C-D is A-B' (p < .01). That difference remains significant at the same level when it is evaluated separately for fast and for slow Ss. The reduction in the relative gain for A-B' as indexed by the total scores reflects the characteristic shift in transfer effects under that paradigm which was discussed earlier (cf. Figs. 5 and 6). The present analysis shows that the improvements in transfer performance as measured against the control baseline are greater for fast than for slow Ss. For purposes of interpretation, it is useful to note first of all that slow learners show a greater practice gain than fast ones under the control treatment. This difference is in accord with the usual inverse relation between initial level of performance and amount of improvement. In the control conditions, a series of unrelated lists is learned; in fact, there is no functional distinction between first lists and transfer lists. The only principle of performance available to S is a weak list rule of exclusion: old responses should be avoided.
Thus, the gains in the control condition reflect in large measure an improvement in essential rote-learning skills, and the opportunities to register such improvements are greater for slow than for fast Ss. The development of the higher-order skills peculiar to the specific paradigms of transfer represents a more advanced stage of learning to learn which can be viewed, conceptually at least, as following the attainment of the more basic skills required for mastery of the task. The fast Ss are more ready than the slow ones to embark on this phase of learning to learn, i.e., to take increasing advantage of the opportunities for rule-governed behavior afforded by the paradigmatic relations between lists. Two features of the results shown in Fig. 8 and Table VIII warrant some brief additional comment. First, the slow Ss, like the fast ones, show some evidence of developing facility in the use of mediational chains in the A-B’ condition. This paradigm entails a stimulus-specific rule of inclusion which, as was suggested earlier, should be easy to recognize and to apply. Second, there is less improvement for the slow Ss in the
A-C than in the C-D condition. Thus there is an increase rather than a decrease in negative transfer. While this difference falls short of statistical significance, it represents an interesting deviation from the general trend. One possible explanation is that in the second cycle the first list is learned more rapidly and attains a higher level of strength at criterion than in the initial cycle. Consequently, the amount of associative interference during second-list learning would be greater in the second than in the first stage of practice. The changes in effective first-list strength would be less for initially fast Ss and clearly are not great enough to mask the substantial positive practice effects.
3. Acquisition of Paradigm-Specific Skills
The studies discussed in the preceding section led to the conclusion that the skills developed through the performance of successive transfer tasks were not paradigm-specific. It appeared too early, however, to discard the hypothesis that paradigm-specific skills can in fact be established under appropriate conditions of training. Two possibilities seemed to warrant further exploration. First, the degree to which an S will be led to develop paradigm-specific methods of practice may depend on the utility of such specialized skills for meeting the requirements of the experimental situation. Thus, differential practice effects may be found if the "payoff" for paradigm-specific performance were greater than in the earlier studies. Second, the similarity relations between lists may have been too obvious, at least for the abler Ss, and the appropriate response rules too readily implemented to be sufficiently sensitive to the conditions of prior training. The picture might change if the similarity relations and the response rules were made more complex, so that successful performance would be more heavily dependent on the prior development of appropriate skills. One way to focus the learner more sharply on the similarity relations between successive lists is to add a requirement of recall. For example, if retention of A-B is tested after the end of A-C learning, adherence to the appropriate rules of response restriction will serve to reduce the amount of interference at recall. (The influence of prior training on MMFR performance reported above is consistent with this view.) Moreover, if S comes to expect the test of first-list recall, he will attempt to maximize the differentiation between successive response repertoires and potentially competing associations. And the most effective means of doing so will, of course, vary with the similarity relations between tasks.
There is some empirical support for the expectation that the introduction of retention tests during the training phase would foster the development of paradigm-specific skills. The relevant data to be presented here come from an investigation of the effects of practice on retroactive inhibition which will be reported in detail below. For present purposes, a very brief summary of the relevant features of the experimental design should be sufficient. The experimental treatments comprised two cycles of training and a final test cycle. In each cycle, Ss learned two successive lists of paired associates and were then tested for recall of the first list. The learning materials were lists of eight paired adjectives. The first list was learned to a criterion of 7/8, and the second list for 10 trials. The test of recall was unpaced. For one group, the transfer paradigm in the training phase was A-B, A-C, and for the other group it was A-B, C-D. The paradigm in the test phase was A-B, A-C for both groups. Performance in the final test task under the experimental conditions was compared with that of a naive group which had received no prior training. The performance of the two experimental groups and of the control group on the common test task is shown in Fig. 9.
FIG. 9. Performance on an A-C transfer list after A-B learning for naive and for practiced Ss. The paradigm used in the training phase was either A-C or C-D. [Graph not reproduced; abscissa: Trial, 1-10.]
Relative to the control condition, the amount of improvement is pronounced when the transfer
paradigms remain the same in training and test (A-C training), but only slight when they change (C-D training). The mean numbers of correct responses on the 10 trials of test-list learning are 48.88, 50.71, and 56.92 under the control condition, after C-D training, and after A-C training, respectively. The variation among the groups is significant, F (2,69) = 3.47, p < .05. Only the group receiving A-C training differs significantly from the control (p < .05) by Dunnett’s test. The differential practice effects observed here are to be contrasted with the absence of such effects in the earlier experiment (Keppel & Postman, 1966). No conclusive comparison between the two experiments is, of course, possible since they differed from each other in more than one respect. The results are, however, consistent with the assumption that the introduction of retention tests during the training phase is conducive to the development of paradigm-specific skills.
It was predicted that paradigm-specific behavior would become more likely as the complexity of the rules of response restriction is increased. One situation which may be expected to satisfy the requirement of sufficient complexity is the three-stage paradigm of mediational chaining; it also meets the additional criterion that conformance to the appropriate rule of response restriction can greatly facilitate the mastery of the test task. A conventional design used to demonstrate the phenomenon of mediational chaining comprises an experimental and a control condition. Three successive lists of paired associates are learned under each condition. For the experimental treatment, the sequence of lists is A-B, B-C, A-C. Owing to the presence of the common B term, the acquisition of the final list can be facilitated by the mediational chain A-B-C. For the control treatment, the sequence of lists is A-B, D-C, A-C. In the absence of the common B term, mediated facilitation cannot occur.
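The logic of the two treatments can be made concrete with a minimal sketch in Python. It is illustrative only: the function names (`chain`, `mediated_pairs`) and the three-item lists are hypothetical placeholders, not the materials of the study. Composing the first two lists recovers every A-C pair under the experimental sequence (A-B, B-C) and none under the control sequence (A-B, D-C).

```python
# Hypothetical placeholder lists: single letters as A terms, nouns elsewhere.
# These are NOT the materials used in the experiment.
list_ab = {"K": "table", "R": "river", "T": "candle"}                    # A-B
list_bc = {"table": "garden", "river": "window", "candle": "ladder"}     # B-C (experimental)
list_dc = {"planet": "garden", "bridge": "window", "hammer": "ladder"}   # D-C (control)
list_ac = {"K": "garden", "R": "window", "T": "ladder"}                  # A-C (test list)


def chain(first, second):
    """Compose two paired-associate lists through a shared middle term.

    For each pair a -> b in `first`, follow b -> c in `second` if b is a
    stimulus there. Pairs with no shared middle term are dropped.
    """
    return {a: second[b] for a, b in first.items() if b in second}


def mediated_pairs(first, second, test):
    """Return the test-list pairs that the chain first -> second reproduces."""
    chained = chain(first, second)
    return {a: c for a, c in test.items() if chained.get(a) == c}


experimental = mediated_pairs(list_ab, list_bc, list_ac)  # common B term present
control = mediated_pairs(list_ab, list_dc, list_ac)       # no common B term

print(len(experimental), len(control))
```

Under these assumptions the chain covers the whole test list in the experimental sequence and nothing in the control sequence, which is the formal sense in which mediated facilitation is possible in one case and excluded in the other.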
The question in which we were interested was whether or not paradigm-specific skills, appropriate to the experimental and the control treatment, respectively, would be developed through relevant practice. In our recent study (Postman, 1968), the A terms were single letters and the remaining terms were familiar nouns. The procedure included a training phase and a test phase. The training phase comprised two cycles of either the experimental or the control treatment; these two treatments were also used in the test phase. The design thus included the four possible combinations of training and test treatments: In two of these, the treatment in training and test remained the same, experimental in one case and control in the other; in the remaining two cases there was a change from experimental to control or vice versa. On the assumption that paradigm-specific modes of attack are established during the two cycles of the training phase, performance in the final cycle
should be higher when the conditions remain the same than when they change. Figure 10 shows for each cycle the performance on the third list which permits an assessment of the amount of mediation. For the first two cycles, the combined scores of the groups treated alike are presented. Substantial amounts of mediation are in evidence in both cycles. Speed of learning increases in Cycle 2 under both conditions. The critical finding is that performance in the final test cycle shows a pronounced interaction of the conditions of training and testing. For both treatments, performance is clearly better when the conditions remain the same than when they change. It is likely that Ss shifted from the experimental to the control treatment (EEC) made unsuccessful attempts at mediation during the acquisition of the final test list and hence lagged behind group CCC. Similarly, the Ss shifted from the control to the experimental condition (CCE) had probably learned to avoid the previous associates to the test-list terms as potential sources of interference and failed to make full use of the opportunities for mediation, thus falling below group EEE. The retardation produced by a shift in treatment constitutes clear evidence for the development of methods of learning appropriate to particular conditions of transfer. We conclude that experience with successive transfer situations can not only improve the learner's general ability to respond to the similarity relations between tasks, but will also lead to the development of paradigm-specific skills when such skills bear significantly on the test requirements.
FIG. 10. Nonspecific transfer effects in three-stage mediation: Mean numbers of correct responses in test-list learning for different sequences of experimental and control treatments. From Postman (1968). [Graph not reproduced; panels: Cycle 1, Cycle 2, Cycle 3; abscissa: Trial, 1-5.]
VI. The Effects of Practice on Recall
The final problem to be considered concerns the effects of practice on the efficiency of recall. Is there learning to recall as well as learning to learn? In particular, does an experienced S develop methods for coping with interferences among different tasks? In considering these questions, we must make a clear distinction between two possible sources of improvement in recall. First, an experienced S who has come to anticipate a test of recall is likely to use methods of practice that will prepare him for the test and that will serve to reduce intertask interference. There is evidence to show that Ss readily adopt such methods. For example, in studies of retroactive inhibition (RI) naive Ss informed prior to interpolated learning (IL) about a subsequent test of first-list recall will use such strategies as rehearsing old associations during the acquisition of the new ones (Postman & Stark, 1962). When similarities between successive tasks are potential sources of confusion, attempts at list differentiation are likely to be intensified. The finding mentioned previously that experience with appropriate tests of retention fosters paradigm-specific behavior is consistent with these assumptions. The second possible source of improvement is, of course, the development of efficient methods of recalling. The opportunities for acquiring such methods are necessarily constrained by the characteristics of the retention test. In free recall, effective output strategies can be developed. For example, the amount of subjective organization in free recall has been shown to increase as a function of practice (Tulving, McNulty, & Ozier, 1965). Other test procedures, however, give the S far less latitude in regulating his output. Thus, in tests of retention for paired-associate lists the responses must be given one at a time, each to the prescribed stimulus.
Even for such situations there are determinants of performance that should be subject to modification by practice. These may include S’s set at the time of recall, and in particular his ability to shift from one repertoire of responses to another, and the establishment of criteria of responding which reduce the probability of errors of commission and omission. Nevertheless, the greater the constraints that the test of retention imposes on output performance, the more likely it is that improvements in recall depend on the adoption of methods of practice that increase the learner’s readiness for the test. Experience with the experimental arrangements that produce RI affords an opportunity for the development of such methods. The conventional procedure consists of the acquisition of two successive lists, followed by a test of recall for the first list. Given knowledge of the sequence of events, the temporal arrangement of the tasks permits
adoption of the procedures to which we referred earlier: rehearsal of the old responses during the acquisition of the new ones and differentiation of potentially competing associations. A study will now be described that investigated the effects on RI of prior experience with the relevant experimental arrangements.
A. RETROACTIVE INHIBITION AS A FUNCTION OF STAGE OF PRACTICE
1. Design
The design of the study again comprised a training phase and a test phase. In each of the two cycles of the training phase, the Ss learned two successive lists of paired associates and were then given an unpaced test of recall for the first list. For one group the two lists conformed to the A-B, A-C paradigm, and for the other group to the A-B, C-D paradigm. In the test phase, half the Ss in each practice condition were assigned to an experimental group, and the other half to a control group. The experimental treatment in the test phase again consisted of the acquisition of two successive lists followed by an unpaced test of first-list recall. The paradigm of transfer in the test phase was A-B, A-C. The control group learned and recalled a single list. There were in addition a naive experimental and a naive control group which received no training. Comparison of the terminal test performance of the practiced groups with that of the naive groups permitted an evaluation of the effects of the two types of training. The groups practiced under the A-B, A-C paradigm will be designated as P(A-C), and those practiced under the A-B, C-D paradigm as P(C-D). The experimental and the control treatment in the terminal test phase will be identified by the letters E and C, respectively. Thus, there are four practiced groups in the test phase: P(A-C)-E, P(A-C)-C, P(C-D)-E, and P(C-D)-C. The two naive groups will be referred to as N-E and N-C. The design is summarized in Table IX.
2. Learning Materials
The learning materials were lists of eight paired adjectives. The same lists were used in a previous study in this series (Postman, 1964), in which they were likewise arranged in three successive transfer cycles. The construction of the materials is described in detail in the earlier report. For present purposes, it is sufficient to indicate the procedure used for balancing the lists. There were three sets of four lists each. Each set was used equally often in the three cycles of the practiced groups. Within a given set, each list was the first list for half the Ss and the second list for the other half of the Ss. During the training phase, the groups practiced under the A-C and the C-D condition learned exactly the same lists; the requirements of the two paradigms were met by appropriate combinations of the lists within a set. Of course, all Ss in both the
practiced and the naive groups also learned the same lists in the test phase. Thus, all comparisons of transfer performance and first-list recall are based on the same materials.
3. Procedure
In a given transfer-recall cycle the first list was learned to a criterion of 7/8, and the second list for 10 trials. Learning was by the anticipation method at a 2:2-second rate, with a 4-second intertrial interval. The list was presented in four different orders. There was an interval of 1 minute between lists within a cycle. The test of recall began 2 minutes after the end of second-list learning. The interval between the completion of the test of recall and the beginning of a new cycle was also 2 minutes. In the control treatment (test phase), there was an interval between the end of learning and recall equal to that under the experimental treatments. The control Ss worked on a series of arithmetic problems during the retention interval.

TABLE IX
THE EFFECT OF PRACTICE ON RI: SUMMARY OF EXPERIMENTAL DESIGN

              Training phase                                   Test phase
              Cycle 1                Cycle 2
Group         OL     IL     Recall   OL     IL     Recall      OL     IL     Recall
N-C           -      -      -        -      -      -           A-B    -      A-B
N-E           -      -      -        -      -      -           A-B    A-C    A-B
P(A-C)-C      A-B    A-C    A-B      A-B    A-C    A-B         A-B    -      A-B
P(A-C)-E      A-B    A-C    A-B      A-B    A-C    A-B         A-B    A-C    A-B
P(C-D)-C      A-B    C-D    A-B      A-B    C-D    A-B         A-B    -      A-B
P(C-D)-E      A-B    C-D    A-B      A-B    C-D    A-B         A-B    A-C    A-B

Retention of the first list was tested by a modified free-recall (MFR) procedure. In the A-B, A-C condition, the appropriate second-list responses were presented along with the stimulus terms and S was required to call out the first-list responses. In the C-D and the control conditions, only the first-list stimuli were presented. Four different orders of stimulus presentation were used equally often. The stimuli were presented one at a time in the window of the memory drum, and the test was S-paced. After completion of the MFR procedure, S was handed his recall protocol and invited to make whatever changes or additions he wished. He was allowed to add previously omitted responses to specific stimuli or to change earlier responses. Space was also provided for writing down responses that S was unable to pair with a particular stimulus.
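The recall protocols obtained in this way are later scored in two ways: stringent scoring counts first-list responses given to the appropriate stimuli, and lenient scoring gives credit for all first-list responses regardless of pairing. The retention analyses are carried out on loss scores, the difference between the number correct on the last acquisition trial and the number recalled. The following Python sketch restates these three measures. The function names, the representation of a protocol as a dictionary plus a list of unpaired responses, the decision to count each distinct response once, and the placeholder items are all assumptions made for illustration.

```python
def stringent_score(protocol, first_list):
    """Number of stimuli answered with their own first-list response."""
    return sum(1 for stim, resp in protocol.items()
               if first_list.get(stim) == resp)


def lenient_score(protocol, unpaired, first_list):
    """Number of distinct first-list responses produced anywhere in the
    protocol, whether correctly paired, mispaired, or written down unpaired."""
    valid = set(first_list.values())
    produced = set(protocol.values()) | set(unpaired)
    return len(produced & valid)


def loss_score(correct_on_last_trial, recalled):
    """Items correct on the last acquisition trial minus items recalled."""
    return correct_on_last_trial - recalled


# Hypothetical four-pair list and protocol (placeholders, not data).
first_list = {"A1": "B1", "A2": "B2", "A3": "B3", "A4": "B4"}
protocol = {"A1": "B1", "A2": "B3", "A3": None, "A4": "B4"}  # None = omission
unpaired = ["B2"]  # recalled but not assigned to any stimulus

stringent = stringent_score(protocol, first_list)
lenient = lenient_score(protocol, unpaired, first_list)
loss = loss_score(4, stringent)
print(stringent, lenient, loss)
```

In this made-up protocol the mispaired response and the unpaired response both raise the lenient score without affecting the stringent score, which is the sense in which the gap between the two measures separates response availability from retention of specific associations.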
4. Subjects
There were 24 Ss in each of the six groups (four practiced and two naive). The Ss were undergraduate students at the University of California who were entirely naive to rote-learning experiments. Assignment to conditions was random, with the restriction that there be n - 1 Ss in each group before the nth S was assigned.
5. Compatibility of Groups
The mean number of trials to criterion on the first list learned during the experimental session was 12.5, with the means of individual groups ranging from 10.8 to 15.2. The variation among groups is not significant, F (5,138) = 1.42, i.e., there are no reliable differences in learning ability.
6. Training Phase
The measures of performance of groups P(A-C) and P(C-D) during the two cycles of the training phase will be considered next. The speed of first-list learning increased between the first and the second cycle under both conditions of training, from 11.7 to 4.5 for P(A-C) and from 14.5 to 5.3 for P(C-D). The difference between cycles is significant, F (1,94) = 136.81, p < .001, but does not interact with the condition of training, F < 1. Thus, in agreement with the results of earlier studies, the degree of improvement in first-list learning is independent of transfer paradigm. Table X shows the mean numbers of correct responses given on the first 2 and on all 10 trials of interpolated learning during the two cycles of the training phase. Performance is higher under the C-D than under the A-C paradigm. This difference is reliable for the scores on the first two trials, F (1,94) = 4.62, p < .05, but owing to the convergence of the groups on the later trials falls short of significance for the total scores. The test-list scores are higher in the second than in the first cycle, and this increase is significant for both measures, F (1,94) = 4.57 (p < .05) and 5.39 (p < .02) for the early and the total scores, respectively. The Paradigm x Cycle interactions are, however, not significant. Thus, Condition P(A-C) does not in the present case show greater improvement than does Condition P(C-D); in fact, there is a trend in the opposite direction. It was suggested earlier that the introduction of a test of first-list recall may systematically influence S’s approach to the transfer task in subsequent cycles. Specifically, Ss may be led to maintain the availability of first-list associations during the acquisition of the transfer list. Since associative interference is greater in the A-C than the C-D paradigm, this objective is more difficult to accomplish in the former than in the latter condition.
For example, old responses are likely to be rehearsed more frequently during the acquisition of A-C than of C-D, with a consequent difference in the amount of functional learning time devoted to the transfer task per se. Under these circumstances, the gains in transfer performance develop somewhat more slowly, in the initial stages of practice at least, for the paradigm of negative transfer than the control condition.

TABLE X
MEASURES OF SECOND-LIST LEARNING AND FIRST-LIST RECALL IN THE TWO CYCLES OF THE TRAINING PHASE

                    Cycle 1                          Cycle 2
Second-list learning
Condition     Trials 1 and 2   Trials 1-10     Trials 1 and 2   Trials 1-10
P(A-C)             4.58           52.17             4.88           55.14
P(C-D)             5.35           54.27             6.56           59.77

First-list recall
Condition       Stringent        Lenient         Stringent        Lenient
P(A-C)             4.79            5.17             5.81            5.88
P(C-D)             6.22            6.60             6.62            6.62

In the A-C condition, the number of first-list intrusions during transfer learning declines from 35 (2.6% of all opportunities for errors) in the first cycle to 19 (1.6%) in the second cycle. The corresponding values for the C-D condition are 15 (1.6%) and 15 (1.5%). In view of the low absolute rate of interlist intrusions, the frequencies were pooled for all Ss, and no statistical evaluation of the differences was attempted. The trends indicate, however, that list differentiation improves more rapidly under the A-C than the C-D paradigm. The mean numbers of correct responses on the test of first-list recall in the two training cycles are shown in Table X. The stringent scores are based on the numbers of responses given to the appropriate stimuli; in lenient scoring, credit is given for all first-list responses, regardless of whether or not they were paired correctly. The stringent scores will be considered first. As expected, the level of retention is substantially lower for the A-C than for the C-D paradigm. The amount recalled increases between the first and the second cycle under both conditions. In order to take account of small variations in the numbers of Ss exceeding the criterion on the terminal trial of first-list learning, the statistical
analyses of the retention measures were carried out on loss scores, i.e., the differences between the number correct on the last acquisition trial and the number recalled. The difference between the paradigms is significant and so is the increase between the first and the second cycle in the amount recalled, F (1,94) = 10.57 (p < .01) and 5.47 (p < .02), respectively. The interaction is not reliable, although the rise in scores is somewhat greater for A-C than for C-D. In the first cycle, the lenient scores are as usual higher than the stringent ones. In the second cycle, however, the difference between the two types of scores is minimal. This shift is especially pronounced in the C-D condition. The interaction of the method of scoring with cycle of training is significant, F (1,94) = 4.45, p < .05. The lenient scores provide a maximal estimate of response availability, whereas the stringent scores index the number of items for which both the response was available and the association was intact. The discrepancy between lenient and stringent scores can therefore be viewed as reflecting the difference between the amount of response recall and the degree of retention for specific associations. On this assumption, resistance to forgetting increases more rapidly as a function of practice for specific associations than for the response terms; as a consequence, the discrepancy between lenient and stringent scores is reduced. An alternative interpretation is, however, that experienced Ss adopt a more stringent criterion than naive ones, i.e., they are more inclined to withhold responses when they are uncertain of the appropriate pairing. There is independent evidence for the operation of such a bias in MFR procedures (Keppel, Postman, & Zavortink, 1967). This interpretation is also consistent with the results discussed earlier which point to an upward shift as a function of practice in the criterion adopted during acquisition.
7. Test Phase
For the practiced Ss, speed of first-list learning is comparable to that in the second cycle. The mean number of trials to criterion is 4.6, with individual means ranging from 4.3 to 5.1, F < 1. These means are to be contrasted with those for groups N-C and N-E, which are 11.8 and 10.8, respectively. The measures of second-list learning are presented in Table XI. A comparison between group N-E and the two practiced groups was reported earlier (Fig. 9). It was found that relative to the performance of the naive Ss those trained in the A-C condition show a significant amount of improvement, whereas those trained in the C-D condition do not. When the two practiced groups themselves are compared, the difference between them is reliable for the measure of early transfer (Trials 1 and 2), t (46) = 2.77, p < .01; and it approaches significance very closely
for the total scores, t (46) = 1.95, p ≈ .05. The advantage of the group for which the paradigm remained the same in training and test was attributed to the development of paradigm-specific skills in the context of successive transfer-recall cycles.

TABLE XI
MEASURES OF SECOND-LIST LEARNING IN THE TEST PHASE

Condition     Trials 1 and 2   Trials 1-10
N-E                3.29           48.88
P(A-C)-E           5.25           56.92
P(C-D)-E           3.17           50.71

Figure 11 shows the stringent scores of the various groups on the final test of recall. There are increases as a function of practice in the amount recalled under the experimental treatment, and the gain is somewhat greater when the paradigms in training and test are the same than when they change. The control scores remain stable, but it is interesting to note that there is a slight advantage in favor of the Ss trained under the more difficult of the two paradigms. As a consequence of the changes produced by training, the amount of RI is clearly less for practiced than for naive Ss. This conclusion is supported by the significant Treatment x Stage of Practice interaction, F (1,138) = 13.77, p < .01. This interaction shows that the difference between experimental and control scores is reliably greater for the naive than for the practiced Ss. While the expected trend is present, the condition of practice (A-C versus C-D) is not a significant source of variance. The lenient scores will not be reported in detail since they yield no new information. Both the stringent and the lenient scores of group P(A-C)-E in the test phase are comparable to those in the second cycle. Thus, the major improvement in recall performance occurred after the first cycle.
FIG. 11. The effect of practice on RI: mean recall scores of experimental and of control Ss as a function of stage of practice. The A-C paradigm was used in the test phase; training was either under the A-C or the C-D paradigm. [Graph not reproduced; abscissa: Number of prior cycles, 0 (Naive) and 2 (Trained).]
The question arises of whether or not the improvement in recall under the experimental treatment is attributable entirely to increases in degree of first-list learning. As acquisition becomes more rapid, associative strength at criterion may be expected to increase; resistance to interference is in turn a function of the level of associative strength at criterion. While this possibility can certainly not be ruled out, there are several considerations that suggest that such an explanation would not be complete. (1) The control scores show little or no change although there is some limited room for improvement. (Since acquisition was paced and the test of recall was unpaced, perfect scores after attainment of a criterion of 7/8 are not unusual.) The failure of the control scores to improve may be the result of cumulative proactive effects which would be present under the experimental treatments as well. (2) Practice leads to increases in speed of second-list as well as of first-list learning. Thus, there are concomitant gains in strength for both the list to be recalled and the interfering list. (3) Probably most important, the gains in recall are not related to the amount of increase in speed of first-list learning. A direct relation between the two kinds of change scores should be found if the gains in recall reflected primarily increases in the degree of learning. There is, in fact, no evidence whatsoever of such a trend. In group P(A-C)-E, that half of the Ss registering the largest improvement in first-list learning between the first and third cycles (an average reduction of 10.8 trials) showed an increase of 1.2 items (stringent scoring) between the first and third tests of recall. The other half of the Ss increased their speed of learning by 2.1 trials and their recall scores again by 1.2 items. In view of these considerations, it remains plausible to suppose that Ss
develop methods of practice that serve to maintain the strength of old associations during the acquisition of new ones. The fact that RI is rapidly reduced by relevant experience suggests that such interference may play a more limited role than has been supposed in forgetting outside the laboratory. When the individual is motivated to retain old habits as he acquires new ones, the kinds of strategies that are adopted by the experienced laboratory S may readily come into play and may permit the steady accumulation of the products of learning.
B. LEARNING TO LEARN AND PROACTIVE INHIBITION
The temporal ordering of successive tasks limits the usefulness of rehearsal and related strategies for purposes of counteracting interference at recall. In the experiment on RI that we have just described, the critical test of recall occurred immediately after the end of IL. At this point, the S’s ability to differentiate between old and new associations is high, and rehearsal of the former will be reflected directly in the level of first-list recall. The situation is quite different when the source of interference is proactive. In this case, delayed recall of the materials acquired last is subject to interference from prior learning; proactive effects accumulate steadily as a function of the number of prior tasks. The available evidence suggests that such retention losses are in large measure the result of failures to discriminate between appropriate and inappropriate responses, e.g., the time tags by which successive tasks are distinguished from each other lose their effectiveness during the retention interval (Underwood & Ekstrand, 1966, 1967). Simultaneous rehearsal of old and new associations at the time of acquisition cannot be expected to alleviate this difficulty; on the contrary, any procedure that blurs the temporal separation between successive tasks may be expected to increase the amount of proactive inhibition (PI). Moreover, as the number of prior tasks increases, simultaneous rehearsal of old and new associations becomes less and less practicable. Considerations such as these make it understandable that continued practice in the laboratory leads not only to progressive increases in speed of acquisition but also to concurrent declines in long-term recall for the most recent task. That is, learning to learn, as measured by speed of acquisition, and proactive inhibition, as reflected in delayed recall, are highly correlated (Greenberg & Underwood, 1950; Underwood, 1957).
As the practice sessions continue, the S gains full knowledge of the sequential ordering of acquisition trials and retention tests, but there are few if any strategies at his disposal that can serve to mitigate the increasingly damaging influence of proactive inhibition. In a recent study in this series (Keppel, Postman, & Zavortink, 1968), we had an opportunity to make some experimental observations which
yielded a rather impressive example of massive proactive effects accumulating during an extended period of practice. We had access to a small group of Ss of college age whose participation in a nutritional study required them to remain confined to the laboratory for a period of 3 months. Participation in a study of memory was made part of the Ss’ regular schedule. The experiment comprised a series of successive cycles of learning and recall. Each cycle consisted of the acquisition of a list of 10 paired associates, with common English words as stimuli and responses, to a criterion of perfect recitation, followed by the recall of the list 48 hours later. Immediately upon completion of the recall test, the next learning-recall cycle was begun. We obtained data on 36 successive cycles from five Ss. The sample is, of course, small and the lists could be only partially balanced over cycles. Thus, the individual data points show considerable variability but the major trends emerge quite clearly. Figure 12 shows for successive blocks of three cycles the numbers of trials to reach five different criteria of mastery in acquisition. The mean trials to a criterion of perfect recitation for each of the individual lists are also shown. Apart from some erratic fluctuations in the initial phase of the study, speed of acquisition increases rather steadily; the number of trials to the criterion of perfect recitation is reduced by approximately 50%. The magnitude of the practice effects increases with the difficulty of the criterion.
FIG. 12. Cumulative effects of practice on acquisition: mean trials to selected criteria for successive blocks of three lists. Mean trials to a criterion of 10/10 are shown also for individual lists. From Keppel et al. (1968). [Graph not reproduced; abscissa: Successive three-list blocks, 1-12.]
Figure 13 shows the changes in the amount recalled after 48 hours. The heavy dark line connecting the solid circles represents the averages for successive blocks of three cycles; the open circles give the values for the individual lists. The level of recall drops very steeply at first and more slowly thereafter. Whereas recall for the initial list is 70% (a typical value for naive Ss), the amounts of loss in the final stage of practice approach complete forgetting. Far from becoming more proficient in recall, the experienced learner retains less and less.
FIG. 13. Cumulative effects of practice on retention: Mean percentages of recall after 48 hours for individual lists and for blocks of three lists. From Keppel et al. (1968). [Graph not reproduced; abscissa: Successive three-list blocks, 1-12.]
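The block averages plotted in Figs. 12 and 13 pool the 36 successive cycles into 12 blocks of three lists each. As a minimal sketch of that pooling step in Python, assuming one value per list in the order learned: the function name `block_means` is hypothetical, and the input values are arbitrary placeholders chosen only to make the arithmetic easy to follow. They are not data from the study.

```python
def block_means(values, block_size=3):
    """Average successive, non-overlapping blocks of `block_size` values."""
    if len(values) % block_size != 0:
        raise ValueError("number of values must be a multiple of block_size")
    return [
        sum(values[i:i + block_size]) / block_size
        for i in range(0, len(values), block_size)
    ]


# Placeholder series: one arbitrary number per cycle, 36 cycles in all.
per_list = [float(i) for i in range(1, 37)]
blocks = block_means(per_list)

print(len(blocks))   # 36 lists pooled into blocks of three
print(blocks[0])     # mean of the first three placeholder values
```

Averaging over blocks smooths the list-to-list variability that the text attributes to the small sample and the only partial balancing of lists over cycles, while the per-list points remain available for inspection.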
VII. Conclusions
The findings that have been reviewed here support the conclusion that nonspecific transfer is subject to experimental manipulation and to analysis into component habits and dispositions. Our experiments show that (1) for naive Ss at least, the perceptual-motor adjustments subsumed under the heading of warm-up are secondary to the acquisition of instrumental skills that are directly relevant to the verbal requirements of the learning task; (2) such skills are to some extent specific to a given method of learning or a given type of materials but also generalize broadly across these dimensions; (3) skills developed through practice may facilitate not only the acquisition of single lists but also the discovery and implementation of the paradigmatic rules governing the relations between successive tasks; and (4) within the constraints laid down by the conditions of testing and the temporal ordering of tasks,
the learner can modify his methods of practice so as to increase his level of recall. More generally, the experimental findings encourage the view that such concepts as learning strategies and rule-governed behavior need not be invoked in an ad hoc fashion but can be brought under experimental control by the systematic manipulation of the conditions of practice.
REFERENCES
Ammons, R. B. Acquisition of motor skill: I. Quantitative analysis and theoretical formulation. Psychological Review, 1947, 54, 263-281.
Cooper, E. H., & Pantle, A. J. The total-time hypothesis in verbal learning. Psychological Bulletin, 1967, 68, 221-234.
Dinner, J. E., & Duncan, C. P. Warm-up in retention as a function of degree of verbal learning. Journal of Experimental Psychology, 1959, 57, 257-261.
Duncan, C. P. Descriptions of learning to learn in human subjects. American Journal of Psychology, 1960, 73, 108-114.
Greenberg, R., & Underwood, B. J. Retention as a function of stage of practice. Journal of Experimental Psychology, 1950, 40, 452-457.
Houston, J. P. First-list retention and time and method of recall. Journal of Experimental Psychology, 1966, 71, 839-843.
Irion, A. L. The relation of “set” to retention. Psychological Review, 1948, 55, 336-341.
Irion, A. L. Retention and warming-up effects in paired-associate learning. Journal of Experimental Psychology, 1949, 39, 669-675.
Keppel, G., & Postman, L. Studies of learning to learn: III. Conditions of improvement in the performance of successive transfer tasks. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 260-267.
Keppel, G., Postman, L., & Zavortink, B. Response availability in free and modified free recall for two transfer paradigms. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 654-660.
Keppel, G., Postman, L., & Zavortink, B. Studies of learning to learn: VIII. The influence of massive amounts of training upon the learning and retention of paired-associate lists. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 790-796.
Krueger, W. C. F. The relative difficulty of nonsense syllables. Journal of Experimental Psychology, 1934, 17, 145-153.
Lazar, G. Warm-up before recall of paired adjectives. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 321-327.
McGeoch, G. O. Whole-part problem. Psychological Bulletin, 1931, 28, 713-739.
Martin, R. B., Simon, S., & Ditrichs, R. Verbal paired-associate transfer as a function of practice and paradigm shift. Psychonomic Science, 1966, 4, 419-420.
Murdock, B. B., Jr. The criterion problem in short-term memory. Journal of Experimental Psychology, 1966, 72, 317-324.
Newton, J. M., & Wickens, D. D. Retroactive inhibition as a function of the temporal position of interpolated learning. Journal of Experimental Psychology, 1956, 51, 149-154.
Postman, L. Does interference theory predict too much forgetting? Journal of Verbal Learning and Verbal Behavior, 1963, 2, 40-48.
Postman, L. Studies of learning to learn: II. Changes in transfer as a function of practice. Journal of Verbal Learning and Verbal Behavior, 1964, 3, 437-447.
Postman, L. Differences between unmixed and mixed transfer designs as a function of paradigm. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 240-248.
Postman, L. Studies of learning to learn: VI. General transfer effects in three-stage mediation. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 659-664.
Postman, L., & Goggin, J. Whole versus part learning of serial lists as a function of meaningfulness and intralist similarity. Journal of Experimental Psychology, 1964, 68, 140-150.
Postman, L., & Goggin, J. Whole versus part learning of paired-associate lists. Journal of Experimental Psychology, 1966, 71, 867-877.
Postman, L., Keppel, G., & Zacks, R. Studies of learning to learn: VII. The effects of practice on response integration. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 776-784.
Postman, L., & Schwartz, M. Studies of learning to learn: I. Transfer as a function of method of practice and class of verbal materials. Journal of Verbal Learning and Verbal Behavior, 1964, 3, 37-49.
Postman, L., & Stark, K. Retroactive inhibition as a function of set during the interpolated task. Journal of Verbal Learning and Verbal Behavior, 1962, 1, 304-311.
Postman, L., Stark, K., & Fraser, J. Temporal changes in interference. Journal of Verbal Learning and Verbal Behavior, 1968, 7, 672-694.
Rockway, M. R., & Duncan, C. P. Pre-recall warming up in verbal retention. Journal of Experimental Psychology, 1952, 43, 305-312.
Schwenn, E., & Postman, L. Studies of learning to learn: V. Gains in performance as a function of warm-up and associative practice. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 565-573.
Thune, L. E. The effect of different types of preliminary activities on subsequent learning of paired-associate material. Journal of Experimental Psychology, 1950, 40, 423-438.
Thune, L. E. Warm-up effect as a function of level of practice in verbal learning. Journal of Experimental Psychology, 1951, 42, 250-256.
Tulving, E. Subjective organization and effects of repetition in multi-trial free-recall learning. Journal of Verbal Learning and Verbal Behavior, 1966, 5, 193-197.
Tulving, E., McNulty, J. A., & Ozier, M. Vividness of words and learning to learn in free-recall learning. Canadian Journal of Psychology, 1965, 19, 242-252.
Twedt, H. M., & Underwood, B. J. Mixed vs. unmixed lists in transfer studies. Journal of Experimental Psychology, 1959, 48, 111-116.
Underwood, B. J. Studies of distributed practice: VI. The influence of rest-interval activity in serial learning. Journal of Experimental Psychology, 1952, 43, 329-340.
Underwood, B. J. Interference and forgetting. Psychological Review, 1957, 64, 49-60.
Underwood, B. J., & Ekstrand, B. R. An analysis of some shortcomings in the interference theory of forgetting. Psychological Review, 1966, 73, 540-549.
Underwood, B. J., & Ekstrand, B. R. Studies of distributed practice: XXIV. Differentiation and proactive inhibition. Journal of Experimental Psychology, 1967, 74, 574-580.
Underwood, B. J., & Schulz, R. W. Meaningfulness and verbal learning. Philadelphia: Lippincott, 1960.
SHORT-TERM MEMORY IN BINARY PREDICTION BY CHILDREN: SOME STOCHASTIC INFORMATION PROCESSING MODELS¹
Richard S. Bogartz
UNIVERSITY OF ILLINOIS
URBANA, ILLINOIS
I. Single Alternation
   A. Introduction
   B. A Single Subject: The Basic Ideas
   C. The Theory
II. A Model for Single Alternation
   A. An Axiomatization
   B. A Markov Property for Model A
   C. Some Theorems for a Finite Sequence from a First-Order Markov Chain Having Matrix (5) as Its Transition Matrix
   D. Estimation
III. Data
   A. Experiment I: Prediction of a Single Alternation Sequence
   B. Experiment II: Prediction of a Single Alternation Sequence Following Prediction of a Markovian Tending-to-Alternate Sequence
   C. Discussion
IV. Extension to the Effects of Intertrial Interval Duration
   A. The Effect of Lengthening the Intertrial Interval
   B. Experiment III: The Bogartz and Pederson Study
   C. A Model for Within-Ss Variation in the ITI Duration
   D. Experiment IV: Within-Ss Variation in the ITI Duration
   E. Discussion
V. Extension to Interpolated Events
   A. Interference Effects
   B. A Model for the Effects of Interpolated Events
   C. Experiment V: The Interpolated Events Experiment
   D. Discussion
VI. Extension to Markov Event Sequences
   A. The Implication of a One-Trial Memory . . . Complex Event Sequences
   B. The Use of a Repetition Rule
   C. Two Models for Prediction of Markov Event Sequences
   D. Experiment VI: Prediction of Tending-to-Alternate and Tending-to-Repeat Sequences
   E. Discussion
VII. Noncontingent Event Sequences
   A. Noncontingent Event Sequences: π > .5
   B. Individual Differences in Rule Selection
   C. Two Models for Prediction of Noncontingent Event Sequences with π > .5
   D. The Offenbach Experiment
   E. Two Models for Prediction of Noncontingent Binary Sequences with π = .5
   F. The Bogartz Study
VIII. Conclusions and Directions
References

¹ I am indebted to Ellen Brewer, Pamela Parris, Dorothy Gerety, and John Love for their assistance and enthusiasm at various times during the course of these investigations. Portions of this work were supported by PHS Research Grant No. HD03574.01.
I. Single Alternation
A. INTRODUCTION
The binary prediction or two-choice guessing game situation has been regarded as a prototypical learning paradigm. Whether learning is viewed as changes in the probability distribution defined over a set of responses or response classes, or as changes in underlying response strengths or conditioning states that determine changes in the probability distribution, the empirical essentials for the study of learning have seemed to be present in a situation with just two responses or response classes and two events that result in modifications of the response relative frequencies. In fact, the combining-of-classes condition (Blau, 1960; Bush & Mosteller, 1955; Bush, Mosteller, & Thompson, 1954) implies that any learning situation involving an arbitrary number of response classes can be translated into one involving just two classes by simply lumping all response classes except one into a single second class (but see Luce, 1959, pp. 93-94, for a dissenting view). The study of choice or decision-making has also led a number of investigators to focus on this same prototypical paradigm (Edwards, 1956; Siegel, 1964; Siegel & Goldstein, 1959). It is not surprising, then, that intensive study by workers of diverse interests has focused on the situation in which on each of a series of trials, following a signal to respond, the subject predicts or selects which of two uncertain outcomes will occur. Students of general, comparative, and developmental psychology, studying probability learning, discrimination learning, problem solving, hypothesis or strategy selection, subjective probability and utility, and even social interaction, all seem to have been able to put this paradigm to work for them (Bush & Mosteller, 1955; Estes, 1964; Goodnow, 1958; Luce, 1959; Restle, 1961; Suppes & Atkinson, 1960). One of the major
positions taken in this chapter will be that yet another area of research into human behavior, the study of attention and short-term memory in young children, will find the binary prediction situation and its extensions a fruitful research paradigm. The previous study of children’s behavior in binary prediction and related tasks has been motivated by a variety of interests, from the testing of learning or decision models and predictions associated with them (Atkinson, Sommer, & Sterman, 1960; Bogartz, 1965; Craig & Myers, 1963; Siegel & Andrews, 1962) to manipulation of situational variables (Stevenson & Weir, 1959; Stevenson & Zigler, 1958) and developmental levels (Weir, 1964). The findings of these studies and many more that have employed child subjects are difficult to integrate. Parameters such as the number of responses and the type of reinforcement schedule (contingent versus noncontingent) tend to vary from study to study, with Stevenson and Weir, two of the more productive workers, tending to use the three-choice motor response task with contingent reinforcement schedules, and many of the other workers tending to use the two-choice task, often with verbal responses, and usually with noncontingent reinforcement schedules. Often investigators have used the binary prediction situation as a setting in which they could study the effects of variables such as type or amount of reinforcement, so that the kinds of comparisons that have been made and the types of analyses and results that have been reported are not very informative as far as understanding of the basic mechanisms underlying behavior in the situation is concerned.
Also, besides variation in the many different situational variables that have been manipulated, the studies have used subjects from different developmental levels and, as the work of Weir (1964, 1966) and Craig and Myers (1963) so clearly shows, there are marked differences in behavior as a function of age in both the three-choice contingent and the two-choice noncontingent situations. Obviously, understanding and integration of the diverse findings can only come with the development of theory for behavior at the various developmental levels and the discovery of laws describing the transition from one level to the next. It is equally obvious that the former must precede the latter. The formulation of any developmental law that describes the transition from one level to the next presupposes lawful description of behavior at the two levels. In this chapter, I shall propose a theory of child behavior in the binary prediction situation. The developmental level(s) to which the theory is considered most appropriate roughly corresponds to the age range of about 3½ years to about 5½ or 6 years of age. The theory was first developed to explain the behavior of preschool children in a simplified prediction task in which the two events to be predicted simply alternated from trial
to trial (Bogartz, 1966d). Elsewhere, I have suggested some of the advantages in the use of a single alternation sequence with young children (Bogartz, 1966a). The theory was then extended to incorporate the manipulation of certain situational variables such as intertrial interval within the single alternation prediction task, and the results of a test of this extension have been reported (Bogartz, 1966c). The successes of one model suggested by the theory in generating detailed quantitative description of the data from individual children in the single alternation prediction task encouraged a search for extensions of the theory to models for the prediction of other types of binary event sequences. Extensions first to prediction of Markov sequences of events (Bush & Mosteller, 1955), and then to prediction of noncontingent sequences were found, and these were tested using data collected in our laboratory and data collected by Peter Derks and Marianne Paclisanu at the College of William and Mary, Jerome Myers and Grace Craig at the University of Massachusetts, Stuart Offenbach at Purdue University, and Rachel Keen Clifton at the University of Minnesota, who generously made them available. The most recent work has been directed to testing of the theory as it applies to prediction of sequences containing more than two events, testing the adequacy of the basic concepts of the theory as they could be expected to apply to an immediate recall or short-term memory task that can be treated theoretically much like the binary prediction task, and extension of the theory to incorporate prediction of latency measures. Space limitations will not permit a complete treatment of the work here.
The purpose of this chapter, then, is to present a detailed statement of some of the basic theoretical ideas together with some of the various extensions and the various models that have been derived, to present the results of some tests of the theory as it has been applied to the prediction situation and, finally, to consider the adequacy of the theory in its present form, its relationship to other theoretical approaches, and a number of extensions of the theory in terms of as yet untested alternate models that employ the same basic ideas but lead to more general formulations.
B. A SINGLE SUBJECT: THE BASIC IDEAS
Under ideal conditions, some preschool children can continue to predict correctly the next event in a sequence of two alternating events after 50 or even 100 trials. Most cannot. In a number of experiments (Bogartz, 1966b; Bogartz & Pederson, 1966), it has been observed that almost all preschool children make errors. These errors do not occur randomly. Instead, they exhibit dependence of a relatively simple nature.
The theory to be presented here is an attempt to explain the mechanisms underlying the occurrence and patterning of these errors. Before a formal treatment of the theory is presented, it will be helpful to introduce some of the basic ideas by discussing portions of the data collected from one 3½-year-old girl.² The apparatus and procedure are described below (Experiment I). The essentials are as follows. The child is first shown two colored marbles and required to identify correctly the two colors. He is informed that only the colors he has been shown will be used. He is then told that each time he hears the buzzer he is to predict the color of the next marble, and that after predicting he is to press a button which will release that marble. The buzzer sounds for .3 second every 8 seconds. The marble is removed by the experimenter after it has been released into a receptacle located in plain view of the subject. For this subject, the marbles alternated black (B), white (W), B, W, . . . throughout the experiment. In the sequence of prediction responses (PR’s) presented in Table I, the italicized PR’s (each of which carries a subscript) are errors. The protocol actually is abbreviated in that some of the runs of correct responses have been shortened because, for the present purposes, our attention will focus upon prediction errors and the intermittent nonpredictive verbal responses that the subject volunteered following some of those errors.

² I am indebted to Melba Rabinowitz for her delicate assistance in the pretraining and running of this child and a number of other very young pilot subjects.

TABLE I
AN ABBREVIATED PROTOCOL FROM A 3½-YEAR-OLD PREDICTING AN ALTERNATING SEQUENCE
“B W B W W₁ W B W W₂ W B W B B₃ B W B W W₄ B₅ B W B W W₆ B₇ W₈ B₉ B W B W W₁₀ B₁₁ B W B W B B₁₂ W₁₃ W . . .”

Except for subscripts 4, 7, 8, 9, and 10, the subscripts in the sequence locate the occurrence of certain verbal responses that occurred after the marble whose color had been incorrectly predicted had been released and had fallen into the receptacle. These responses are listed by subscript in Table II. Subscripts 4, 7, 8, 9, and 10 locate the absence of a nonpredictive verbal response following an error; the significance of these absences will be clear in a moment. Inspection of the nonpredictive verbal responses interpolated between the occurrence of a predictive error and the occurrence of a correct prediction on the next trial (1, 2, 3, 5, 11, and 13) indicates one thing that they all have in common: in each instance, the final word uttered was the name of the mispredicted marble’s color. It is tempting to assume that this final word, call it a naming response for the time being, is a causal antecedent of the following correct response. Perhaps the occurrence of this response produces a trace which is entered into a short-term store and is then retrieved when the next signal to respond is given. Because of previous conditioning or association, say, the trace would then elicit the alternate verbal response on the next trial. Thus, utterances of the word “White” at locations 3, 5, and 11 could have produced a trace of “White,” which when retrieved on the next trial elicited the predictive response “Black”; at 1, 2, and 13, the trace of “Black” could have elicited the prediction “White” on the following trial. (Although this ad hoc analysis of the behavior of a single subject can only be regarded as reasonable conjecturing, we shall see in Experiment V strong evidence that verbal responses such as these have precisely the sort of effect the analysis proposes.)

TABLE II
THE SET OF NONPREDICTIVE VERBAL RESPONSES TO STIMULI INCORRECTLY PREDICTED BY THE 3½-YEAR-OLD
1. “I said white and I didn’t say black.”
2. “Bl . . .” (presumably the initial portion of the word black)
3. “White.”
5. “No, no. White.”
6. “No.”
11. “White.”
12. “No.”
13. “Black.”
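The regularity just described can be checked mechanically against the protocol. The following Python sketch is an illustration only, not part of the original analysis. It assumes that the subscripts in Table I are read as transcribed in `TABLE_I` below, and that the abbreviation of correct runs left the protocol in step with a strict B, W, B, W, . . . event sequence starting from the first trial. The names `TABLE_I`, `NAMED`, `parse`, `event`, `error_subscripts`, and `next_is_correct` are hypothetical.

```python
# Table I as transcribed here (an assumption): a letter is the predicted color,
# an attached number is the subscript carried by that prediction.
TABLE_I = ("B W B W W1 W B W W2 W B W B B3 B W B W W4 B5 B W B W "
           "W6 B7 W8 B9 B W B W W10 B11 B W B W B B12 W13 W")

# Subscripts whose Table II utterance ends with a color name.
NAMED = {1, 2, 3, 5, 11, 13}


def parse(text):
    """Turn 'W10' into ('W', 10) and 'B' into ('B', None)."""
    return [(tok[0], int(tok[1:]) if len(tok) > 1 else None)
            for tok in text.split()]


def event(i):
    """Color of the marble on trial i, assuming strict B, W alternation."""
    return "BW"[i % 2]


def error_subscripts(protocol):
    """Subscripts (or None) attached to every prediction that misses its event."""
    return [sub for i, (pr, sub) in enumerate(protocol) if pr != event(i)]


def next_is_correct(protocol, subscript):
    """True if the prediction following the subscripted error matches its event."""
    i = next(k for k, (_, s) in enumerate(protocol) if s == subscript)
    return protocol[i + 1][0] == event(i + 1)


protocol = parse(TABLE_I)
print(error_subscripts(protocol))
print(all(next_is_correct(protocol, s) for s in sorted(NAMED)))
```

Under these assumptions the errors fall exactly on the subscripted predictions, and each error followed by a color name is followed by a correct prediction.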
Now consider the very interesting responses at locations 6 and 12. The child said “No.” This can be taken as an indication of some recognition of an error having been made. Nevertheless, in both instances the child made another error on the next trial. The recognition of an error is important because it indicates that a trace of the predictive response was still available and that it functioned as would an expectancy. Presumably, the disconfirmation of that expectancy elicited the verbal response “No.” It seems that neither the input of the visual stimulus (a prerequisite to recognition of an error having occurred) nor recognition of the error are in themselves sufficient to provide a trace that will elicit a correct response on the next trial as does the naming response (consider again location 5 where recognition of the error plus naming is followed by a correct response). The apparent requirement of a naming response as a prerequisite to later access to a trace can be taken as being essentially equivalent to what Melton (1963) meant when he said:
It seems to me necessary to accept the notion that stimuli may affect the sensorium for a brief time and also the directly involved CNS segments, but that they may not get ‘hooked up,’ associated, or encoded with central or peripheral response components, and may not, because of this failure of being responded to, become a part of a memory-trace system. This view is supported by the recent work of Averbach and Coriell (1961), Sperling (1960), and Jane Mackworth (1962) which shows that there is a very-short-term visual preperceptual trace which suffers rapid decay (complete in .3 to .5 sec.). Only that which is reacted to during the presentation of a stimulus or during this post-exposure short-term trace is potentially retrievable from memory.
This statement, in turn, is essentially equivalent to what Broadbent (1963) meant when he said:
It does seem therefore that in this very first second after presentation of the stimulus, before it has been categorized, the stored information decays very rapidly as a function of time and not as a function of intervening activity. On the other hand, this situation of decay immediately following the presentation of a stimulus is admittedly rather different from that of memory for a stimulus which has been categorized; or, if you will, has received at least one response.
So much for the naming response, at least for the moment. What then elicited the errors following the two “No” responses at 6 and 12? Suppose that in the absence of an appropriate naming response following occurrence of the event being predicted a subject retains in his short-term store a trace of the “naming” response used to predict the stimulus event. Retrieval of this trace, i.e., the trace of the erroneous predictive response, would be expected to produce another error if, making the same assumption as for the trace of the naming response, this trace elicits prediction of the color alternate to the color that was named, because this would produce an alternation of the prediction by the subject which in conjunction with the alternation of the event sequence would keep the subject one step out of phase with the sequence. He would alternate himself right into another error. Support for this conjecture can be found at locations 4, 7, 8, and 10, where in each instance the subject made no interpolated naming response to the stimulus event and then made another error on the next trial. The hypothesis that a memory trace of the subject’s previous response can influence his subsequent response has of course been proposed before. Hake and Hyman (1953) called attention to the effects of previous responses in binary prediction, and Anderson (1960, 1964) has provided additional evidence for such effects together with some very insightful discussion of some of the theoretical problems to which such effects give rise. In the context of the recall of visual stimuli, Broadbent (1963) has argued for the importance of response traces in the production of
confusion errors. Conrad (1962) found that men who were shown alphabetic letters and then asked to reproduce the series they had seen made confusion errors similar to those that are made in listening to letters spoken through noise. For example, when the letter V is shown, the letter B is recalled but not the letter X, although visually the letters V and X have more in common than do V and B. Broadbent (1963) suggests that such confusion errors occur because the subjects “tended to make a verbal response to the visually presented letters, and that subsequently in recall they responded to their own previous responses: they said the letters over to themselves.” If we think of the young child who is predicting a single alternation sequence as trying to keep track of the present state of the sequence, then Hunt’s (1963) computer model for the keeping-track task studied by Lloyd, Reid, and Feallock (1960) and by Yntema and Mueser (1960) is relevant to the question of response traces. His storage rule (Rule e) which states that “when a question is answered, the answer is stored as if it had been a message” translates readily into “when a subject gives a response, a trace of this response is stored as if it were the trace produced by a naming response to an event stimulus.” (In fact, with only minor changes, and the introduction of probability laws for encoding and storage, Hunt’s model of the keeping-track task would be essentially equivalent to the conception of the young child’s prediction of an alternation sequence to be presented later.) If both the nonpredictive naming response and the predictive response can enter traces into the short-term store, do both remain available for retrieval? Can they compete? The working hypothesis here will be that only one of these traces is held in storage at any given time. When a trace is entered into the short-term store, it erases any other trace there.
It will be shown later, however, that this assumption is not necessary in that probabilistic availability of the traces is also a tenable assumption, at least as far as derivation of the models that have been studied is concerned. In favor of the hypothesis, it can be said that storing only one trace at a time would be an efficient, error-reducing tactic. Also, it is suggested by Broadbent’s (1963) comment that
... when information comes into the nervous system it passes through a limited capacity channel, which means that it is difficult to take in other information simultaneously. Once through this channel, items of information can be held in a short-term store, but only as long as they can pass repeatedly through the same limited capacity channel as that by which they arrived. So long as they can do this, they can be stored in this way indefinitely; but if the limited part of the system becomes unavailable, perhaps through the admission of some fresh items of information from the outside world, then the storage of the earlier items will fail and they will not be available any longer (italics added).
and Hunt’s (1963) storage rule (Rule d) for the keeping-track task, “When a chunk is stored in a bin, the previous contents of the bin, if any, are lost.” Let us return now for a last look at the protocol. It must be noted that at location 9 the subject made no overt naming response, yet made a correct response on the next trial. Two conjectures come to mind. The appropriate trace could have been provided by an unobserved, perhaps implicit, naming response. On the other hand, perhaps no trace at all was available for retrieval and the subject simply guessed correctly. This corresponds to Hunt’s retrieval Rule b.3 which has the subject select a random answer from the set of permissible answers if the search rules do not lead to a permissible answer. The first alternative not only accounts for the occurrence at location 9 but also is helpful in explaining the behavior of slightly older children who often make characteristic nonverbal responses such as covering their mouth or smiling after an error, and then almost invariably make a correct response on the next trial without making any overt naming response. The second alternative implicitly assumes that short-term storage is not perfect, that information entered into the store does not necessarily remain available for retrieval indefinitely. Of course, this assumption was implicit in the previous discussion also, since if after the few trials required to produce conditioning of the traces, the trace of one response would always elicit the alternate response; then, in order for an error to ever follow a correct response, the trace would have to be vulnerable to some process such as decay or displacement by other (possibly response-produced) stimuli.
Otherwise, once a correct response occurred it would always be followed by a correct response on the next trial, since if an interpolated naming response did not furnish a trace conditioned to a correct response, the trace of the previous correct predictive response would still be available to elicit the alternate, and therefore correct, response. Once the subject is in phase with the sequence (responding correctly), perfect response memory can completely compensate for any degree of inattention to the events. In summary, then, it has been suggested that when the young child predicts a single alternation binary event sequence, he sometimes encodes an event stimulus as a memory trace which he then uses on the following trial to make his next prediction, he sometimes makes his next response by using a trace of his previous predictive response, and he sometimes just guesses. These ideas will now be progressively formalized.
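Before the formal treatment, the three informal ideas in this summary can be illustrated with a small simulation. The Python sketch below is a hedged illustration and not the axiomatized model of the chapter. The function `simulate` and its parameters `p_encode` (probability that the event is named and its trace replaces the response trace) and `p_keep` (probability that whatever trace is in the store survives until the next cue) are hypothetical names introduced here. Applying a single loss probability to both kinds of trace is a simplifying assumption.

```python
import random


def alternate(color):
    """The alternate of a color in the two-color task."""
    return "W" if color == "B" else "B"


def simulate(n_trials, p_encode, p_keep, rng, first_prediction="B"):
    """Simulate one child predicting B, W, B, W, ...; return a list of
    booleans, True where the prediction matched the event."""
    correct = []
    trace = None  # None stands for "no usable trace in the store"
    for t in range(n_trials):
        ev = "BW"[t % 2]
        if t == 0:
            pr = first_prediction        # arbitrary starting response
        elif trace is None:
            pr = rng.choice("BW")        # nothing to retrieve: guess
        else:
            pr = alternate(trace)        # predict the alternate of the trace
        correct.append(pr == ev)
        trace = pr                       # the response leaves its own trace
        if rng.random() < p_encode:
            trace = ev                   # naming the event overwrites it
        if rng.random() >= p_keep:
            trace = None                 # trace lost before the next cue
    return correct


rng = random.Random(0)
in_phase = simulate(20, p_encode=0.0, p_keep=1.0, rng=rng, first_prediction="B")
out_of_phase = simulate(20, p_encode=0.0, p_keep=1.0, rng=rng, first_prediction="W")
print(all(in_phase), any(out_of_phase))
```

With no encoding and perfect retention, a child who starts in phase stays correct and a child who starts out of phase stays wrong, which is the compensation argument made above. Encoding on every trial (`p_encode=1.0`) repairs an out-of-phase start after a single error.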
C. THE THEORY
The sequence of observed events and significant temporal intervals
for a given trial is shown in Fig. 1. The trial begins with the delivery of the cue to respond, Cue,, which is shown as a point in the temporal sequence since, although its duration is of course measurable, it has always been brief and can be taken to be negligible. The following interval L,, the subject’s response latency, is terminated by the predictive response PR,, which is also taken to be of negligible duration. The experimenter determines the next three intervals D,, Ev,, and I,, the
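[Figure 1: time line of trial n, marking in order $\mathrm{Cue}_n$, the latency $L_n$, $\mathrm{PR}_n$, the delay of event $D_n$, $\mathrm{Event}_n$ on, the event duration $\mathrm{Ev}_n$, $\mathrm{Event}_n$ off, the intertrial interval $I_n$, and $\mathrm{Cue}_{n+1}$.]
FIG. 1. The temporal structure of the nth trial of a two-choice guessing situation.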
sum of which is the response-cue interval, $\mathrm{RC}_n$, i.e., the interval between $\mathrm{PR}_n$ and the cue to respond on the next trial, $\mathrm{Cue}_{n+1}$. The interval $D_n$ is the feedback delay time, the time taken by the experimenter before presenting the event predicted by $\mathrm{PR}_n$. The interval $\mathrm{Ev}_n$ is the period during which the stimulus event is exposed to the subject, and the interval $I_n$ is the intertrial interval. For the remainder of this section, $D_n$, $\mathrm{Ev}_n$, and $I_n$ will be assumed to be constant from trial to trial. Figure 2 shows a flow diagram of the relevant processes that are assumed to underlie the occurrence of the predictive response. The
[Figure 2: flow diagram. Recoverable labels: $\mathrm{Cue}_n$; ATTENDER; “Enter event trace into store, removing ...”; “Emit next response. Enter response trace into store”; a store that holds the most recently entered trace; GENERATOR, “Apply transformation or guessing rule.”]
FIG. 2. Flow diagram of the processes underlying $\mathrm{PR}_n$.
occurrence of $\mathrm{Cue}_n$ initiates the retrieval process that takes the interval $L_n$ to be completed and culminates in $\mathrm{PR}_n$. Retrieval begins with the transfer of the short-term store content to a response generator. The generator applies either a transformation rule or a guessing rule to this information, thereby determining the $\mathrm{PR}_n$. The occurrence of the $\mathrm{PR}_n$ is followed by the encoding of its trace in the short-term store. If the subject attends to the event and makes an encoding response, a trace produced by this encoding response enters the store, erasing its previous content. The distracter in Fig. 2 represents sources of irrelevant interfering stimulation. At any point in time, an input from either an external or internal source other than the predictor or the encoder may be delivered to the short-term store, erasing its previous content. As far as the presence in the store of information relevant to the prediction of the next event is concerned, this is equivalent to the store being emptied. Therefore, it will be said that at any point in time it is possible for a null trace to enter the short-term store. A null trace ordinarily can be regarded as any trace in the short-term store that is not a trace from the predictor or the encoder. Under special experimental procedures, e.g., when the experimenter attempts to manipulate the content of the store during the $I_n$ by asking the subject to name something, or by saying or showing something to him, this definition of a null trace may require minor modification. The idea that would guide such modification will always be that members of one class of traces cause the generator to apply a probabilistic guessing rule in determining the next response while members of the other class can serve as the argument of a deterministic transformation rule.
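The store-and-generator cycle just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `OneSlotStore`, `generate`, and `NULL`, and the representation of a trace as a `(kind, index)` pair, are hypothetical and do not come from the chapter; the transformation rule shown is the alternation rule discussed for the single alternation task.

```python
import random

NULL = None  # hypothetical stand-in for the null trace


class OneSlotStore:
    """Short-term store that holds only the most recently entered trace."""

    def __init__(self):
        self.content = NULL

    def enter(self, trace):
        # Entering any trace erases whatever the store held before.
        self.content = trace


def generate(trace, rng=random):
    """Response generator: guessing rule for a null trace, otherwise the
    alternation transformation rule (trace index 1 gives A2, index 2 gives A1)."""
    if trace is NULL:
        return rng.choice(["A1", "A2"])
    _kind, index = trace  # e.g. ("event", 1) or ("response", 2)
    return "A2" if index == 1 else "A1"
```

For instance, entering `("event", 1)` and then `("response", 2)` leaves only the response trace in the store, and `generate` turns it into `"A1"`.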
The concept of transformation rules is introduced to provide the theory with sufficient scope to handle somewhat different approaches taken by different children to the problem of predicting a single alternation sequence. One approach was illustrated in the discussion of the protocol in Table II. It was suggested that there the trace of the nonpredictive, perhaps covert, verbal response “Black,” for example, elicited the predictive response “White” on the next trial. We would say that this trace was entered by the encoder and transformed into the response “White.” Some children play the game in a different fashion, however. When predicting the BWBW . . . sequence, they are observed to begin quietly repeating over and over, not the color of the event just shown, but the other color. That is, when the black event occurs, the child begins to whisper or mouth “White, white, white . . .” until the cue is presented, at which time the response “White” is given aloud. Presumably, some children use an equivalent but covert method of storing their expectancy of the next event which they then emit at the appropriate
time in the form of an overt response. In this case, the stimulus event determines the encoding response, as before, but the trace that is encoded does not correspond to the name of the color that was shown, but to the name of the color that will be shown. The rule that the second type of subject used to generate his next response appears to be quite different from the rule used by the first. This has interesting implications for certain aspects of the data but space limitations will not permit treatment of them here. The concepts of encoding response and transformation rule are sufficiently broad to encompass both kinds of rules. When we come to a treatment of so-called probability learning in young children, still different combinations of encoding responses and transformation rules will be found to be useful. The notion that each new trace that enters the store erases the previous trace implies that forgetting occurs as a result of interference rather than trace decay. Also, it suggests that the capacity of the short-term store is exactly one trace. The first implication is included more as a matter of convenience than as a matter of necessity or even of commitment. The interference-versus-trace decay controversy with respect to forgetting in general and short-term forgetting in particular is far from resolved (Postman, 1964) and offers little guidance in the selection of storage assumptions for forgetting in the prediction task. Some of the spirit, if not the terminology, of the decay hypothesis will be introduced as needed later. For our purposes, there will be no need to decide yet between a weakening trace or conditions that cause a trace to become increasingly more likely to be displaced (and perhaps, in fact, both hypotheses are correct). What will be necessary is that the response and event traces become increasingly less likely to determine the next response as certain intervals, $I_n$ in particular, become longer.
With respect to the assumption of a one-trace capacity, certainly there is no intent to suggest that under all circumstances young children are limited to the short-term retention of only one item. Such an assertion is easily refuted. It is intended, instead, to suggest that in single alternation prediction and related tasks it may well be the case that only one trace is stored. From a formal point of view, the decision is not crucial in that the models to be considered here can also be derived from an assumption of probabilistically differential availability of the response, event, and null traces. (Some consideration of such homomorphic relationships between various axiom sets and a specific model is given later.) From a psychological standpoint, the view subscribed to here is that often, if not always, the young child predicting a binary event sequence does attempt to hold only one item and can frequently be observed to engage in behavior that apparently results in just that (see above).
What should be emphasized about the theoretical ideas illustrated in Fig. 2 is (1) their Markovian character, (2) the absence of any learning assumption, and (3) their explicit assertion of the control that a response may exert on the next response by means of its trace. By their Markovian character is meant the implication that the response on a given trial depends at most on the conditions prevailing on the previous trial. Such information as the system in Fig. 2 may have concerning its distant past is assumed to be located in its generator rules, and these rules are taken to be fixed. Certainly, any complete treatment of predictive behavior and the processes that underlie it from the present point of view will require consideration of the way in which generator rules change. Such changes would be learning as far as the system is concerned. They would certainly be needed, for example, to permit treatment of transfer tasks in which the sequence of events changes from single alternation to, say, repetition of a single event, or vice versa. We do not know very much about young children’s behavior in transfer from one simple recurrent sequence to another (Bogartz, 1966a), so at present we will try to see how far we can proceed in the description of predictive behavior without introducing a learning assumption, thereby excluding the problem of transfer. The postulation of a trace mechanism whereby a response determines the next response has been dictated by our observations in the laboratory. One cannot watch a child predicting an alternation sequence for any length of time without seeing a variety of behaviors that demand such an assumption. The child can be so oriented that it is impossible for him to see the event exposed, yet he can continue to respond correctly (or incorrectly) for runs of trials that clearly are longer and occur more frequently than could be explained by appeal to a random guessing process.
A thought experiment in which the cue and the events are eliminated and the child is simply asked to say “Red-green-red-green . . .,” and to continue to do so (which 4-year-olds certainly can do), demands the existence of a mechanism that can maintain this type of behavior. There is no question as to whether the child can display such response-determined responses. The only question is whether or not he does so during alternation prediction, and we hypothesize that, in general, he does. In terms of Fig. 2, the thought experiment might be viewed as a closed loop consisting of the predictor, the store, and the generator, with the rules in the generator determined by the experimenter’s instruction to alternate. The instruction to keep responding eliminates the need for a cue to activate transfer from the store to the generator. Transfer becomes automatic and the store simply passes each response trace on to the generator.
II. A Model for Single Alternation
A. AN AXIOMATIZATION

The advantages of a set-theoretical formulation from the standpoint of clarity, rigor, and precision have been discussed by Estes and Suppes (1959) and will not be repeated here. We follow their treatment here for the first model to be considered, a model for single alternation prediction by young children. It will then be evident throughout the remainder of the chapter that similar treatments can be given for defining each of the models subsequently considered. We follow their approach closely, first giving the primitive and defined notions that will be needed, and then stating the axioms that must be satisfied if a particular set-theoretical entity is to be the desired model. The reader is referred to their article (Estes & Suppes, 1959) or to the more complete technical report (Estes & Suppes, 1957) for the definitions of unfamiliar mathematical terms. The reader with little mathematical background can skip lightly through Section A, reading to locate definitions of notation that appears again later, and for such verbal statements as may be helpful; he can omit Section B, read Section C with an eye toward the statistics to be used later but skipping the proofs of theorems, and read Section D another time when interested in using the simple numerical computations needed for parameter estimation. In the latter case, the information needed is instructions for the use of Table III.

To give the first primitive notion, the sample space of a single alternation prediction experiment, some notation must first be introduced. Let $t^{(1)} = \{t_1^{(1)}, t_2^{(1)}\}$ be a set the elements of which are interpreted as the short-term memory traces of the two predictive responses. The elements of the set $A = \{A_1, A_2\}$ are interpreted as the two predictive responses such that $t_i^{(1)}$ is the trace of $A_j$ if $i = j$.
The elements of the set $E = \{E_1, E_2\}$ are interpreted as the stimulus events that are being predicted such that $A_i$ is the correct prediction of $E_j$ if $i = j$. The elements of the set $a = \{a_0, a_1, a_2\}$ are interpreted in order as the failure of an encoding response to occur to an event, an encoding response to an $E_1$, and an encoding response to an $E_2$. The set $t^{(2)} = \{t_1^{(2)}, t_2^{(2)}\}$ is interpreted as the set of traces resulting from an encoding response to an event such that $t_i^{(2)}$ is the trace produced by the encoding response $a_j$ if $i = j$. Finally, let $t^{(0)} = \{t_1^{(0)}\}$ be the set consisting of one element that is interpreted as the null trace. We can now define a trial outcome to be an ordered 4-tuple $\langle s, i, j, k \rangle$ in which
$s \in S = \bigcup_{f=0}^{2} t^{(f)}$ is the trace retrieved on that trial, $i \in A$ is the response on that trial, $j \in E$ is the event on that trial, and $k \in a$ is the encoding response on that trial. For example, the 4-tuple $\langle t_2^{(1)}, A_1, E_2, a_2 \rangle$ represents a trial on which the trace of a previous $A_2$ response was retrieved from the short-term store, the subject made an $A_1$ predictive response and, therefore, an error, since, as the value of j indicates, an $E_2$ occurred, and then encoded the $E_2$ as a memory trace $t_2^{(2)}$ by making the encoding response $a_2$. The sample space X can now be defined as the set of all possible sequences of trial outcomes. Thus, each point in the sample space is an infinite sequence of ordered 4-tuples and is interpreted as a possible sequence of combinations of trace retrievals, responses, events, and encoding responses.

The second primitive notion is a probability measure P defined on $\mathcal{B}(X)$, the smallest Borel field containing the field of cylinder sets of X. The existence of such a measure ensures the possibility of defining the probabilities of all of the events that will be of interest since each of them can be expressed as a cylinder set, a union of cylinder sets which is again a cylinder set, or an intersection of cylinder sets which is also a cylinder set.

The third primitive notion is the real number p in the closed interval [0,1]. It is interpreted as the probability that the subject encodes the stimulus event exposed to him by making response $a_i$ if $E_i$ occurs (i = 1, 2), the consequence of which, as indicated before, is the entering of the trace $t_i^{(2)}$ into the short-term store.

The fourth and fifth primitive notions are the real numbers $d_1$ and $d_2$, both in the closed interval [0,1]. The number $d_1$ is interpreted as the conditional probability that the null trace enters the store during the response-cue interval of a trial given that the failure-to-encode response $a_0$ occurs on that trial.
The number $d_2$ is interpreted as the conditional probability that, given the occurrence of an $a_1$- or $a_2$-encoding response on a trial, a null trace enters the store during the interval between the occurrence of that encoding response and the start of trace retrieval on the next trial. The assumption of two different values, $d_1$ and $d_2$, rather than a single value d, is prompted not only by the desire for generality, but also by at least two psychological considerations. If the entry of a null trace into the store is essentially a random event which can occur with roughly the same probability in any small interval of time, then the waiting time for the entry of a null trace will be approximately exponentially distributed and we can, therefore, expect that $d_1$ will be greater than $d_2$ since the response-cue interval will always be greater than the interval between an encoding response and the next cue. On the other hand, it may well be that the microevents combining to result in an encoding failure ($a_0$), and in particular those leading to a lack of attending to the stimulus event, may in part be the same microevents that lead to entry of a null trace. For example, the attending to an irrelevant event in the experimental environment when the stimulus event is presented would be expected to result in an $a_0$ and might also result in the encoding of that event as a null trace. This would again suggest the expectation that $d_1$ be greater than $d_2$. The numbers p, $d_1$, and $d_2$ are here taken to be fixed constants, and the experimental situations appropriate to determining whether or not a set of data corresponds to the model to be defined should exclude manipulations that render such an assumption untenable. In later sections, the problem of variation in these values from trial to trial will be considered in several ways.

Some additional notation must now be introduced before the axioms can be stated. Let $B_C = A \cup E \cup a \cup S$. Then, for any $b_c \in B_C$, $b_{c,n}$ is the cylinder set containing all points in X in which the nth-trial outcome contains a $b_c$. For example, $A_{i,n}$ is the cylinder set containing all points in X in which the nth-trial outcome contains an $A_i$; $t_{g,n}^{(f)}$ is the cylinder set of points in X containing a $t_g^{(f)}$ in the nth-trial outcome. The last piece of additional notation is $C_n$, which denotes an n-cylinder set of X, i.e., a cylinder set defined in terms of at most the first n trial outcomes. The axioms can now be given as part of the following
DEFINITION. An ordered 5-tuple $\langle X, P, p, d_1, d_2 \rangle$ is a model A for single alternation prediction if, and only if, X is the sample space, P is a probability measure on $\mathcal{B}(X)$, p, $d_1$, and $d_2$ are each in the closed interval [0,1], and the following three axioms are satisfied.

ENCODING AXIOM E1. For every $C_{n-1}$ in X, every h and i (h, i = 1, 2), every f (f = 0, 1, 2), every g (g = 1, 2 if f ≠ 0; g = 1 if f = 0), and every n (n = 1, 2, . . .),

$$P(a_{0,n} \mid C_{n-1} \cap t_{g,n}^{(f)} \cap E_{h,n} \cap A_{i,n}) = P(a_{0,n}) = 1 - p,$$

and

$$P(a_{h,n} \mid C_{n-1} \cap t_{g,n}^{(f)} \cap E_{h,n} \cap A_{i,n}) = p.$$
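MEMORY AXIOM M1. For every $C_{n-1}$ in X, every f, k, and r (f, k, r = 0, 1, 2), every g (g = 1, 2 if f ≠ 0; g = 1 if f = 0), every h, i, and j (h, i, j = 1, 2), and every n (n = 1, 2, . . .),

$$P(t_{h,n+1}^{(r)} \mid C_{n-1} \cap t_{g,n}^{(f)} \cap A_{i,n} \cap E_{j,n} \cap a_{k,n}) = P(t_{h,n+1}^{(r)} \mid A_{i,n} \cap E_{j,n} \cap a_{k,n})$$

$$= \begin{cases} 1 - d_1 & \text{if } k = 0,\ r = 1,\ h = i \\ d_1 & \text{if } k = 0,\ r = 0,\ h = 1 \\ 1 - d_2 & \text{if } k \neq 0,\ r = 2,\ h = k \\ d_2 & \text{if } k \neq 0,\ r = 0,\ h = 1 \end{cases}$$

and is 0 otherwise.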
RESPONSE AXIOM R1. For every $C_{n-1}$ in X, every i (i = 1, 2), every f (f = 0, 1, 2), every g (g = 1, 2 if f ≠ 0; g = 1 if f = 0), and every n (n = 1, 2, . . .),

$$P(A_{i,n} \mid C_{n-1} \cap t_{g,n}^{(f)}) = P(A_{i,n} \mid t_{g,n}^{(f)}) = \begin{cases} 1 & \text{if } f = 1 \text{ or } 2 \text{ and } i \neq g \\ 1/2 & \text{if } f = 0 \\ 0 & \text{otherwise.} \end{cases}$$

The first part of Axiom E1 states that the probability that the subject fails to encode the event presented on a given trial is independent of the occurrences on all previous trials, the trace retrieved on that trial, the response made on that trial, the event presented, and the trial number, and is equal to the constant 1 − p on each trial. The second part makes the same type of statement about the occurrences of $a_1$ and $a_2$. The subject makes an encoding response on a given trial with probability p, independent of the trial number, all previous trial outcomes, the trace retrieved, and the response made on that given trial, thereby encoding the event that occurred. Axiom M1 gives first an independence-of-path statement similar to that in Axiom E1, indicating that the probability of retrieving any particular trace on a given trial depends at most on the response, the event, and the encoding response of the previous trial. It then spells out the probabilistic dependence. In words, it states that if an $a_0$ occurs, the response trace is held with probability $1 - d_1$ and replaced by the null trace with probability $d_1$, but if an $a_1$ or $a_2$ occurs, its trace is held with probability $1 - d_2$ and replaced by a null trace with probability $d_2$. Axiom R1 also gives both an independence-of-path assumption requiring the probability of a response on any trial to depend at most on the trace retrieved on that trial and a specification of the dependence such that the trace of one response is always transformed into the other response, the trace of an event is always transformed into prediction of the other event, and a null trace results in random, equiprobable selection of one of the two responses.
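The verbal reading of the three axioms can be turned into a small simulation. The following is a minimal sketch, not the author's procedure: the function name `simulate_model_a`, the coding of events and responses as 1 and 2, the assumption that the sequence starts with $E_1$, and the assumption that the store holds a null trace before the first trial are all illustrative choices.

```python
import random


def simulate_model_a(n_trials, p, d1, d2, seed=0):
    """Return a list of 'c' (correct) and 'e' (error) outcomes for one simulated subject."""
    rng = random.Random(seed)
    trace = None                      # assumed: null trace before the first trial
    outcomes = []
    for n in range(n_trials):
        event = 1 + n % 2             # single alternation: E1, E2, E1, ...
        # Response axiom: guess on a null trace, otherwise give the other alternative.
        if trace is None:
            response = rng.choice([1, 2])
        else:
            response = 3 - trace[1]
        outcomes.append('c' if response == event else 'e')
        # Encoding axiom: the event is encoded with probability p.
        if p > rng.random():
            trace, d = ('event', event), d2
        else:
            trace, d = ('response', response), d1
        # Memory axiom: the held trace is replaced by a null trace with probability d.
        if d > rng.random():
            trace = None
    return outcomes
```

With p = 1 and $d_2$ = 0 every trial after the first is correct, and with p = 0 and $d_1$ = 0 the subject alternates responses forever, so the whole protocol is either all correct or all errors depending on the first guess.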
The axioms are summarized in Fig. 3 by the familiar branching process or tree diagram method of representation (see, e.g., Atkinson, Bower, & Crothers, 1965). A path through the process is taken by following the arrows. The probability of that path being taken is the product of the probabilities on the branches that were traversed in following the arrows. For example, the probability of going from $A_{i,n}E_{h,n}$ to $A_{i,n+1}E_{k \neq h,n+1}$ via $a_{0,n}$ and $t_{1,n+1}^{(0)}$ is $(1 - p)\,d_1\,(1/2)$. The probability of arriving at some particular response-event pair on trial n + 1 given a start from some particular response-event pair on trial n is the sum of the probabilities of all of the possible paths that could be taken to the former pair from the latter.

[Figure 3: tree diagram with four columns: trial n response-event pair; trial n encoding response; trial n + 1 memory trace; trial n + 1 response-event pair.]
FIG. 3. A tree diagram summarizing the axioms and showing the general transition from a response-event pair on trial n to a response-event pair on trial n + 1.
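The path-product and path-sum rule for the tree can be checked numerically. The sketch below is an illustration only: the function name `next_pair_distribution` and the integer coding of responses and events (1 or 2) are assumptions, and the four branches are taken directly from the verbal statement of Axioms E1, M1, and R1.

```python
def next_pair_distribution(i, h, p, d1, d2):
    """Distribution over (response, event) on trial n+1, given (A_i, E_h) on trial n."""
    k = 3 - h                          # the sequence alternates, so the next event is the other one
    dist = {(1, k): 0.0, (2, k): 0.0}
    # Each branch: (path probability up to the retrieved trace, resulting response or None for a guess)
    branches = [
        ((1 - p) * (1 - d1), 3 - i),   # no encoding, response trace held: other response
        ((1 - p) * d1, None),          # no encoding, null trace: guess
        (p * (1 - d2), 3 - h),         # event encoded, event trace held: predict other event
        (p * d2, None),                # event encoded, null trace: guess
    ]
    for prob, response in branches:
        if response is None:
            dist[(1, k)] += prob / 2   # equiprobable guess between the two responses
            dist[(2, k)] += prob / 2
        else:
            dist[(response, k)] += prob
    return dist
```

For example, starting from $A_1E_1$ with p = 0.6, $d_1$ = 0.3, $d_2$ = 0.1, the only way to repeat $A_1$ is through one of the two null-trace branches, each followed by a guess.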
B. A MARKOV PROPERTY FOR MODEL A

We now prove a theorem which will underlie the methods of parameter estimation and goodness-of-fit evaluation to be developed and used in the remainder of this section.

THEOREM 1. The sequence of correct predictions and errors is a two-state first-order Markov chain with stationary transition probabilities.

The proof will be in two parts. In part one we obtain a four-state first-order Markov chain with stationary transition probabilities in which the states are the four possible joint outcomes of one response and one event. Then, in part two, we note that the transition matrix of the four-state chain satisfies a criterion of lumpability (Kemeny & Snell, 1960, pp. 123-124) such that the pair of states corresponding to correct responses and the pair of states corresponding to errors can be lumped to obtain the desired result.
Proof. We begin by noting that since the sequence of events alternates, we have for every f, g, h, i, j, k, n, and $C_{n-1}$,

$$P(A_{g,n+1} \cap E_{h,n+1} \mid A_{i,n} \cap E_{j,n} \cap t_{k,n}^{(f)} \cap C_{n-1}) = \begin{cases} P(A_{g,n+1} \mid A_{i,n} \cap E_{j,n} \cap t_{k,n}^{(f)} \cap C_{n-1}) & \text{if } h \neq j \\ 0 & \text{if } h = j. \end{cases} \qquad (1)$$

Therefore, we can proceed by considering only the case h ≠ j. Using Axioms E1 and M1 and the usual methods for manipulating probabilities, and letting $D_n = A_{i,n} \cap E_{j,n} \cap t_{k,n}^{(f)} \cap C_{n-1}$,

$$\begin{aligned} P(A_{g,n+1} \mid D_n) &= P(A_{g,n+1} \mid D_n \cap a_{0,n})\,P(a_{0,n} \mid D_n) + P(A_{g,n+1} \mid D_n \cap a_{j,n})\,P(a_{j,n} \mid D_n) \\ &= (1 - p)\,[P(A_{g,n+1} \mid D_n \cap a_{0,n} \cap t_{1,n+1}^{(0)})\,P(t_{1,n+1}^{(0)} \mid D_n \cap a_{0,n}) \\ &\qquad + P(A_{g,n+1} \mid D_n \cap a_{0,n} \cap t_{i,n+1}^{(1)})\,P(t_{i,n+1}^{(1)} \mid D_n \cap a_{0,n})] \\ &\quad + p\,[P(A_{g,n+1} \mid D_n \cap a_{j,n} \cap t_{1,n+1}^{(0)})\,P(t_{1,n+1}^{(0)} \mid D_n \cap a_{j,n}) \\ &\qquad + P(A_{g,n+1} \mid D_n \cap a_{j,n} \cap t_{j,n+1}^{(2)})\,P(t_{j,n+1}^{(2)} \mid D_n \cap a_{j,n})] \\ &= (1 - p)\,[P(A_{g,n+1} \mid t_{1,n+1}^{(0)})\,d_1 + P(A_{g,n+1} \mid t_{i,n+1}^{(1)})\,(1 - d_1)] \\ &\quad + p\,[P(A_{g,n+1} \mid t_{1,n+1}^{(0)})\,d_2 + P(A_{g,n+1} \mid t_{j,n+1}^{(2)})\,(1 - d_2)]. \end{aligned} \qquad (2)$$

Now, using this expression³ we can apply Axiom R1 and the fact that the sequence independently alternates to obtain Matrix (3) of conditional probabilities which hold for every $t_{k,n}^{(f)}$, n, and $C_{n-1}$, and in which $\alpha = p(1 - d_2)$ is the probability of retrieving the event trace, $\beta = (1 - p)(1 - d_1)$ is the probability of retrieving the response trace, and $\gamma = 1 - \alpha - \beta = p d_2 + (1 - p) d_1$ is the probability of retrieving the null trace.

³ The expression P(A|B) is not defined if P(B) = 0. Since p, $d_1$, and $d_2$ can be at the end points of [0,1], some of the terms in this expression may be undefined. The expression will remain correct if we define P(A|B) = 0 if P(B) = 0 in each such instance.
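$$\begin{array}{c|cccc} & A_{1,n+1}E_{1,n+1} & A_{1,n+1}E_{2,n+1} & A_{2,n+1}E_{1,n+1} & A_{2,n+1}E_{2,n+1} \\ \hline A_{1,n}E_{1,n} & 0 & \gamma/2 & 0 & 1 - \gamma/2 \\ A_{1,n}E_{2,n} & \alpha + \gamma/2 & 0 & \beta + \gamma/2 & 0 \\ A_{2,n}E_{1,n} & 0 & \beta + \gamma/2 & 0 & \alpha + \gamma/2 \\ A_{2,n}E_{2,n} & 1 - \gamma/2 & 0 & \gamma/2 & 0 \end{array} \qquad (3)$$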
Since the conditional probabilities in Matrix (3) are the same for the intersection of every cylinder set defined in terms of the first n − 1 trial outcomes with each $A_{i,n} \cap E_{j,n}$, the sequence of response-event combinations is by definition a first-order Markov chain with Matrix (3) as its transition matrix. Since each entry in this matrix is independent of n, the chain has stationary transition probabilities. This completes the first part of the proof. To show that the sequence of correct predictions and errors is a first-order Markov chain with stationary transition probabilities, we first introduce a theorem given by Kemeny and Snell (1960, p. 124) as the following
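LEMMA. Let $Z = \{Z_1, Z_2, \ldots, Z_r\}$ be the states of a Markov chain and let $Y = \{Y_1, Y_2, \ldots, Y_s\}$ be a partition of these states. Then a necessary and sufficient condition for the chain whose states are Z to be lumpable with respect to the partition Y is that for every pair of sets $Y_i$ and $Y_j$, $p_{kY_j}$ (the probability of a transition from the state $Z_k$ to the set of states $Y_j$) have the same value for every $Z_k$ in $Y_i$. These common values $p_{ij}$ form the transition matrix of the lumped chain.

Now, using the notation $A_iE_j$ to denote the state of the four-state chain in which the response $A_i$ and the event $E_j$ jointly occur, we define the partition $c = \{A_1E_1, A_2E_2\}$ and $e = \{A_1E_2, A_2E_1\}$. Rearranging Matrix (3) then, we have

$$\begin{array}{c|cc|cc} & A_1E_1 & A_2E_2 & A_1E_2 & A_2E_1 \\ \hline A_1E_1 & 0 & 1 - \gamma/2 & \gamma/2 & 0 \\ A_2E_2 & 1 - \gamma/2 & 0 & 0 & \gamma/2 \\ \hline A_1E_2 & \alpha + \gamma/2 & 0 & 0 & \beta + \gamma/2 \\ A_2E_1 & 0 & \alpha + \gamma/2 & \beta + \gamma/2 & 0 \end{array} \qquad (4)$$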
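Inspection of Matrix (4) shows that the response-event chain is lumpable with respect to the partition into correct predictions and errors, such that the transition matrix for the sequence of c’s and e’s is

$$\begin{array}{c|cc} & c & e \\ \hline c & 1 - \gamma/2 & \gamma/2 \\ e & \alpha + \gamma/2 & \beta + \gamma/2 \end{array} \qquad (5)$$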
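The Kemeny and Snell criterion quoted in the Lemma is easy to verify mechanically. The sketch below is illustrative: `is_lumpable` and `lump` are hypothetical helper names, the parameter values are arbitrary choices satisfying α + β + γ = 1, and the four-state matrix is written in the state order $A_1E_1$, $A_2E_2$, $A_1E_2$, $A_2E_1$ with entries following from Axiom R1 and the alternation of events.

```python
def is_lumpable(P, partition, tol=1e-12):
    """Kemeny-Snell criterion: within each block, every state must have the
    same total transition probability into each block of the partition."""
    for block in partition:
        for target in partition:
            sums = [sum(P[s][t] for t in target) for s in block]
            if max(sums) - min(sums) > tol:
                return False
    return True


def lump(P, partition):
    """Transition matrix of the lumped chain (meaningful only if is_lumpable holds)."""
    return [[sum(P[block[0]][t] for t in target) for target in partition]
            for block in partition]


alpha, beta, gamma = 0.5, 0.3, 0.2    # illustrative values, alpha + beta + gamma = 1
g = gamma / 2
# State order: A1E1, A2E2 (correct predictions), A1E2, A2E1 (errors)
P4 = [
    [0.0,       1 - g,     g,        0.0],
    [1 - g,     0.0,       0.0,      g],
    [alpha + g, 0.0,       0.0,      beta + g],
    [0.0,       alpha + g, beta + g, 0.0],
]
correct_error = [[0, 1], [2, 3]]
```

Calling `lump(P4, correct_error)` returns the two-state matrix with rows (1 − γ/2, γ/2) and (α + γ/2, β + γ/2). A partition by response instead of by correctness, `[[0, 2], [1, 3]]`, fails the criterion for these values.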
C. SOME THEOREMS FOR A FINITE SEQUENCE FROM A FIRST-ORDER MARKOV CHAIN HAVING MATRIX (5) AS ITS TRANSITION MATRIX⁴

The analyses of the experiments to be presented in the next section will be in terms of statistics obtained from the data when the data have been coded as a sequence of correct responses and errors. The fact that the sequence of correct responses and errors is theoretically a Markov chain permits the derivation of the expected values of those statistics in terms of the parameters α, β, γ of Matrix (5). Substitution of the maximum likelihood estimates of these parameters (to be considered in Section II,D) for the parameters in the derived expressions permits a comparison of the observed values of the statistics with estimates of their expected values. To the extent that there is structural correspondence between the model and the data, there should be proximity of the observed values to these estimates of their expected values. Some theorems giving such expected values will now be proved.

THEOREM 2. The probability of a correct prediction on trial n is

$$P(c_n) = P(c_\infty) - d\beta^{\,n-1}, \quad \text{where } P(c_\infty) = (\alpha + \gamma/2)(1 - \beta)^{-1} \text{ and } d = P(c_\infty) - P(c_1). \qquad (6)$$

Proof. The proof follows immediately by solution of the difference equation

$$P(c_{n+1}) = (1 - \gamma/2)\,P(c_n) + (\alpha + \gamma/2)\,[1 - P(c_n)], \qquad (7)$$

which is implied by Matrix (5), together with the fact that α + β + γ = 1.

The following corollary of Theorem 2 is used in presenting graphic comparisons of observed and predicted performance curves plotted as averages over blocks of trials.
COROLLARY. The expected proportion of correct responses in the Kth block of M trials is

$$\bar{P}_K = M^{-1} \sum_{n=(K-1)M+1}^{KM} P(c_n) = P(c_\infty) - d\,(\beta^{(K-1)M} - \beta^{KM})\,M^{-1}(1 - \beta)^{-1}. \qquad (8)$$
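As a rough numerical check on Theorem 2 and its corollary, the closed forms can be compared with direct iteration of the difference equation. This is a sketch under assumptions: the function names are hypothetical, `p_c1` stands for the initial probability $P(c_1)$, and β is assumed to be less than 1 so that $P(c_\infty)$ is defined.

```python
def p_correct_closed(n, alpha, beta, gamma, p_c1):
    """Closed form of Eq. (6): P(c_n) = P(c_inf) - d * beta**(n - 1)."""
    p_inf = (alpha + gamma / 2) / (1 - beta)
    d = p_inf - p_c1
    return p_inf - d * beta ** (n - 1)


def p_correct_recursive(n, alpha, beta, gamma, p_c1):
    """Iterate the difference equation, Eq. (7), starting from P(c_1)."""
    prob = p_c1
    for _ in range(n - 1):
        prob = (1 - gamma / 2) * prob + (alpha + gamma / 2) * (1 - prob)
    return prob


def block_mean(K, M, alpha, beta, gamma, p_c1):
    """Eq. (8): expected proportion correct in the K-th block of M trials."""
    p_inf = (alpha + gamma / 2) / (1 - beta)
    d = p_inf - p_c1
    return p_inf - d * (beta ** ((K - 1) * M) - beta ** (K * M)) / (M * (1 - beta))
```

Averaging `p_correct_closed` over trials 11 through 20 should agree with `block_mean(2, 10, ...)` up to rounding error.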
The next theorem gives the expected relative frequencies of 3-tuples in an N-trial experiment. The 3-tuples are counted using a sliding observation method in which the first 3-tuple is the sequence of c’s and e’s on trials one through three, the second is the sequence on trials two through four, then three through five, and so on through the final 3-tuple on trials N − 2 through N. The counts for each of the eight kinds can then be converted into relative frequencies by dividing by the total number of observations, which is of course N − 2.

⁴ For additional results in a general notation for a finite sequence from a first-order Markov chain with stationary transition probabilities, the reader is referred to a paper developed as part of the present research (Bogartz, 1966b).
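The sliding observation method can be sketched as follows. The function name `triple_proportions` and the coding of a protocol as a string or list of 'c' and 'e' characters are assumptions made for illustration.

```python
from collections import Counter

TRIPLES = ('ccc', 'cce', 'cec', 'cee', 'ecc', 'ece', 'eec', 'eee')


def triple_proportions(outcomes):
    """Relative frequencies of the eight 3-tuples, counted with a window that
    slides one trial at a time (trials 1-3, 2-4, ..., N-2 through N)."""
    n = len(outcomes)
    counts = Counter(''.join(outcomes[i:i + 3]) for i in range(n - 2))
    return {t: counts[t] / (n - 2) for t in TRIPLES}
```

A five-trial protocol such as `'ccece'` yields three overlapping windows (`cce`, `cec`, `ece`), so each of those 3-tuples has relative frequency 1/3 and the other five have frequency 0.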
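THEOREM 3. The expected proportions of the eight possible 3-tuples of correct responses and errors in N trials are

$$\bar{P}(y\,y'\,y'') = p_{yy'}\,p_{y'y''}\left[P(y_\infty) \mp d\,(1 - \beta^{N-2})\,(N - 2)^{-1}(1 - \beta)^{-1}\right], \qquad (9)$$

in which each of y, y', and y'' is c or e, the transition probabilities $p_{yy'}$ and $p_{y'y''}$ are read from Matrix (5), and the minus sign applies when y = c and the plus sign when y = e.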
Proof. Let $y_n$ denote the generic occurrence on trial n; that is, let it stand for $c_n$ or $e_n$ as the instance at hand requires. The general 3-tuple beginning on trial n, then, is $y_n y_{n+1} y_{n+2}$. The expected proportion of occurrences of this 3-tuple in N trials is

$$\bar{P}(yyy) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(y_n \cap y_{n+1} \cap y_{n+2}) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(y_n)\,P(y_{n+1} \mid y_n)\,P(y_{n+2} \mid y_{n+1}) \qquad (10)$$

by the Markovian property. For a given 3-tuple, the two conditional probabilities in the last line of Eq. (10) are constants $K_1$ and $K_2$ which are independent of n and can be read from the transition Matrix (5). Taking these constants outside the summation sign gives

$$\bar{P}(yyy) = (N - 2)^{-1} K_1 K_2 \sum_{n=1}^{N-2} P(y_n), \qquad (11)$$
and the remainder of the proof follows immediately by performing the sum for $y_n = c_n$ and $y_n = e_n$.

A run of errors in a protocol of c’s and e’s begins on each trial on which an e which is not preceded by an e occurs and ends on the first trial thereafter on which an e occurs that is not followed by an e (unless an e does not occur on the very next trial, in which case the run ends on the trial on which it begins). The length of a run is one greater than the value obtained by subtracting the number of the trial on which the run begins from the number of the trial on which it ends. Thus, for a given N-trial protocol the number of runs of various lengths can be tabulated, and the sum of these numbers over all possible lengths (i.e., lengths less than or equal to N) will give the total number of observed runs. The next two theorems give the expected values of the total number of runs and the number of runs of length j in a sequence of N trials.

THEOREM 4. The expected value of $R_N$, the total number of runs of errors in N trials, is

$$E(R_N) = [N(\alpha + \gamma/2) + \beta + \gamma/2]\,P(e_\infty) + d\,(1 - \beta)^{-1}\,[\alpha + \gamma/2 + (\beta + \gamma/2)\,\beta^{N-1} - \beta^{N}], \qquad (12)$$

where $P(e_\infty) = 1 - P(c_\infty) = (\gamma/2)(1 - \beta)^{-1}$.

Proof. Bush (1959) has shown that if $x_n$ is a random variable that takes the value 1 if an error occurs on trial n and 0 if a correct response occurs, then the total number of runs of errors in N trials is the random variable

$$R_N = \sum_{n=1}^{N} x_n - \sum_{n=1}^{N-1} x_n x_{n+1}. \qquad (13)$$

Since $x_n$ and $x_n x_{n+1}$ are both binomially distributed random variables that take the value 1 or 0, the expectation of each is the probability that it takes the value 1. It follows from this and the definition of the expected value operator that

$$E(R_N) = \sum_{n=1}^{N} P(e_n) - \sum_{n=1}^{N-1} P(e_n)\,P(e_{n+1} \mid e_n) = \sum_{n=1}^{N} [1 - P(c_n)] - (\beta + \gamma/2) \sum_{n=1}^{N-1} [1 - P(c_n)]. \qquad (14)$$

Substituting the right side of Eq. (6) for $P(c_n)$, performing the summations, and rearranging terms gives the right side of Eq. (12). Q.E.D.

THEOREM 5. The expected value of $r_{j,N}$, the number of runs of errors of length j in N trials, is
$$E(r_{j,N}) = p_{ee}^{\,j-1}\,\{(\gamma/2)(1 - \beta)^{-1}\,[N - j + 1 - 2(N - j)\,p_{ee} + (N - j - 1)\,p_{ee}^{2}] + d\,(1 - \beta)^{-1}\,[1 - \beta^{N-j+1} - 2p_{ee}(1 - \beta^{N-j}) + p_{ee}^{2}(1 - \beta^{N-j-1})]\}, \qquad (15)$$

in which $p_{ee}$, the probability of a transition from an error to an error, is read from Matrix (5) to be β + γ/2.

Proof. Bush (1959) also showed that

$$r_{j,N} = u_{j,N} - 2u_{j+1,N} + u_{j+2,N}, \qquad (16)$$

where

$$u_{j,N} = \sum_{n=1}^{N-j+1} x_n x_{n+1} \cdots x_{n+j-1}.$$

The proof follows immediately, using Lemma 1 in the paper by Bogartz (1966b, p. 385) which in the present notation asserts that

$$E(u_{j,N}) = (\beta + \gamma/2)^{j-1}\,[(N - j + 1)\,P(e_\infty) + d\,(1 - \beta^{N-j+1})(1 - \beta)^{-1}]. \qquad (17)$$
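The run statistics treated in Theorems 4 and 5 can be tabulated directly from a protocol. The sketch below is illustrative: the function names are hypothetical, protocols are assumed to be strings or lists of 'c' and 'e', and `runs_by_indicator` assumes that the total number of runs equals the number of errors minus the number of adjacent error pairs, which is the indicator form suggested by the proof of Theorem 4.

```python
def error_runs(outcomes):
    """Lengths of the maximal runs of consecutive errors, in order of occurrence."""
    runs, current = [], 0
    for o in outcomes:
        if o == 'e':
            current += 1
        elif current:
            runs.append(current)   # a correct response closes the current run
            current = 0
    if current:
        runs.append(current)       # a run that is still open on the last trial
    return runs


def runs_by_indicator(outcomes):
    """Total number of error runs as (number of errors) - (number of adjacent error pairs)."""
    x = [1 if o == 'e' else 0 for o in outcomes]
    return sum(x) - sum(a * b for a, b in zip(x, x[1:]))
```

For the protocol `'eecceceee'` the run lengths are 2, 1, and 3, so $R_N$ = 3 and $r_{j,N}$ = 1 for j = 1, 2, 3; the indicator count gives 6 errors minus 3 adjacent error pairs, again 3.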
The following observations may provide some insight into Eqs. (12) and (15). They also suggest a rough check for such expressions. For large N , Eqs. (12) and (15) are dominated by N(a
+ ~ / 2P(em) )
and
NP(eco)(l -pee)2peej-1>
respectively. The term N ( a + y / 2 )P(e,) can be interpreted as the proportion of the N trials on which an error occurs and is followed by a correct response. Thus, it approximates the number of terminations of a run of errors and is therefore an approximation to the number of runs. Similarly, if it is noted that 1 - p e e = p e c and that p , , P(e,) = p , , P(cm), then NP(e,)( 1 - pee)2peej-1can be rewritten as NP(cm)pcepeej-lpecand interpreted as the proportion of trials in which first a correct response occurs, then an error followed by j - 1 more errors, and then a correct response occurs, which in turn is an approximation to the number of runs of errors of length j. We now define the statistic C,. Let wnbe a random variable that takes the value 1 if a correct response occurs on trial n and 0 if an error occurs.
Then, for an $N$-trial experiment,

$$c_k = \sum_{n=1}^{N-k} w_n w_{n+k}. \qquad (18)$$
A careful notation would require a subscript $N$ on $c_k$, say $c_{k,N}$, but in the present work the number of trials will always be clear.

THEOREM 6. The expected value of $c_k$, the total number of joint occurrences of correct responses $k$ trials apart, is

$$E(c_k) = \big[(1-\beta^{k})P(c_\infty)+\beta^{k}\big]\big[(N-k)P(c_\infty) - d(1-\beta^{N-k})(1-\beta)^{-1}\big]. \qquad (19)$$
Proof.

$$E(c_k) = E\Big(\sum_{n=1}^{N-k} w_n w_{n+k}\Big) = \sum_{n=1}^{N-k} P(w_n w_{n+k}=1) = \sum_{n=1}^{N-k} P(w_n=1)\,P(w_{n+k}=1 \mid w_n=1). \qquad (20)$$
It is well known that $P(c_{n+k} \mid c_n)$ is just the entry in the first row, first column of the matrix that is the $k$th power of Matrix (5). Let $r_k$ denote that value. Then, obviously, $r_1 = 1 - \gamma/2$, and

$$r_k = (1-\gamma/2)\,r_{k-1} + (\alpha+\gamma/2)(1-r_{k-1}). \qquad (21)$$

Solving Eq. (21) we obtain

$$r_k = (1-\beta^{k})P(c_\infty) + \beta^{k}. \qquad (22)$$

Since this term is independent of $n$, it is taken outside the summation on the right of Eq. (20) and the proof is completed by summing the probabilities of a correct response over the first $N-k$ trials.
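The two steps of this proof can be checked numerically. The sketch below is illustrative only: it builds the transition matrix implied by the quantities quoted above ($p_{cc} = 1-\gamma/2$, $p_{ec} = \alpha+\gamma/2$), compares the closed form of Eq. (22) with the first-row, first-column entry of the $k$th matrix power, and computes $E(c_k)$ by summing $P(c_n)$ trial by trial rather than through the constant $d$. All function names are hypothetical.

```python
def transition_matrix(beta, gamma):
    """Rows and columns ordered (c, e); alpha is eliminated via alpha = 1 - beta - gamma."""
    alpha = 1.0 - beta - gamma
    return [[1 - gamma / 2, gamma / 2],
            [alpha + gamma / 2, beta + gamma / 2]]


def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def r_by_matrix_power(k, beta, gamma):
    """P(c_{n+k} | c_n) read from the k-th power of the transition matrix."""
    power = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(k):
        power = mat_mul(power, transition_matrix(beta, gamma))
    return power[0][0]


def r_closed_form(k, beta, gamma):
    """Eq. (22): r_k = (1 - beta^k) P(c_inf) + beta^k."""
    alpha = 1.0 - beta - gamma
    p_c_inf = (alpha + gamma / 2) / (1 - beta)
    return (1 - beta ** k) * p_c_inf + beta ** k


def expected_c_k(k, n_trials, p_c1, beta, gamma):
    """E(c_k) = r_k * sum of P(c_n) over the first N - k trials."""
    m = transition_matrix(beta, gamma)
    p_c, total = p_c1, 0.0
    for _ in range(n_trials - k):
        total += p_c
        p_c = p_c * m[0][0] + (1 - p_c) * m[1][0]  # one-step update of P(c_n)
    return r_closed_form(k, beta, gamma) * total
```

Computing the sum of $P(c_n)$ by direct recursion sidesteps any dependence on how $d$ is defined earlier in the chapter.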
D. ESTIMATION

Within the context of the experimental operations and measurement procedures that have been used in the research to be reported in this chapter, it is not possible to estimate p, d,, and d,. Maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$, which are functions of p, d,, and d,, can be obtained, permitting the establishment of rough bounds for p, d,, and d,, but the bounded intervals are quite broad and relatively useless. Some suggestions will be given elsewhere for an approach to this problem. This section, then, will be limited to the estimation of initial probabilities and $\alpha$, $\beta$, and $\gamma$. The maximum likelihood method appropriate to estimation of $\alpha$, $\beta$, and $\gamma$ is rarely seen in the psychological literature and will, therefore, be considered in detail. Until otherwise indicated, the following remarks are appropriate to estimation for individual subjects.

Before finding estimates of $\alpha$, $\beta$, and $\gamma$, we dispose of the problem of estimating the initial probability of a correct response, $P(c_1)$. Assuming that the initial probability of a correct response is independent of $\alpha$, $\beta$, and $\gamma$, as will be assumed throughout the following discussion, let $w_n$ be a random variable that takes the value 1 if a correct response occurs on trial $n$ and 0 if an error occurs on that trial. Then, to estimate $P(c_1)$ we maximize the likelihood function
$$L = P(c_1)^{w_1}\,[1-P(c_1)]^{1-w_1} \qquad (23)$$

by maximizing the log-likelihood function. Setting $\partial \ln L/\partial P(c_1) = 0$, we immediately obtain $\hat{P}(c_1) = w_1$.
We next note that we can limit our task to estimating $\beta$ and $\gamma$, since $\alpha = 1 - \beta - \gamma$. Let the transition count $n_{ij}$ be the observed number of one-trial transitions from state $i$ to state $j$ ($i, j = c, e$), and let $n_{i.} = \sum_j n_{ij}$ and $n_{..} = \sum_i n_{i.}$. It is well known that the $n_{ij}$ are a set of sufficient statistics for the estimation of the parameters determining the transition probabilities in Matrix (5) (sufficient statistics may be thought of as exhausting the information in the protocol that is relevant to the parameters being estimated). Eliminating $\alpha$ from Eq. (4) to obtain

$$\begin{array}{c|cc} & c & e \\ \hline c & 1-\gamma/2 & \gamma/2 \\ e & 1-\beta-\gamma/2 & \beta+\gamma/2 \end{array} \qquad (24)$$

we can then obtain the log-likelihood of the observed set of transition counts,

$$\ln L = n_{cc}\ln(1-\gamma/2) + n_{ce}\ln(\gamma/2) + n_{ec}\ln(1-\beta-\gamma/2) + n_{ee}\ln(\beta+\gamma/2). \qquad (25)$$
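Ordinarily, at this point we would take derivatives with respect to $\beta$ and $\gamma$ and, setting them equal to 0, obtain

$$\frac{\partial \ln L}{\partial \beta} = -\frac{n_{ec}}{1-\beta-\gamma/2} + \frac{n_{ee}}{\beta+\gamma/2} = 0, \qquad \frac{\partial \ln L}{\partial \gamma} = -\frac{n_{cc}}{2(1-\gamma/2)} + \frac{n_{ce}}{\gamma} - \frac{n_{ec}}{2(1-\beta-\gamma/2)} + \frac{n_{ee}}{2(\beta+\gamma/2)} = 0, \qquad (26)$$

the solution to which would be the maximum likelihood estimates

$$\hat{\gamma} = 2\,n_{ce}\,n_{c.}^{-1}, \qquad \hat{\beta} = n_{ee}\,n_{e.}^{-1} - n_{ce}\,n_{c.}^{-1}, \qquad (27)$$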
as we would expect from inspection of Eq. (24). This method is not completely adequate, however: although it does find the estimates of $\beta$ and $\gamma$ that maximize Eq. (25), it fails to constrain these estimates properly. The parameters $\beta$ and $\gamma$ are probabilities constrained by the theory such that $0 \le \beta \le 1$, $0 \le \gamma \le 1$, and $0 \le \beta + \gamma \le 1$. But no such constraints are incorporated in arriving at Eq. (27). This is revealed by inspection of Eq. (27), which shows that the estimate of $\gamma$ could be greater than one and the estimate of $\beta$ could be negative. What is required is a method of maximization of Eq. (25) that incorporates the theoretical restraints. Such a method has been used to obtain the values given in Table III. Table III gives the theoretically constrained maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$ for each of the four possible conjunctions of inequalities that can exist among the observed relative frequencies of
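TABLE III
THEORETICALLY CONSTRAINED MAXIMUM LIKELIHOOD ESTIMATES $\hat{\alpha}$, $\hat{\beta}$, AND $\hat{\gamma}$ FOR THE VARIOUS POSSIBLE OBSERVED RELATIONSHIPS AMONG THE TRANSITION COUNTS

Case   Observed relations(a)   Constrained maximum likelihood estimates

I      $\hat{p}_{ce} > \hat{p}_{ee}$ and $\hat{p}_{ce} > \hat{p}_{ec}$
       Try both row II and row III of this table. Choose the solution giving the greater value for $\ln L$ in Eq. (25).

II     $\hat{p}_{ce} > \hat{p}_{ee}$ and $\hat{p}_{ce} \le \hat{p}_{ec}$
       $\hat{\alpha} = (n_{cc} + n_{ec} - n_{ce} - n_{ee})/n_{..}$,   $\hat{\beta} = 0$,   $\hat{\gamma} = 2(n_{ce} + n_{ee})/n_{..}$

III    $\hat{p}_{ce} \le \hat{p}_{ee}$ and $\hat{p}_{ce} > \hat{p}_{ec}$
       $\hat{\alpha} = 0$,   $\hat{\beta} = (n_{cc} + n_{ee} - n_{ce} - n_{ec})/n_{..}$,   $\hat{\gamma} = 2(n_{ce} + n_{ec})/n_{..}$

IV     $\hat{p}_{ce} \le \hat{p}_{ee}$ and $\hat{p}_{ce} \le \hat{p}_{ec}$
       $\hat{\alpha} = \hat{p}_{ec} - \hat{p}_{ce}$,   $\hat{\beta} = \hat{p}_{ee} - \hat{p}_{ce}$,   $\hat{\gamma} = 2\hat{p}_{ce}$

(a) $\hat{p}_{ij} = n_{ij}\,n_{i.}^{-1}$ ($i, j = e, c$).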
the various transitions. Table IV provides examples which correspond, row by row, to the four cases covered in Table III. Consider row II of Table IV. The transition count matrix indicates that 40 correct responses followed correct responses, 60 errors followed correct responses, 60 correct responses followed errors, and 30 errors followed errors. Converting to relative frequencies or conditional probability estimates by dividing each count by its associated row total, i.e., 40/100, 60/100, 60/90, and 30/90, we obtain the matrix of relative frequencies. Inspection of these relative frequencies reveals that they do stand in the relations indicated in row II of Table III. The estimates of $\alpha$, $\beta$, and $\gamma$ in Table IV are then obtained using the corresponding formulas in Table III. A similar discussion applies to the other three rows of the two tables. The basis for the method used to obtain the entries in Table III can be found in an article by Brunk (1958). The method consists of a series of
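TABLE IV
EXAMPLES FOR THE VARIOUS CASES CONSIDERED IN TABLE III

Case   Transition counts ($n_{ij}$)   Relative frequencies ($n_{ij}\,n_{i.}^{-1}$)   Parameter estimates ($\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$)

I      …                              …                                              Rows II and III both give in this case $\hat{\alpha} = \hat{\beta} = 0$, $\hat{\gamma} = 1$, since for row II $\hat{\alpha} < 0$ and for row III $\hat{\beta} < 0$; therefore use either row to obtain $\hat{\gamma} = 1$
II     c: 40  60;  e: 60  30          c: .400  .600;  e: .667  .333                  .052    0      .948
III    …                              …                                              0       .304   .696
IV     …                              c: …;  e: .557  .443                           .157    .043   .800

[Entries marked … are not legible in this copy. One further relative-frequency row, .500 .500, survives but cannot be assigned to a case with certainty.]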
steps. In Step 1, the unconstrained maximum likelihood estimates of the parameters are found by the usual methods [Eq. (27)]. If they lie in the region of constraint, they are the appropriate values (see row IV, Table III), and no further steps are taken. If one or more lie outside the region, one is chosen and it is brought to the boundary of the region (e.g., a negative value of $\beta$ becomes 0) and the estimates are obtained again, given the assumption that the maximum likelihood estimate for the chosen parameter is the value at the boundary. If all the remaining parameter estimates now are in the permissible region, the maximum likelihood estimates are now in hand and the procedure stops. Otherwise, it continues by repetition of Step 1 using one of the parameters still remaining outside the theoretically permissible region.

In referring to the article by Brunk (1958) in the present context, the reader may wish to have available the fact that the maximization problem of interest here can be viewed as the minimization of a convex function on the intersection of closed, convex sets. First note that maximization of the expression in Eq. (25), which will be denoted by $f(\beta,\gamma)$, is equivalent to minimization of $-f(\beta,\gamma)$. The domain of $f(\beta,\gamma)$ is the parallelogram with sides $\gamma = 0$, $\beta = 1 - \gamma/2$, $\gamma = 2$, and $\beta = -\gamma/2$. The theoretical constraints limit $f(\beta,\gamma)$ to the triangular domain with boundaries $\beta = 0$, $\gamma = 0$, and $\beta + \gamma = 1$. Given that a set $S$ is convex if for every $x$ and $y$ in $S$ and every $0 \le \theta \le 1$, $\theta x + (1-\theta)y$ is also in $S$ (geometrically, if two points are in the set, then any point on the line segment joining those two points is also in the set), it is obvious that both domains are convex and that the triangular domain is the intersection of the two.
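The case analysis of Table III can be sketched in code. The version below is a hedged illustration rather than Brunk's stepwise procedure itself: it evaluates the candidate solutions (the unconstrained estimates of Eq. (27), the $\beta = 0$ solution of row II, the $\alpha = 0$ solution of row III, and the vertex $\alpha = \beta = 0$, $\gamma = 1$), discards those outside the permissible region, and keeps the one with the greatest $\ln L$ from Eq. (25). The function names and the tolerance are assumptions introduced here.

```python
import math


def log_likelihood(counts, beta, gamma):
    """Eq. (25); counts = (n_cc, n_ce, n_ec, n_ee)."""
    n_cc, n_ce, n_ec, n_ee = counts
    terms = ((n_cc, 1 - gamma / 2), (n_ce, gamma / 2),
             (n_ec, 1 - beta - gamma / 2), (n_ee, beta + gamma / 2))
    total = 0.0
    for n, p in terms:
        if n > 0:
            if p <= 0:
                return -math.inf  # an observed transition has probability 0
            total += n * math.log(p)
    return total


def constrained_mle(n_cc, n_ce, n_ec, n_ee):
    """Return (alpha, beta, gamma) maximizing Eq. (25) over the triangle."""
    counts = (n_cc, n_ce, n_ec, n_ee)
    n_c, n_e = n_cc + n_ce, n_ec + n_ee
    if n_c == 0 or n_e == 0:
        raise ValueError("both rows of the transition count matrix must be nonempty")
    n = n_c + n_e
    p_ce, p_ee = n_ce / n_c, n_ee / n_e
    gamma_iii = 2 * (n_ce + n_ec) / n
    candidates = [
        (p_ee - p_ce, 2 * p_ce),        # row IV: unconstrained, Eq. (27)
        (0.0, 2 * (n_ce + n_ee) / n),   # row II: beta fixed at 0
        (1 - gamma_iii, gamma_iii),     # row III: alpha fixed at 0
        (0.0, 1.0),                     # vertex: alpha = beta = 0, gamma = 1
    ]
    tol = 1e-12  # tolerance for floating-point boundary cases (assumption)
    best = None
    for beta, gamma in candidates:
        alpha = 1 - beta - gamma
        if min(alpha, beta, gamma) < -tol:
            continue  # outside the theoretically permissible region
        value = log_likelihood(counts, beta, gamma)
        if best is None or value > best[0]:
            best = (value, (max(alpha, 0.0), max(beta, 0.0), gamma))
    return best[1]


if __name__ == "__main__":
    # Row II example from the text: counts 40, 60, 60, 30.
    print(constrained_mle(40, 60, 60, 30))
```

For the row II example discussed in the text (counts 40, 60, 60, 30), this sketch returns $\hat{\beta} = 0$ with $\hat{\alpha} = 10/190$ and $\hat{\gamma} = 180/190$, in line with the row II formulas. Because the log-likelihood is concave, comparing the feasible candidates is one simple way to reach the constrained maximum.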
It is a simple matter to show that $-f(\beta,\gamma)$ is a convex function if we note that (1) a sufficient condition for convexity is that the Hessian of $-f(\beta,\gamma)$, i.e., the matrix of second partial derivatives with respect to $\beta$ and $\gamma$, be positive semidefinite, and (2) a sufficient condition for positive semidefiniteness of a symmetric matrix is given by the Hadamard-Gershgorin theorem, which states that such a matrix is positive semidefinite if for every $i$,

$$q_{ii} \ge \sum_{j \ne i} |q_{ij}|,$$
where $q_{ij}$ is the entry in the $i$th row and the $j$th column of the matrix (Wolfe, 1965). It follows immediately that Brunk's (1958, p. 438) theorem (Theorem 2.1) applies here, providing a justification for Table III. We may also note in passing that the convexity of $-f(\beta,\gamma)$ implies the concavity of $f(\beta,\gamma)$, and it is well known that any local maximum of a concave function over a given domain is in fact its global maximum over that domain. Thus, the value of the likelihood function, the logarithm of which is given in Eq. (25), attains its global maximum (rather than just a local maximum as may occur in the solution of likelihood equations) over the parameter space with insertion of the appropriate value from Table III.

No new difficulties arise in the estimation of parameter values for a group of subjects assumed to be homogeneous in their parameter values. The maximum likelihood estimate of the initial probability of a correct response is just the proportion of correct responses on the first trial, and the maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$ are obtained as for an individual subject, using the transition counts over all subjects in conjunction with Table III.
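As a worked check on the convexity claim, the Hessian of $-f(\beta,\gamma)$ can be written out directly from Eq. (25). The shorthand $A$, $B$, and $C$ below is introduced here for convenience and does not appear in the original; the determinant test used is the standard criterion for a symmetric $2 \times 2$ matrix.

```latex
% Shorthand (assumed notation): all three quantities are nonnegative on the domain of f.
A = \frac{n_{ec}}{(1-\beta-\gamma/2)^{2}}, \qquad
B = \frac{n_{ee}}{(\beta+\gamma/2)^{2}}, \qquad
C = \frac{n_{cc}}{4(1-\gamma/2)^{2}} + \frac{n_{ce}}{\gamma^{2}} .

% Second partial derivatives of -f with respect to beta and gamma:
H = \begin{pmatrix}
      A + B & \tfrac{1}{2}(A + B) \\[2pt]
      \tfrac{1}{2}(A + B) & C + \tfrac{1}{4}(A + B)
    \end{pmatrix}

% Both diagonal entries are nonnegative, and
\det H = (A + B)\bigl(C + \tfrac{1}{4}(A + B)\bigr) - \tfrac{1}{4}(A + B)^{2} = (A + B)\,C \;\ge\; 0 ,
% so H is positive semidefinite and -f is convex wherever it is defined.
```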
III. Data

A. EXPERIMENT I: PREDICTION OF A SINGLE ALTERNATION SEQUENCE

In this section and the next one, two sets of data are analyzed from the point of view of the theory just considered. Neither of these data sets was collected for the purpose of testing the theory; both had been collected for other purposes before the theory was formulated. This comment will explain the presentation of data in Experiment II based on relatively small numbers of trials per subject.

1. Method

The subjects in this first experiment were 20 4- and 5-year-old preschool children attending classes at the University Preschool Laboratories of the University of Iowa. Each child was taken individually to an experimental room and seated at a table upon which the apparatus was mounted. From the child's point of view, the apparatus was a small, metal receptacle and a microswitch mounted on the front of a large black box. Concealed within the box were 105 marbles stored sequentially such that each time a buzzer mounted within the box sounded, the next marble in sequence could be released into the receptacle by depression of the microswitch. After the marble was released, the switch was then deactivated until the next sounding of the buzzer, since the circuit was arranged so that the current source used to sound the buzzer also charged a capacitor which was discharged across a solenoid to release the marble. The child was told that he was going to play a marble game. He was instructed to guess the color of the next marble each time the buzzer sounded, and then to release the next marble by pressing the switch only after making his guess. He was then shown a pair of marbles and asked to identify their colors. A randomly selected pair of the four colors black, blue, white, and yellow was assigned to each child. He was told that the pair of colors would be used in the game but that first he would practice with some red marbles. The first 5 marbles were red for all children and the remaining 100 marbles followed in a single alternation sequence of the two randomly selected colors. For example, a child assigned the pair blue-yellow received the sequence RRRRRBYBY . . . BY. After the five practice trials with the red marbles, the child was again shown the two colors and reminded that only those colors would be used thereafter. The 100 trials were then run without interruption. The buzzer, programmed by two cycling Hunter decade interval timers, sounded for .3 second every 8 seconds. The experimenter removed each marble from the receptacle after approximately 1 second of exposure and placed it in an opaque container.

2. Results
A standard $\chi^2$ test for homogeneity of transition matrices over the 20 subjects (Anderson & Goodman, 1957) gave a $\chi^2$ of 154.33 on 38 df which is equivalent to a $z$ of 8.91, is significant, and indicates rejection
[Figure 4. Solid curve: observed; dashed curve: predicted. Abscissa: blocks of 20 trials (1-5).]
FIG. 4. Observed and predicted mean proportions of correct responses in successive 20-trial blocks (Experiment I).
of the hypothesis that the individual protocols were all sampled from the same Markov chain. Heterogeneity of this kind renders the standard $\chi^2$ test for order (Order 1 versus Order 2; Anderson & Goodman, 1957) uninterpretable, since that test presupposes sampling from a single chain and becomes subject to selection effects which inflate the $\chi^2$ when heterogeneity of transition matrices exists. For the record, this $\chi^2$ was 52.57 on 2 df. A similar comment applies to the test for stationarity of the transition probabilities (Anderson & Goodman, 1957), which yielded a $\chi^2$ of 212.66 on 198 df, equivalent to a $z$ of .749. It will be seen later with other data that when homogeneity of the transition matrices is obtained, the $\chi^2$ for order also falls into line. Another way to approach the test for order is to select subjects having similar parameter estimates. When this was done, the order test again gave nonsignificant $\chi^2$'s.

In Fig. 4, the solid curve shows the observed relative frequency of correct responses for the group of 20 subjects on the five consecutive blocks of 20 trials. A curve of this type will be referred to as an observed mean performance curve. The dashed curve in that figure gives the predicted mean performance curve. There are, in fact, two distinct methods for computing a predicted mean performance curve. The first uses the corollary to Theorem 2 with a single set of parameter estimates inserted into Eq. (8). These are group estimates obtained under the hypothesis of homogeneity of subjects as indicated in the last paragraph of Section II, D. The second method uses the parameter estimates for the individual subjects in conjunction with Eq. (8) to obtain a predicted mean performance curve for each individual. The average of these
TABLE V
OBSERVED AND PREDICTED RELATIVE FREQUENCIES OF THE 3-TUPLES: EXPERIMENT I

                 3-tuples
                 ccc    cce    cec    cee    ecc    ece    eec    eee
Obs.             .560   .096   .095   .042   .097   .041   .042   .027
Pred.$_{AI}$     .551   .102   .090   .042   .108   .032   .044   .030
Pred.$_{G}$      .541   .114   .092   .045   .115   .024   .046   .023
predicted curves is then used as the predicted mean performance curve. Strictly speaking, the second method is always the appropriate one. In practice, the two curves obtained by these two methods are almost always very close to one another. Although the amount of computational labor is greatly reduced when the first method is used, in the absence of a high-speed computer the second method can be used with a desk calculator for good-sized groups of subjects without becoming overly burdensome. Similar remarks apply to each of the other types of data analysis that are presented (3-tuples, runs, and so on). Hereafter, predicted values based on group parameter estimates will be subscripted with G and predicted values obtained as the average of individual predicted values will be subscripted with AI. As it happens, the predicted$_{AI}$ (pred.$_{AI}$) curve in Fig. 4 could be replaced by the predicted$_{G}$ (pred.$_{G}$) curve with no discernible difference. The values for the pred.$_{G}$ curve are identical to three decimal places with those of the pred.$_{AI}$ curve over the last four blocks and differ by .002 on the first block.
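The two methods just described can be sketched as follows. This is an illustration under stated assumptions: Eq. (8) is not reproduced in this section, so the sketch uses the standard closed form for a two-state chain, $P(c_n) = P(c_\infty) + [P(c_1) - P(c_\infty)]\beta^{n-1}$, in its place. The function names and the parameter tuples `(p_c1, beta, gamma)` are hypothetical.

```python
def p_correct(n, p_c1, beta, gamma):
    """P(c_n) for the two-state chain (closed form assumed in place of Eq. (8))."""
    alpha = 1.0 - beta - gamma
    p_inf = (alpha + gamma / 2.0) / (1.0 - beta)
    return p_inf + (p_c1 - p_inf) * beta ** (n - 1)


def block_means(p_c1, beta, gamma, n_trials=100, block=20):
    """Mean predicted proportion correct in successive blocks of trials."""
    probs = [p_correct(n, p_c1, beta, gamma) for n in range(1, n_trials + 1)]
    return [sum(probs[i:i + block]) / block for i in range(0, n_trials, block)]


def pred_group(group_params, **kwargs):
    """First method (pred.G): one curve from a single set of group estimates."""
    return block_means(*group_params, **kwargs)


def pred_average_individual(individual_params, **kwargs):
    """Second method (pred.AI): average the curves predicted for each subject."""
    curves = [block_means(*params, **kwargs) for params in individual_params]
    return [sum(column) / len(curves) for column in zip(*curves)]
```

When all subjects share the same parameter values the two methods coincide exactly; the curves separate only to the extent that individual estimates differ.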
The observed relative frequencies of the eight 3-tuples are shown in Table V together with the two types of predicted values obtained using Theorem 3. Although there is a consistent advantage, from the standpoint of goodness of fit, in using the average of the individual predicted values, the differences are quite small. Clearly, either method gives an adequate description of the observed values.

TABLE VI
OBSERVED AND PREDICTED MEAN NUMBERS OF RUNS OF ERRORS OF LENGTHS 1 THROUGH 5: EXPERIMENT I

                 Run length
                 1       2       3       4      5
Obs.             9.80    2.25    1.30    .45    .15
Pred.$_{AI}$     9.31    2.70    .94     .36    .16
Pred.$_{G}$      9.40    3.07    1.00    .33    .11

The observed mean total number of runs of errors was 13.95, pred.$_{G}$ was 13.96, and pred.$_{AI}$ was 13.60. Tables VI and VII present the observed means and the two types of predicted means for the number of error runs of length one through five and the number of joint occurrences of correct responses $k$ trials apart for values of $k$ from 1 to 5.

TABLE VII
OBSERVED AND PREDICTED MEAN NUMBERS OF JOINT OCCURRENCES OF CORRECT RESPONSES SEPARATED BY K TRIALS FOR VALUES OF K FROM 1 THROUGH 5: EXPERIMENT I

                 K
                 1        2        3        4        5
Obs.             64.80    64.20    61.50    62.15    60.20
Pred.$_{AI}$     65.21    63.41    62.51    61.80    61.14
Pred.$_{G}$      64.86    62.09    61.13    60.44    59.80
The existence of badness of fit at the level of the individual subject, which is masked by averaging over a group, is a consideration to which some attention should always be paid. That such is not occurring in this experiment may be seen by inspection of Fig. 5, which shows a plot of observed and predicted values for individual subjects on the measures of the total number of runs of errors and the number of runs of errors of length one, and by inspection of Fig. 6, which shows a plot of the individual observed and predicted values of $c_k$ for values of $k$ from 1 through 5. The raw data are presented in Table VIII, coded with a one for a correct response and a zero for an error. Discussion of this experiment is postponed until Section III, C.
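The observed statistics used in these analyses can be computed directly from a protocol coded as a string of ones and zeros, as in Table VIII. The sketch below is a minimal illustration; the function names are hypothetical, and the use of overlapping windows for the 3-tuples is an assumption, since the counting convention is not spelled out in this section.

```python
from collections import Counter
from itertools import groupby


def error_run_lengths(protocol):
    """Counter mapping run length j to the number of error runs of that length."""
    return Counter(len(list(group))
                   for symbol, group in groupby(protocol) if symbol == "0")


def joint_correct(protocol, k):
    """c_k: number of trials n with correct responses on both trial n and trial n + k."""
    return sum(1 for a, b in zip(protocol, protocol[k:]) if a == "1" and b == "1")


def transition_counts(protocol):
    """(n_cc, n_ce, n_ec, n_ee) from successive pairs of trials."""
    pairs = Counter(zip(protocol, protocol[1:]))
    return (pairs[("1", "1")], pairs[("1", "0")],
            pairs[("0", "1")], pairs[("0", "0")])


def tuple_frequencies(protocol, size=3):
    """Relative frequencies of c/e tuples over overlapping windows (assumed convention)."""
    windows = Counter(protocol[i:i + size]
                      for i in range(len(protocol) - size + 1))
    total = sum(windows.values())
    return {"".join("c" if s == "1" else "e" for s in window): count / total
            for window, count in windows.items()}


if __name__ == "__main__":
    example = "1001011"  # short made-up protocol, not taken from Table VIII
    print(error_run_lengths(example), transition_counts(example))
```

The total number of runs of errors is the sum of the values returned by `error_run_lengths`, and the transition counts are the sufficient statistics needed for the estimates of Table III.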
[Figure 5. Scatter plot; abscissa: predicted. One plotting symbol denotes two points.]
FIG. 5. Plots of individual predicted versus observed total number of runs of errors ($R$) and runs of length one ($r_1$). The plot for $r_1$ is displaced upward by 10 units (Experiment I).

[Figure 6. Scatter plot; abscissa: predicted.]
FIG. 6. Plots of individual observed versus predicted values of $C(K)$, the frequency of occurrence of two correct responses $K$ trials apart, for $K$ = 1-5. The plot for each value of $K$ is displaced $20(K-1)$ units to the right (Experiment I).
TABLE VIII
INDIVIDUAL PROTOCOLS: EXPERIMENT I

Subject   Protocol

1         10010001011001001111111111111100111111011111111111
          11111010111000011100111111011101111111101111111110
2         11111111111111111111111111111111111111111111111111
          11111111111110111111111111111111111111111111111111
3         10111111111111111011111111101110111110101111100011
          11100111111111001011111111110111111111111111111111
4         00111111011111101110111111111111101111111111111111
          01111011101111101111111110101110111010101111111111
5         01110111111010110111111110111111111111110001111111
          11111111011111111111111111111111111111101111111111
6         11000111111110111111111111100101011111111010111100
          10110111110111111111110111111111110111001111111111
7         01110110101010001111101110111111001011111000111110
          10111110110010100010011011100000111000100011110101
8         01111111110111111111110101010011011111110111111101
          11110001111111111000111101101110101111010101111101
9         11111011111111111111111011100111110111111111111111
          10010111110011111111101011001111011111100110001111
10        11001100111111111000110000111100110001010001110001
          01100111111100001110111000011100101111110001111101
11        11111101111111111111111011111111010111011111111111
          11011111110111111111111111111111111110111111101010
12        10101000001111111011110000100110000111111101111111
          11111100111111101111111111111111100011101100001111
13        01100111111111101001111111111111111111111111111111
          11111111111111110111111111111111111111011111111111
14        10111111101111111111111111111010011111111110111110
          11111110111111011111100111111111111111111001111111
15        11111111111111111110110111111111111111110101111111
          11111111111111110111111111111111111111111111111111
16        11011101011110111011111111111111111110110100111111
          11111111111111111111111110111111111111111111111111
17        11101111001111101110111110111011101101101111111011
          11101111111111111111111010011000010001111111101110
18        11001111111111111111011111111101001101010111001010
          01101001010111110101011001011101110101010100001101
19        01110110101010001111101110111111001011111000111110
          10111110110010100010011011100000111000100011110101
20        01111111110111111111110101010011011111110111111101
          11110001111111111000111101101110101111010101111101
B. EXPERIMENT II: PREDICTION OF A SINGLE ALTERNATION SEQUENCE FOLLOWING PREDICTION OF A MARKOVIAN TENDING-TO-ALTERNATE SEQUENCE

In this section, we consider the data from a group of subjects that participated in a more extensive experiment which is reported elsewhere (Bogartz, 1966a). For the present purposes, we regard this group as a replication of Experiment I.

1. Method

The apparatus, procedure, and instructions were the same as in Experiment I with the following exceptions. Each of the six 5-year-old and five 4-year-old subjects from the University of Iowa Preschool Laboratories used only black and white marbles. Each child had a different
[Figure 7. Solid curve: observed; dashed curve: predicted. Abscissa: blocks of 15 trials (1-5).]
FIG. 7. Observed and predicted mean proportions of correct responses in successive 15-trial blocks (Experiment II).
75:25 tending-to-alternate Markov sequence of blacks and whites during the first 75 trials and then was transferred with no interruption to a single alternation sequence for the final 75 trials. In the 75:25 tending-to-alternate Markov sequence, Pr(black on trial $n+1$ | white on trial $n$) = Pr(white on trial $n+1$ | black on trial $n$) = .75. Including the initial five practice trials with the red marbles as in Experiment I, each child had 155 trials.

2. Results
Only the data from the final 75 trials will be considered at this point. The $\chi^2$ test for homogeneity of subjects gave a $\chi^2$ of 71.91 on 20 df, indicating significant differences between individual transition matrices. The $\chi^2$'s for stationarity and order were 167.46 on 148 df ($z$ = 1.13) and 7.83 on 2 df, respectively. Figure 7 displays a plot of the observed mean performance curve and the pred.$_{AI}$ mean performance curve plotted for blocks of 15 trials.
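The $z$ values quoted alongside the $\chi^2$ statistics in this chapter are reproduced by Fisher's large-df normal approximation, $z = \sqrt{2\chi^2} - \sqrt{2\,df - 1}$. The text does not name the conversion, so its use here is an inference from the reported numbers; the function name is hypothetical.

```python
import math


def chi2_to_z(chi2, df):
    """Fisher's approximation: sqrt(2 * chi2) - sqrt(2 * df - 1) is roughly N(0, 1)."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * df - 1.0)


if __name__ == "__main__":
    # (chi-square, df) pairs reported for Experiments I and II.
    for chi2, df in ((154.33, 38), (212.66, 198), (167.46, 148)):
        print(chi2, df, round(chi2_to_z(chi2, df), 3))
```

The three pairs above correspond to the reported $z$ values of 8.91, .749, and 1.13, respectively.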
The deviations of the observed points from the theoretical curve are larger than those found in Experiment I (see Fig. 4). Since each observed point is based on 165 pieces of binary data, the deviations appear quite large. We should note, however, that positive autocorrelation exists within the individual protocols. Such autocorrelation tends to perpetuate departures from the expected curve. Inspection is not a satisfactory method for evaluating the magnitude of the departures in this case. There is no theory developed for the distribution of this particular statistic. Consequently, an empirical investigation was undertaken to obtain quantitative information concerning the expected magnitudes of departures of observed mean performance curves from pred.$_{AI}$ curves when the theory is correct.

To perform such an investigation, we began by creating theoretical counterparts to the 11 real subjects that were actually run. Such counterparts have been referred to in other contexts as mathematical robots or stat-rats (Bush & Mosteller, 1955) and, more appropriately for this context, stat-children (Zeaman & House, 1963). Each of these 11 stat-children was caused to "behave" according to the theory for 75 trials, thereby generating a replication of the experiment. By doing this many times, it is possible to obtain an empirical distribution of protocols from which the distribution of any statistic of interest can be obtained. As the number of replications increases, furthermore, the empirical distribution approaches the appropriate theoretical distribution.

Each stat-child was assigned a true initial probability of a correct response and a matrix of true transition probabilities corresponding to Matrix (5). The values assigned as the true values were the maximum likelihood estimates for the real children. Thus, the first stat-child had as his true initial probability the maximum likelihood estimate of that probability for Subject 1. As his true matrix of transition probabilities he had the maximum likelihood estimates of the transition probabilities for Subject 1. The second stat-child had as his true values the values estimated for Subject 2, and so on. Each stat-child's response protocol was then determined by using a new set of 75 random numbers sampled from the uniform distribution of numbers between zero and one. The first random number was compared with the initial probability of a correct response, the comparison resulting in the assignment of a correct response to the first trial when the random number was less than the initial probability or the assignment of an error if it exceeded the initial probability. The remaining random numbers, indexed by the integers 2 through 75, were compared with the appropriate entry in the stat-child's transition matrix, either with the probability of a correct given a correct if the response assigned to the previous trial was correct, or with the probability of a correct given an error when the response assigned to the previous trial was an error. Each of these comparisons resulted in the assignment of a response to the trial indexed with the same integer as the random number. When the random number was less than the probability to which it was compared, a correct response was assigned. Otherwise, an error was assigned.

Once the 11 stat-children were run for their 75 trials, a replication of the experiment was obtained. These pseudodata were then treated exactly as we would have treated genuine data. A "pred.$_{AI}$" curve was obtained, an "observed" curve was "plotted," and the deviations of the observed from the pred.$_{AI}$ were obtained for each of the five blocks of
[Figure 8. Solid curve: observed; dashed curve: predicted. Abscissa: blocks of 15 trials (1-5).]
FIG. 8. Observed and predicted mean proportions of correct responses in successive 15-trial blocks, without Subject 3. The vertical lines at each block span the acceptance region for the model (Experiment II).
15 trials. Replication of this procedure 1000 times gave an empirical distribution of deviations at each of the five blocks which is an approximation to the theoretical distribution of deviations. The .025 and .975 percentage points in this empirical distribution were then used as critical values to define a critical region having a probability of approximately .05 when the theory is correct. Approximate significance tests could then be performed on the real data by comparing the observed deviations with these critical values. The observed mean proportion for the third block of 15 trials (see Fig. 7) was the only point to fall in a critical region, the critical values for that block being .704 - .087 = .617 and .704 + .081 = .785. On the one hand, this test is conservative. It employs a .05 region at each block of trials and, therefore, an experiment-wise error rate for false rejection of the model when it is true which is closer to $1 - (.95)^5 = .226$. On the other hand, Subject 3 (see Table XII) does have a highly atypical protocol and almost certainly is not described by the model discussed in Section II. His protocol is not incompatible with the general theoretical approach taken here, however, and we will consider him again. Eliminating this one subject, recomputing the observed and pred.$_{AI}$ curves and the Monte Carlo-generated critical regions based on the parameter estimates

TABLE IX
OBSERVED AND PREDICTED RELATIVE FREQUENCIES OF THE 3-TUPLES: EXPERIMENT II

                 ccc    cce    cec    cee    ecc    ece    eec    eee
Obs.             .438   .098   .067   .067   .106   .034   .067   .122
Pred.$_{AI}$     .461   .099   .068   .063   .107   .034   .067   .102
Pred.$_{G}$      .435   .107   .056   .077   .110   .027   .079   .108
for the remaining 10 subjects, the results shown in Fig. 8 were obtained. None of the departures of the observed mean proportions from the predicted values are now significant, even with the conservative test.

The results of the various sequential analyses will now be given. Tables IX-XI present observed, pred.$_{AI}$, and pred.$_{G}$ values of the 3-tuple proportions, the mean number of runs of errors of lengths 1 through 5,

TABLE X
OBSERVED AND PREDICTED MEAN NUMBER OF RUNS OF ERRORS OF LENGTHS 1 THROUGH 5: EXPERIMENT II

                 Run length
                 1       2       3       4      5
Obs.             5.36    2.18    1.18    .46    .46
Pred.$_{AI}$     5.50    2.25    1.10    .60    .35
Pred.$_{G}$      4.52    2.57    1.47    .84    .48
and the mean number of joint occurrences of correct responses $K$ trials apart for values of $K$ from 1 through 5. The observed mean total number of runs of errors was 10.46, pred.$_{AI}$ was 10.45, and pred.$_{G}$ was 10.50. Figure 9 shows the plot of the individual observed and predicted values for the total number of runs and the runs of length one. Figure 10 shows a plot of the individual observed and predicted values of $C(K)$ for values

TABLE XI
OBSERVED AND PREDICTED MEAN NUMBERS OF JOINT OCCURRENCES OF CORRECT RESPONSES SEPARATED BY K TRIALS FOR VALUES OF K FROM 1 THROUGH 5: EXPERIMENT II

                 K
                 1        2        3        4        5
Obs.             39.91    36.91    35.64    35.64    33.91
Pred.$_{AI}$     41.78    38.98    37.61    36.75    36.08
Pred.$_{G}$      40.08    35.84    33.97    32.97    32.31
of K from 1 through 5. Again it can be seen that the model is fitting well the data of most of the individual subjects. The individual protocols are given in Table XII.
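The stat-children procedure used earlier in this section to obtain critical values can be sketched as a small Monte Carlo routine. This is a simplified illustration under stated assumptions: each stat-child is given as a hypothetical tuple `(p_c1, p_cc, p_ec)`, and, unlike the procedure described in the text, the deviations are taken from the curve computed with the generating parameters rather than from a curve re-estimated on each set of pseudodata. All names are introduced here.

```python
import random


def simulate_protocol(p_c1, p_cc, p_ec, n_trials, rng):
    """One stat-child: 1 = correct, 0 = error, driven by uniform random numbers."""
    responses = [1 if rng.random() < p_c1 else 0]
    for _ in range(n_trials - 1):
        p = p_cc if responses[-1] == 1 else p_ec
        responses.append(1 if rng.random() < p else 0)
    return responses


def predicted_curve(p_c1, p_cc, p_ec, n_trials):
    """P(c_n) for n = 1..n_trials by one-step recursion."""
    probs, p = [], p_c1
    for _ in range(n_trials):
        probs.append(p)
        p = p * p_cc + (1 - p) * p_ec
    return probs


def block_means(values, block):
    """Means of successive blocks of the given length."""
    return [sum(values[i:i + block]) / block
            for i in range(0, len(values), block)]


def critical_values(stat_children, n_trials=75, block=15,
                    replications=1000, seed=0):
    """Empirical .025 and .975 points of observed-minus-predicted block deviations."""
    rng = random.Random(seed)
    n_children = len(stat_children)
    curves = [predicted_curve(*child, n_trials) for child in stat_children]
    mean_curve = [sum(col) / n_children for col in zip(*curves)]
    predicted = block_means(mean_curve, block)  # average of individual curves
    deviations = [[] for _ in predicted]
    for _ in range(replications):
        protocols = [simulate_protocol(*child, n_trials, rng)
                     for child in stat_children]
        mean_resp = [sum(col) / n_children for col in zip(*protocols)]
        observed = block_means(mean_resp, block)
        for b, (obs, pred) in enumerate(zip(observed, predicted)):
            deviations[b].append(obs - pred)
    bounds = []
    for devs in deviations:
        devs.sort()
        lower = devs[int(0.025 * replications)]
        upper = devs[int(0.975 * replications) - 1]
        bounds.append((lower, upper))
    return predicted, bounds
```

An observed block mean would then be judged against `predicted[b] + lower` and `predicted[b] + upper` for its block, in the manner of the critical values .617 and .785 reported for the third block.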
[Figure 9. Scatter plot; abscissa: predicted (0-30). + denotes two points.]
FIG. 9. Plots of individual predicted versus observed total number of runs of errors ($R$) and runs of length one ($r_1$): Experiment II. The plot for $r_1$ is displaced upward by 10 units.

C. DISCUSSION

The analyses of the data collected in Experiments I and II indicate the capability of a model derived from the present theoretical approach to correspond well, statistically, to data generated in a single experimental situation. While this is no mean accomplishment, especially for data from 4- and 5-year-old children, still one feels a bit uneasy, after inspecting the individual protocols, about how much information there is in many of the protocols. Subject 2 in Table VIII is of course the extreme example of a lack of information. If all the data were like his, one would need no theory or model, just a new experiment. In short, then, while goodness of fit of the model is nice, we require something more of
[Figure 10. Scatter plot; abscissa: predicted.]
FIG. 10. Plots of individual observed versus predicted values of $C(K)$, the frequency of occurrence of two correct responses $K$ trials apart, for $K$ = 1-5: Experiment II. The plot for each value of $K$ is displaced $30(K-1)$ units to the right.

TABLE XII
INDIVIDUAL PROTOCOLS: EXPERIMENT II

Subject   Protocol

1         11110001110100000011111111111111111011000000011110
          0001001100100111101111111
2         01111110111110111101101100110101000011000111000011
          1110111111111110001111111
3         00000000111010001111000000000000000000000011001100
          1000000110001110011111111
4         01111011101111111111111111101101111111111100001101
          1111011111101111111011011
5         01111111111111111111011111111111111110011111111101
          1111001111111111111110011
6         11111111111111111111101111111111111101111111111111
          1111111001110111111111101
7         01111100010011110011111001111110111111111111101111
          1111000001100011111111100
8         10101110101010110001110100110100111011001110000000
          1000100000101110000010111
9         00000000010001100000010111000100110000000111111111
          1100000110010101111110000
10        10101111111100111111111111110110100111011111~11111
          1111101111111111111111111
11        01011111101111111111111111111100000111111111011011
          1000100111100110001110111
the theory. Namely, the theory should be psychologically revealing in that it directs us to experiments that tend to confirm the relevance of the processes embedded in the theory to the behavior we are studying. In the next two sections, the theory is used to generate two such experiments. Some attention should be paid to the poorness of fit for Subject 3 and perhaps also Subject 9 in Experiment 11. It appears that these subjects are operating almost exclusively with response traces and null traces. I n addition, the input from the distracter does not remain constant. Instead, it seems that during portions of the trials they hold the t, well and use it. This gives the long runs of errors and the not-so-frequent long runs of corrects. At other portions of the session, the distractor gets very active. Perhaps the subjects attempt to engage E in conversation, bathroom needs arise, they begin to think about extraneous things, irrelevant objects in the experimental room attract their attention. Thus, other portions of the protocol show up quite free of long runs, looking more like chance performance (of course long runs would be expected with chance performance, but with low frequency). What is important to note here is not the badness of fit, which may simply show how unrealistic it is to suppose that all of the subjects not only operate according to the one model so far considered, but that the probabilities remain constant over trials indefinitely. Rather, the important feature of these data are the long error runs that produce the badness of fit. They indicate clearly the response-response dependencethat has been assumed to play an important role in binary prediction and for which the responsetrace mechanism was explicitly provided. Thus, while the particular model presented above probably does not apply to these two subjects, the general theoretical ideas may nevertheless be appropriate.
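The run and joint-occurrence statistics plotted in Figs. 9 and 10 can be tallied directly from a protocol string. The following is a minimal sketch, not the author's procedure: it assumes that a "1" in a protocol marks a correct response and a "0" an error (an assumption, since the coding is not restated here), and the function names are hypothetical.

```python
from collections import Counter


def error_run_counts(protocol: str, correct: str = "1") -> Counter:
    """Count runs of consecutive errors by length.

    Assumes `correct` is the symbol for a correct response; every other
    symbol is treated as an error.
    """
    runs = Counter()
    length = 0
    for symbol in protocol:
        if symbol == correct:
            if length:
                runs[length] += 1  # a run of errors just ended
            length = 0
        else:
            length += 1
    if length:
        runs[length] += 1  # run that reaches the end of the protocol
    return runs


def joint_correct(protocol: str, k: int, correct: str = "1") -> int:
    """C(K): number of trial pairs K apart on which both responses are correct."""
    return sum(
        1
        for n in range(len(protocol) - k)
        if protocol[n] == correct and protocol[n + k] == correct
    )


if __name__ == "__main__":
    example = "1101001110"  # hypothetical short protocol, not from Table XII
    print(dict(error_run_counts(example)))
    print([joint_correct(example, k) for k in (1, 2)])
```

The total number of error runs $R$ is then the sum of the counts over all run lengths, and $r_1$ is the count for length one.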
IV. Extension to the Effects of Intertrial Interval Duration

A. THE EFFECT OF LENGTHENING THE INTERTRIAL INTERVAL

First, suppose that the subject does not attend to the event and encode it. Then, if the response trace $t_r$ is still present when the event is removed, $t_r$ will determine the next response provided that it remains in the STM store through the intertrial interval $I_n$ (Fig. 1). On the other hand, if the event is encoded and $t_e$, the event trace, has replaced $t_r$ (or some $t_0$ that has already replaced $t_r$), then $t_e$ must remain in the memory through $I_n$ in order to determine the next response. It seems reasonable, then, to assume that the longer the intertrial interval, the smaller the probability that $t_r$ or $t_e$ will determine the next response and, therefore, the greater the probability of guessing as a result of transfer of a $t_0$. Any monotonically decreasing decay function for traces will of course yield such a prediction. Also, any reasonable notion about extraneous information entering the memory and interfering with retrieval of $t_r$ or $t_e$ will yield the same prediction.
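To make the qualitative prediction concrete, here is a small sketch under one arbitrary choice of decay function. The exponential form and the rate value are illustrative assumptions only; the text requires nothing more than that trace survival decrease monotonically with the interval.

```python
import math


def loss_probability(iti_seconds: float, decay_rate: float) -> float:
    """Probability that a stored trace is lost during the intertrial interval.

    Assumes (hypothetically) exponential survival exp(-decay_rate * ITI);
    any monotonically decreasing survival function gives the same ordering.
    """
    return 1.0 - math.exp(-decay_rate * iti_seconds)


if __name__ == "__main__":
    rate = 0.1  # hypothetical decay rate per second
    for iti in (3.0, 10.0):
        print(iti, round(loss_probability(iti, rate), 3))
```

With any positive rate the loss probability, and hence the opportunity for a guess, is larger at the longer interval.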
B. EXPERIMENT III: THE BOGARTZ AND PEDERSON STUDY

Portions of the results of this study have been presented elsewhere (Bogartz & Pederson, 1966) with a more detailed statement of the method than will be given here. The study supports the theoretical expectations but contains certain flaws which will be discussed later. The analysis of the results in terms of the model has not been presented before.

1. Method

The Ss were 40 4- and 5-year-old children. Each was seated in front of a display panel having two circular apertures, each 1.5 inches in diameter, one above the other. A blue or green light could appear in either aperture. The S was informed that he was going to play a guessing game with blue and green lights, shown a blue-green-blue-green alternation sequence, told that the sequence would always be blue-green-blue-green-..., and instructed to guess what the next color would be as soon as a buzzer sounded. Only the blue-green color sequence was presented throughout the experiment. Following attainment of six consecutive correct responses, S was given two stages of 40 trials each. At the close of a 2-minute interval between Stage I and Stage II, S was reminded of the blue-green alternation sequence and informed of a change in spatial position of the stimuli which was to take place.

Each trial consisted of a .3-second buzz, S's verbal color prediction, color onset, and color offset after 5 seconds. On each trial, E, seated behind S, pressed the stimulus onset switch immediately upon hearing S's prediction and recorded the prediction as well as the response latency (to the nearest 1/100 of a second). The intertrial interval (ITI), the time between stimulus offset and the following buzz onset, was 3 seconds for 20 Ss and 10 seconds for the other 20 Ss.

Two patterns of stimulus position were used. In Series A, position of the color alternated from aperture to aperture on each trial; in Series S, the stimulus color always appeared in the bottom aperture. Thus, an instance of Series A was blue in the bottom aperture, green in the top, blue in the bottom, green in the top; in Series S, it was blue in the bottom, green in the bottom, blue in the bottom, green in the bottom. One half of the Ss in each ITI condition received Series A in the 40 trials of Stage I and Series S in the 40 trials of Stage II. The other Ss received Series S in Stage I and Series A in Stage II. The design is thus a repeated measurement two-by-two Latin square replicated at two ITI's.

2. Results

Two measures, number of errors and mean response latency, were obtained in each block of 10 trials. Figure 11 shows the mean latencies and percentages of errors as a function of intertrial interval and trial blocks in each stage. Significantly more errors were made with the 10-second ITI than with the 3-second ITI ($F_{1,36} = 6.86$), but the apparent difference on the latency measure was not significant ($F_{1,36} = 1.88$). In view of the close correspondence of the latency curves to the error curves, the lack of significance is probably a Type II error attributable to the notoriously large variances of children's latencies.

The model was fit to each of the eight sets of data (2 ITI's x 2 stages x 2 series orders, A-S or S-A). Table XIII shows the observed and predicted mean performance curves for blocks of 20 trials, and Tables XIV-XVI give the observed and predicted values for the 3-tuples, error runs, and $C_K$ statistics. Generally, the model describes the data well. Inspection of Table XIII does suggest, however, that with the 3-second ITI, the mean performance curves for the S series slope somewhat more steeply than the model predicts, and that under the 10-second ITI the same discrepancy may exist for the subjects receiving Series A in Stage I and Series S in Stage II.

FIG. 11. Mean latency and percent errors as a function of intertrial interval (3-second, 10-second) and trial blocks (blocks of 10 trials) in each stage.
TABLE XIII
OBSERVED AND PREDICTED MEAN PROPORTIONS OF CORRECT RESPONSES IN SUCCESSIVE 20-TRIAL BLOCKS: EXPERIMENT III

ITI (seconds):             3                                               10
Position:        A-S                    S-A                     A-S                    S-A
Stage  Block     Obs.  Pred.A  Pred.G   Obs.  Pred.A  Pred.G    Obs.  Pred.A  Pred.G   Obs.  Pred.A  Pred.G
I      1         .950  .946    .946     .905  .864    .863      .925  .872    .868     .865  .854    .861
I      2         .940  .943    .944     .815  .854    .852      .805  .860    .859     .855  .858    .859
II     1         .925  .885    .883     .925  .925    .918      .825  .776    .776     .875  .865    .866
II     2         .835  .877    .876     .905  .920    .912      .715  .762    .762     .850  .857    .859
TABLE XIV
OBSERVED AND PREDICTED RELATIVE FREQUENCIES OF THE 3-TUPLES: EXPERIMENT III

ITI (seconds):             3                                               10
Position:        A-S                    S-A                     A-S                    S-A
Stage  3-tuple   Obs.  Pred.A  Pred.G   Obs.  Pred.A  Pred.G    Obs.  Pred.A  Pred.G   Obs.  Pred.A  Pred.G
I      ccc       .839  .848    .845     .700  .701    .685      .692  .704    .690     .642  .650    .637
I      cce       .050  .046    .049     .082  .071    .082      .087  .075    .082     .103  .096    .103
I      cec       .047  .044    .047     .061  .059    .056      .063  .061    .059     .097  .090    .102
I      cee       .005  .006    .005     .026  .028    .035      .024  .026    .032     .018  .023    .018
I      ecc       .050  .045    .047     .068  .068    .078      .079  .072    .079     .097  .095    .102
I      ece       .003  .004    .003     .013  .016    .009      .008  .010    .009     .021  .016    .017
I      eec       .005  .006    .005     .024  .027    .034      .024  .025    .031     .018  .022    .018
I      eee       .000  .001    .000     .026  .030    .021      .024  .027    .017     .003  .008    .003
II     ccc       .708  .723    .710     .795  .812    .791      .513  .512    .483     .661  .664    .639
II     cce       .087  .073    .080     .055  .051    .060      .108  .109    .127     .089  .089    .103
II     cec       .066  .062    .064     .044  .045    .047      .105  .096    .107     .095  .083    .101
II     cee       .021  .023    .025     .021  .014    .017      .055  .053    .053     .021  .025    .018
II     ecc       .079  .071    .077     .052  .049    .057      .095  .105    .122     .079  .086    .100
II     ece       .005  .011    .009     .011  .008    .004      .055  .038    .032     .034  .019    .016
II     eec       .021  .022    .024     .018  .014    .017      .047  .050    .051     .021  .024    .018
II     eee       .013  .015    .010     .003  .007    .006      .021  .037    .026     .000  .001    .003
TABLE XV
OBSERVED AND PREDICTED MEAN NUMBER OF RUNS OF ERRORS: EXPERIMENT III

ITI (seconds):                      3                                               10
Position:                 A-S                    S-A                     A-S                    S-A
Stage  Run length         Obs.  Pred.A  Pred.G   Obs.  Pred.A  Pred.G    Obs.  Pred.A  Pred.G   Obs.  Pred.A  Pred.G
I      r_1                1.80  1.83    1.73     2.60  2.31    2.23      2.70  2.38    2.35     4.10  3.61    4.07
I      r_2                 .20   .16     .19      .60   .60     .84       .40   .59     .81      .60   .67     .60
I      r_3 or longer       .00   .01     .04      .40   .49     .49       .50   .40     .42      .10   .20     .11
II     r_1                2.70  2.45    2.53     1.70  1.74    1.84      4.20  3.80    4.21     3.90  3.28    3.98
II     r_2                 .50   .55     .70      .70   .38     .49      1.50  1.20    1.37      .80   .70     .60
II     r_3 or longer       .30   .32     .26      .10   .17     .17       .60   .90     .66      .00   .25     .10
TABLE XVI
OBSERVED AND PREDICTED MEAN NUMBERS OF JOINT OCCURRENCES OF CORRECT RESPONSES SEPARATED BY K TRIALS FOR VALUES OF K FROM 1 THROUGH 5: EXPERIMENT III

ITI (seconds):            3                                                  10
Position:       A-S                     S-A                      A-S                     S-A
Stage  K        Obs.   Pred.A  Pred.G   Obs.   Pred.A  Pred.G    Obs.   Pred.A  Pred.G   Obs.   Pred.A  Pred.G
I      1        34.80  34.87   34.86    30.20  30.11   29.89     30.30  30.40   30.11    29.00  29.12   28.86
I      2        33.70  33.90   33.89    28.90  28.88   28.16     28.70  29.05   28.48    28.10  28.14   28.07
I      3        32.80  32.99   33.00    28.10  27.92   27.16     27.80  28.07   27.53    27.50  27.36   27.33
I      4        32.10  32.10   32.11    27.40  27.06   26.36     27.00  27.21   26.74    27.70  26.61   26.60
I      5        30.90  31.21   31.22    26.60  26.26   25.62     26.10  26.39   25.99    25.90  25.87   25.86
II     1        30.90  31.04   30.82    33.20  33.68   33.18     24.00  24.20   23.76    29.10  28.96   29.34
II     2        29.40  29.83   29.41    31.90  32.57   31.84     23.50  23.09   22.40    28.70  28.15   28.39
II     3        28.60  28.90   28.54    30.70  31.64   30.91     22.00  22.33   21.72    27.60  27.41   27.59
II     4        27.60  28.06   27.75    30.20  30.76   30.06     22.10  21.68   21.13    26.90  26.68   26.82
II     5        26.70  27.26   26.98    29.50  29.90   29.22     20.80  21.07   20.54    25.90  25.94   26.08
Table XIII also suggests that performance is better under Series A than under Series S. An analysis of variance (see Bogartz & Pederson, 1966) indicated that this was in fact the case ($F_{1,108} = 12.28$). This effect may be attributable to the completely redundant position cue which may have provided another trace that could be used in conjunction with another rule. Thus, a position trace could be used with the rule: if top position last time, predict blue; if bottom, predict green. There is, however, the possibility that the alternating position of the light simply serves to maintain attention more effectively.

A similar interpretive problem exists for the ITI effect. Obviously, for S to be correct at a greater-than-chance level, some trace or other representation of either his response or the event on the previous trial must carry over to the next trial. It is reasonable to assume that the longer the ITI, the less likely is this carry-over, either as a result of greater trace decay or greater opportunity for the occurrence of responses incompatible with maintaining some representation of what happened on the previous trial. Unfortunately, ITI is confounded with time in the experimental situation. Thus, the apparent effect of a longer ITI may be the result of, say, a general lagging of attention which produces a greater performance decrement the longer S is in the situation. Additional information bearing on this question can be obtained by varying ITI within Ss instead of between Ss. That is, let ITI vary from trial to trial, using a different random sequence for each S. Decremental effects should average out, and a cleaner picture of the effects of ITI should be obtained. This is done in Experiment IV.
C. A MODEL FOR WITHIN-Ss VARIATION IN THE ITI DURATION

We now introduce an experimental manipulation together with a theoretical assumption concerning the effects of that manipulation. Suppose that $I_n$, the intertrial interval on trial $n$, is not held constant from trial to trial, but instead is manipulated as a “within-subjects” variable such that on each trial one of $s$ possible values of $I_n$ is used according to some experimenter-determined schedule of ITI's. We will assume that the effect of the intertrial interval duration is located only in the values of $d_e$ and $d_r$, that the effect is independent of the trial number and other experimental occurrences (response, event, and so on), and that $d_e$ and $d_r$ are monotonic increasing with increasing values of $I_n$. Let $d_{ek}$ and $d_{rk}$ be the values of $d_e$ and $d_r$ associated with the occurrence of $I_{kn}$, the $k$th type of ITI occurring on trial $n$. Then, $\alpha_k = p(1 - d_{ek})$, $\beta_k = (1 - p)(1 - d_{rk})$, $\gamma_k = 1 - \alpha_k - \beta_k$, and we have the transition matrix

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap I_{kn} & 1 - \gamma_k/2 & \gamma_k/2 \\
e_n \cap I_{kn} & \alpha_k + \gamma_k/2 & \beta_k + \gamma_k/2
\end{array}
\tag{29}
\]
Given this experimental manipulation and theoretical assumption, the sequence of correct responses and errors is, in general, no longer a Markov chain with stationary transition probabilities since the transition probabilities are not necessarily constant over trials. In view of the many analytical tools available for the treatment of such chains, it is useful to know which experimenter-determined within-subject trial-to-trial variations in the intertrial interval will preserve the stationarity of the transition probabilities. A sufficient condition will be given.

Letting $p_{ij}(n)$ be the probability of a transition from response $i$ on trial $n$ to response $j$ on trial $n + 1$, and $p_{ij}^{(k)}$ be the conditional probability of such a transition given the occurrence of $I_{kn}$ (i.e., letting $p_{ij}^{(k)}$ be the general entry in Matrix (29)), then

\[
p_{ij}(n) = \sum_{k=1}^{s} p_{ij}^{(k)} \, t_{ik}(n),
\tag{30}
\]

where $t_{ik}(n)$ is the conditional probability of $I_{kn}$ given response $i$ on trial $n$. By definition, the sequence of correct responses and errors is a finite Markov chain with stationary transition probabilities if $p_{ij}(n)$ is independent of $n$; i.e., a constant, $p_{ij}$. From Eq. (30) it can be seen that this will be the case if the sequence of intervals is constructed using a probability density $t_{ik}$ that is independent of $n$. That is, if $t_{ik}$ is a constant. The required condition, therefore, is that the sequence of intertrial interval durations be at most a simple contingent sequence. Putting this another way, it is required that the sequence be such that the probability of a given interval occurring on any trial depends at most upon the response made on that trial. Thus, of course, random equiprobability sequences of intervals ($t_{ik} = 1/s$) and noncontingent sequences ($t_{ik} = t_k$) also yield the desired stochastic process.

A rigorous proof of the assertions in the previous paragraph is available in an article by Rouanet and Rosenberg (1964), although some translation is required, in that their discussion is in the context of models for response continua. However, as they indicate, their analysis handles discrete random variables also. For translation, their response random variable $x_n$ should be defined on the response set $\{c_n, e_n\}$, their reinforcement random variable should be defined on the set of intertrial intervals, and their “state of the organism” variable $z_n$ can be ignored or set equal to $x_n$. Then their Theorem 2 (p. 222) provides the desired result.
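The averaging in Eq. (30) can be illustrated numerically. The sketch below is an illustration only: it assumes that the matrix for each ITI type has rows $(1 - \gamma_k/2,\ \gamma_k/2)$ for a correct response and $(\alpha_k + \gamma_k/2,\ \beta_k + \gamma_k/2)$ for an error, which is the form shown for the long ITI in Matrix (34), and all parameter values and function names are hypothetical.

```python
def transition_matrix(p, d_e, d_r):
    """2x2 matrix [[c->c, c->e], [e->c, e->e]] for one ITI type.

    Assumed form: alpha = p(1 - d_e), beta = (1 - p)(1 - d_r),
    gamma = 1 - alpha - beta.
    """
    alpha = p * (1.0 - d_e)
    beta = (1.0 - p) * (1.0 - d_r)
    gamma = 1.0 - alpha - beta
    return [
        [1.0 - gamma / 2.0, gamma / 2.0],
        [alpha + gamma / 2.0, beta + gamma / 2.0],
    ]


def mixed_transition_matrix(matrices, t):
    """Eq. (30): p_ij = sum over k of p_ij^(k) * t_ik.

    t[i][k] is the probability of ITI type k given response i; it must not
    depend on the trial number for the mixture to be stationary.
    """
    return [
        [sum(t[i][k] * matrices[k][i][j] for k in range(len(matrices)))
         for j in range(2)]
        for i in range(2)
    ]


if __name__ == "__main__":
    short = transition_matrix(p=0.6, d_e=0.1, d_r=0.2)  # hypothetical values
    long_ = transition_matrix(p=0.6, d_e=0.4, d_r=0.5)  # hypothetical values
    equiprobable = [[0.5, 0.5], [0.5, 0.5]]             # t_ik = 1/s with s = 2
    print(mixed_transition_matrix([short, long_], equiprobable))
```

Because the weights are constants, the mixed matrix is the same on every trial, which is the stationarity property the paragraph above relies on.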
We note here that in Section V, we shall make use of this result again, although there in regard to different interpolated events occurring during the ITI rather than to different durations of the ITI. Interpreting $I_{kn}$ as the occurrence of the $k$th type of interpolated event during the ITI, the above result continues to apply. The result is actually more general but we shall have no occasion to consider its other extensions.

1. The Distribution of 3,2-Tuples

We find here the distribution of a set of statistics similar to the 3-tuples treated in Theorem 3. Again, letting $y_n$ denote the generic response on trial $n$ ($c_n$ or $e_n$) and $i_n$ the generic ITI on trial $n$, the general 3,2-tuple beginning on trial $n$ is $y_n i_n y_{n+1} i_{n+1} y_{n+2}$. The proportion of occurrences of this 3,2-tuple in $N$ trials is

\[
P(y_n i_n y_{n+1} i_{n+1} y_{n+2}) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(y_n) \, J_1 K_1 J_2 K_2,
\tag{31}
\]

where $J_1$ is a value of $t_{ik}$ depending at most upon $y_n$, $J_2$ is a value of $t_{ik}$ depending at most upon $y_{n+1}$, $K_1$ is a value of $p_{ij}^{(k)}$ which is an entry in an appropriate matrix of the form of Matrix (29) determined by $i_n$, and $K_2$ is the same as $K_1$ but from a matrix determined by $i_{n+1}$. Thus, for example, suppose there are only two intervals, $I_1$ and $I_2$, and they occur at random according to a 50:50 schedule. Then

\[
P(c I_1 e I_2 c) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(c_n)(.5)(\gamma_1/2)(.5)(\alpha_2 + \gamma_2/2).
\tag{32}
\]

To find $P(c_n)$ we use the fact that the sequence of correct responses and errors is a two-state Markov chain with stationary transition probabilities since we are limiting our attention to use of at most simple contingent sequences. Theorem 2 gives the value of $P(c_n)$ provided that each of the parameters $\alpha$, $\beta$, and $\gamma$ is interpreted as an average over the values for each ITI. Estimation of these average values, denote them by $\bar{\alpha}$, $\bar{\beta}$, and $\bar{\gamma}$, is made using the transition counts as in Section II, D, treating the data as if the ITI variation did not occur. As in Eq. 9, letting

\[
v_N = (N - 2)^{-1} \sum_{n=1}^{N-2} P(c_n)
\quad \text{and} \quad
(1 - v_N) = (N - 2)^{-1} \sum_{n=1}^{N-2} P(e_n),
\]

we will use a quantity

\[
\bar{v}_N = P(c_\infty) - Z (1 - \bar{\beta}^{\,N-2})(N - 2)^{-1}(1 - \bar{\beta})^{-1},
\tag{33}
\]

where now $P(c_\infty) = (\bar{\alpha} + \bar{\gamma}/2)/(1 - \bar{\beta})$ and $Z = P(c_\infty) - P(c_1)$. Estimation of $\alpha_k$, $\beta_k$, and $\gamma_k$ would of course be made in the usual way using transition counts appropriate to Matrix (29). Thus, for example, if there are $n_{c \cdot k}$ transitions from $c_n \cap I_{kn}$, and $n_{cek}$ of them are to $e_{n+1}$, the maximum likelihood estimate of $\gamma_k$ would be $\hat{\gamma}_k = 2 n_{cek} / n_{c \cdot k}$ (ignoring the problem of constraints for which the treatment is the same as in Section II, D).
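As a check on the closed form in Eq. (33), the following sketch compares it with a direct trial-by-trial average. It assumes the standard two-state recursion $P(c_{n+1}) = (\alpha + \gamma/2) + \beta P(c_n)$, which follows from transition probabilities of the form used in Matrix (34) but is not quoted from Theorem 2; the parameter values and function names are hypothetical.

```python
def p_correct_by_trial(alpha, beta, gamma, p_c1, n_trials):
    """P(c_n) for n = 1..n_trials from the assumed two-state recursion."""
    p_ec = alpha + gamma / 2.0  # probability of a correct response after an error
    probs = [p_c1]
    for _ in range(n_trials - 1):
        probs.append(p_ec + beta * probs[-1])
    return probs


def mean_p_correct(alpha, beta, gamma, p_c1, n_total):
    """Closed form of Eq. (33): average of P(c_n) over trials 1..N-2."""
    p_inf = (alpha + gamma / 2.0) / (1.0 - beta)
    z = p_inf - p_c1
    m = n_total - 2
    return p_inf - z * (1.0 - beta ** m) / (m * (1.0 - beta))


def gamma_hat(n_ce, n_c):
    """Estimate of gamma_k: twice the proportion of c -> e transitions."""
    return 2.0 * n_ce / n_c


if __name__ == "__main__":
    a, b, g = 0.5, 0.3, 0.2  # hypothetical parameter values summing to 1
    direct = sum(p_correct_by_trial(a, b, g, 0.5, 98)) / 98
    print(direct, mean_p_correct(a, b, g, 0.5, 100))
```

The two printed values should agree to within floating-point error, since the closed form is just the sum of the geometric series generated by the recursion.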
D. EXPERIMENT IV: WITHIN-S VARIATION IN THE ITI DURATION

1. Method

The Ss were 25 4- and 5-year-old preschool children. Each was taken to an experimental room and seated at a table opposite E. On the table was a stack of 102 4 x 6-inch file cards concealed behind a small black box which could hold the entire stack. Centered on each card was a 1.5 x 2.0-inch rectangular patch of red or green tape. The colors in the stack alternated (RGRG ... or GRGR ...). The S was shown the first two cards in the stack, one after the other, after being asked to name the colors on the two cards. Following correct naming of the two colors, a 6-V buzzer was sounded briefly and the child was told that each time he heard the buzzer he was to guess quickly the next color in the stack. On each of 100 trials, following each buzz, S made his prediction, E removed the top card from the stack, turned it color side up in front of S for about 1 second, and then placed it color side down in the box. The buzzer sounded for .3 second every 8 seconds, except when E depressed a foot switch which opened the buzzer circuit. If this occurred, the interbuzz interval was increased from 7.7 seconds to 15.7 seconds. A different random sequence of long and short intervals was used with each child; thus, approximately half the intervals were long and half short.

2. Results
Since there are two ITI's and they are presented according to a random equiprobability schedule ($t_{ik} = .5$, all $i$ and $k$), the sequence of correct responses and errors is theoretically a sample from a two-state Markov chain with stationary transition probabilities. Therefore, all of the model analyses appropriate to the data in the previous experiments are also appropriate in this experiment. Table XVII presents the observed and predicted mean performance curve in blocks of 20 trials, Table XVIII presents the observed and predicted relative frequencies of the 3-tuples, and Table XIX the observed and predicted error runs and $C_k$ statistics. The only suggestion of badness of fit in these tables is the perhaps greater than expected

TABLE XVII
MEAN PERFORMANCE CURVE IN 20-TRIAL BLOCKS

Block      1     2     3     4     5
Obs.       .796  .788  .788  .754  .712
Pred.A     .751  .758  .758  .758  .758
Pred.G     .758  .770  .773  .770  .770
TABLE XVIII
OBSERVED AND PREDICTED RELATIVE FREQUENCY OF 3-TUPLES
3-tuple
Obs. Pred.,, Pred.G
ccc
cce
cec
cee
ecc
ece
eec
eee
.530
.I07
.I03
.085 .080
.523
.I09
.076
.046 .048 .056
.lo7
529
.026 .026 .023
.046 .051 .057
.053 .055 .042
.I09 .I 11
TABLE XIX
OBSERVED AND PREDICTED MEAN VALUES OF r_j, THE NUMBER OF ERROR RUNS OF LENGTH j, AND C_k, THE NUMBER OF JOINT OCCURRENCES OF TWO CORRECT RESPONSES k TRIALS APART

r_j     Obs.    Pred.A  Pred.G
r_1     8.88    8.28    7.78
r_2     2.00    2.67    3.28
r_3     1.40    1.07    1.38
r_4      .68     .50     .58
r_5      .12     .26     .25

C_k     Obs.    Pred.A  Pred.G
C_1     63.00   63.17   62.90
C_2     60.32   60.25   58.99
C_3     58.12   58.82   57.56
C_4     57.96   57.87   56.76
C_5     56.84   57.11   56.11
downward slope to the observed performance curve. The trend suggests a possible drift in the parameter values, probably in the probability of an encoding response occurring. The drift is not large over the 100 trials and actually appears to be nonexistent during the first 70 or 80 trials. It does not hamper the ability of the model to describe well the detailed structure of the data.

To treat the effects of the ITI duration we must estimate the transition probabilities for the long (L) and short (S) ITI's,

\[
\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap L & 1 - \gamma_L/2 & \gamma_L/2 \\
e_n \cap L & \alpha_L + \gamma_L/2 & \beta_L + \gamma_L/2 \\
c_n \cap S & 1 - \gamma_S/2 & \gamma_S/2 \\
e_n \cap S & \alpha_S + \gamma_S/2 & \beta_S + \gamma_S/2
\end{array}
\tag{34}
\]

The maximum likelihood estimates for the group were

[Matrices of estimated transition probabilities, rows $c_n$ and $e_n$ by columns $c_{n+1}$ and $e_{n+1}$, for the long and short ITI's.]

Using theoretical Matrix (34), we can estimate the group mean probability of a $t_0$ being passed to the generator after a long ITI by the quantity $2\hat{p}_{ceL} = 2(.266) = .532$ and the corresponding value for a short ITI by $2\hat{p}_{ceS} = 2(.074) = .148$. Thus, the probability of a guess following a long ITI is about 3½ times the guessing probability following a short ITI, as was predicted. To further evaluate the ability of the model to describe the effects of the variation in ITI duration, the observed and predicted values of the 32 possible 3,2-tuples were obtained and are shown in Table XX. The agreement of observed with predicted is excellent.
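The arithmetic behind the comparison of guessing probabilities is short enough to spell out. The sketch below uses only the two group estimates quoted in the text (.266 and .074); the function name is hypothetical.

```python
def guess_probability(p_ce: float) -> float:
    """Estimated probability of a guess: twice the correct-to-error transition
    probability, since a guess is taken to produce an error half the time."""
    return 2.0 * p_ce


if __name__ == "__main__":
    long_iti = guess_probability(0.266)   # estimate after a long ITI
    short_iti = guess_probability(0.074)  # estimate after a short ITI
    print(long_iti, short_iti, long_iti / short_iti)
```

The ratio of the two estimates is the factor by which guessing is more likely after the long interval.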
TABLE XX
OBSERVED AND PREDICTED RELATIVE FREQUENCY OF THE 3,2-TUPLES

3,2-Tuple    Obs.   Pred.A  Pred.G
c S c S c    .156   .166    .164
c S c L c    .123   .133    .130
c S c S e    .016   .012    .013
c S c L e    .054   .045    .047
c S e S c    .009   .009    .008
c S e L c    .011   .009    .008
c S e S e    .004   .005    .006
c S e L e    .004   .005    .006
e S c S c    .025   .030    .030
e S c L c    .027   .022    .024
e S c S e    .003   .003    .002
e S c L e    .011   .010    .009
e S e S c    .011   .011    .014
e S e L c    .011   .013    .015
e S e S e    .016   .015    .012
e S e L e    .014   .013    .011
c L c S c    .137   .133    .130
c L c L c    .114   .108    .103
c L c S e    .006   .010    .010
c L c L e    .031   .034    .037
c L e S c    .032   .030    .028
c L e L c    .033   .029    .030
c L e S e    .018   .019    .023
c L e L e    .021   .020    .021
e L c S c    .030   .030    .032
e L c L c    .025   .023    .025
e L c S e    .004   .003    .003
e L c L e    .009   .009    .009
e L e S c    .013   .013    .013
e L e L c    .011   .013    .014
e L e S e    .014   .013    .011
e L e L e    .008   .013    .010

E. DISCUSSION

A remark concerning the effects of ITI duration in Experiment III is needed. It seems likely that in the 3-second condition, the pacing of the trials is so rapid that there are attentional effects as well as memory effects. With an 8- or 10-second ITI, there is time for the child to reorient, become involved with other matters between predictions. With a 3-second ITI, the child is captured by the sequence of events and prevented from disorienting. For this reason, children in the 3-second condition are probably attending to almost every event.

We know almost nothing about the effects of the trial pace on children's learning, but these results suggest that control of attention could be maintained by a rapid pacing. This might help us understand why young children in the laboratory seem to have attentional limitations they do not seem to have outside the laboratory (except perhaps in classrooms, which also, for the individual child, have much blank time). Even irrelevant filler tasks which do little but fill time (although consideration to their novelty effects would also have to be given) might be very helpful. (An excellent place to see this effect is at parades where for the young child, even the uninteresting dignitaries in cars seem to help span the interclown or interfloat intervals in preference to just the blank time of the strung-out parade.)

The results of Experiment IV support the general theoretical point of view taken here. The support is at the level of predicted directional experimental effects and predicted directional differences in parameter values rather than simply the goodness of fit of a model with parameters free for estimation. The results justify the introduction of a second memory axiom giving a formal statement of the assumption concerning the effects of the ITI duration; however, that axiom will not be given here. Nor will any of the other additional assumptions to be presented later be given the same formal treatment as was presented in Section II, although there are no serious problems that would prevent this.

There is another interesting piece of evidence which should be mentioned at this point. In the original design of Experiment IV, it was planned that for half of the subjects the event card would not be exposed for only 1 second, but instead would remain exposed until after the subject made his next prediction and it was covered by the next event card. Thus, information as to what event occurred on the previous trial would always be available to the child. The child is thereby provided with a distraction-free memory which is not subject to the effects of his responses provided that he looks at the exposed card when he receives the cue to make his next prediction. Six additional subjects were randomly assigned to this condition at the beginning of Experiment IV. The condition was then terminated after these six because the subjects were making so few errors that it was felt more information would be obtained if the limited number of subjects were used in the 1-second exposure of the card condition. The largest number of errors made by any of the six was four in the 100 trials. The mean error probability for the six subjects was .023 as compared with a mean error probability of .232 for the 25 subjects in the 1-second exposure condition. Thus performance in the prediction task can be raised from a level of less than 80% correct to practically perfect performance by supplementing the child's distractible memory with a distraction-free memory that is not subject to effects of the child's own previous response.
V. Extension to Interpolated Events

A. INTERFERENCE EFFECTS

Experiment IV demonstrated that performance dependent upon the presence of the response and event traces is degraded by the lengthening of a temporal interval during which one of those traces must be held in memory. This does not necessitate a “fading-trace” hypothesis since the possibility of distracting events causing replacement of the trace in the memory also exists. The experiment to be considered in Section V,C will show that such interference effects can be produced experimentally. This will not rule out the possibility of trace decay occurring, but will at least demonstrate occurrence and manipulability of interference effects in the alternation prediction task. A natural extension of the model based on the theoretical ideas presented above will be used to demonstrate the power of the present theoretical position to describe the effects of this experimental manipulation.

The memory store subsystem that has been used above is a one-slot memory. That is, only one trace is stored at a time. The theoretical meaning of interference is the displacement of one trace by another. Storage of an event trace, for example, must interfere with use of the response trace since if the $t_e$ is stored, the $t_r$ is displaced with probability 1.0. Likewise, entry of a $t_0$ into the memory during the ITI displaces the trace held there and thereby produces complete interference with it. We wish now to consider experimental production of interference effects by attempted manipulation of the contents of the memory store. The manipulation to be considered is one in which the subject is required to respond to some stimulus during the ITI. The presumed probabilistic consequence of this manipulation is the entry into the memory store of a trace $t_i$ produced by the encoding of the interpolated event. We shall be specifically interested in an overt naming response that the subject makes to a stimulus presented during the ITI.

We shall require a categorization of the types of stimuli that can be presented and the types of $t_i$'s that can be the consequences of naming such stimuli. Remembering that we are now still within the context of the alternation prediction task, it will be convenient to introduce a twofold categorization of the possible interpolation events. To do this we first note that the generator determines a partition of the set of possible traces. Recalling that $A_1$ and $A_2$ denote the two possible PR's, $t_1^{(1)}$ and $t_2^{(1)}$ are the traces produced, respectively, by $A_1$ and $A_2$, $t_1^{(2)}$ and $t_2^{(2)}$ are the traces produced by encoding of $E_1$ and $E_2$, respectively, and $t_0^{(0)}$ is the null trace, we now introduce $t_i = \{t_0^{(3)}, t_1^{(3)}, t_2^{(3)}\}$, the set of possible traces produced by naming the interpolated event. We see that the generator rules partition the set $\{t_0^{(0)}, t_1^{(1)}, t_2^{(1)}, t_1^{(2)}, t_2^{(2)}, t_0^{(3)}, t_1^{(3)}, t_2^{(3)}\}$ into the sets $t_1 = \{t_1^{(1)}, t_1^{(2)}, t_1^{(3)}\}$, $t_2 = \{t_2^{(1)}, t_2^{(2)}, t_2^{(3)}\}$, and $t_0 = \{t_0^{(0)}, t_0^{(3)}\}$. The set $t_1$ is the set of traces each of which results in the PR $A_1$; the elements of $t_2$ all result in the PR $A_2$; and the elements of $t_0$ result in a guess.

We can now introduce the twofold categorization. An interpolated event is first categorized as relevant or irrelevant. Any interpolated event the naming of which can result in entry of a $t_0^{(3)}$ in the memory store is an irrelevant (I) event. An event that can result in entry of a $t_1^{(3)}$ or a $t_2^{(3)}$ is a relevant event. Relevant events will be categorized as either same (S) or different (D) according to the following. On any given trial, a relevant interpolated event is an S event if its encoding would enter the trace $t_i^{(3)}$, if the prediction made on that trial resulted in entry of $t_j^{(1)}$, and if $i = j$. Thus, for any $k$ ($k = 1, 2$), if the event is an S event, then $t_j^{(1)} \in t_k$ implies $t_i^{(3)} \in t_k$. Encoding the interpolated event enters into the memory a trace that will cause the generator to produce the same prediction that would have been produced if the response trace on that trial had been transferred to the generator. A relevant event that is not an S event is a D event. Thus, the trace entered by encoding a D event will cause the generator to produce the prediction $A_i$ such that, if $A_j$ is the prediction that would have been produced if the response trace on that trial had been transferred to the generator, then $i \neq j$.

To summarize: Encoding of irrelevant events enters the equivalent of a null trace into the memory; encoding of an S event enters the equivalent of the response trace entered on that trial; encoding of a D event enters a trace equivalent to that which would have been entered if the response that did not occur on that trial had occurred.
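The twofold categorization just summarized can be written as a small decision rule. This is only an illustrative sketch: the integer coding (0 for a trace equivalent to the null trace, 1 or 2 for a trace belonging to the set that yields $A_1$ or $A_2$) and the function name are hypothetical conveniences, not notation from the text.

```python
def classify_interpolated_event(interpolated_index: int, response_index: int) -> str:
    """Classify an interpolated event as 'I', 'S', or 'D'.

    interpolated_index: 0 if naming the event enters a null-equivalent trace,
        otherwise 1 or 2 for a trace leading to prediction A_1 or A_2.
    response_index: 1 or 2, the index of the response trace entered on the trial.
    """
    if response_index not in (1, 2) or interpolated_index not in (0, 1, 2):
        raise ValueError("indices must follow the hypothetical 0/1/2 coding")
    if interpolated_index == 0:
        return "I"  # irrelevant: acts like a null trace
    if interpolated_index == response_index:
        return "S"  # same: acts like the response trace of that trial
    return "D"      # different: acts like the trace of the response not made


if __name__ == "__main__":
    for interp in (0, 1, 2):
        print(interp, classify_interpolated_event(interp, response_index=1))
```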
B. A MODEL FOR THE EFFECTS OF INTERPOLATED EVENTS

The model to be considered here is for the case of noncontingent interpolations, i.e., where the probabilities of S, D, and I events remain constant from trial to trial. It will be assumed that on each trial the probability of the interpolated event being encoded and transferred to the generator is a constant $\delta$. Thus, $\alpha$, $\beta$, $\gamma$, and $\delta$ are the probabilities that $t_e$, $t_r$, $t_0$, and $t_i$, respectively, are transferred to the generator on a given trial. Since these are mutually exclusive, exhaustive events, $\alpha + \beta + \gamma + \delta = 1.0$.

The transition matrix

$$\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap S & \alpha + \beta + \delta + \gamma/2 & \gamma/2 \\
e_n \cap S & \alpha + \gamma/2 & \beta + \delta + \gamma/2
\end{array} \tag{36}$$
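gives the transition probabilities when an S event occurs during the ITI between trial $n$ and trial $n + 1$. Since the effect of encoding an S event is the same as retaining the response trace, we can arrive at Matrix (36) by simply adding, in each row, a $\delta$ to the column containing the term $\beta$ in the matrix appropriate to alternation prediction without interpolations, Matrix (5). Similarly, for D events we add, in Matrix (5), $\delta$ in each row to the column that does not contain $\beta$, since the effect of encoding a D event is opposite to that of retaining the response trace. This gives

$$\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap D & \alpha + \beta + \gamma/2 & \delta + \gamma/2 \\
e_n \cap D & \alpha + \delta + \gamma/2 & \beta + \gamma/2
\end{array} \tag{37}$$

Finally, for I events, encoding of which has the same effect as entry of a null trace, we simply replace $\gamma$ by $\delta + \gamma$ everywhere in Matrix (5), giving

$$\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap I & \alpha + \beta + (\gamma + \delta)/2 & (\gamma + \delta)/2 \\
e_n \cap I & \alpha + (\gamma + \delta)/2 & \beta + (\gamma + \delta)/2
\end{array} \tag{38}$$

Since the sequence of interpolated events is noncontingent, the result of Section IV,C applies, giving that the sequence of correct responses and errors will be a two-state Markov chain with stationary transition probabilities. The transition probabilities will be weighted averages of the entries in corresponding positions of Matrices (36), (37), and (38), with the weights equal to the probabilities of S, D, and I events occurring. Thus, the data can be analyzed as if the interpolated events manipulation had not been introduced, as was done in Experiment IV with the variable ITI duration. To test the effects of the interpolations, an approximate $\chi^2$ test can be applied to test the goodness of fit of the matrix

$$\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n \cap S & \alpha + \beta + \delta + \gamma/2 & \gamma/2 \\
c_n \cap D & \alpha + \beta + \gamma/2 & \delta + \gamma/2 \\
c_n \cap I & \alpha + \beta + (\gamma + \delta)/2 & (\gamma + \delta)/2 \\
e_n \cap S & \alpha + \gamma/2 & \beta + \delta + \gamma/2 \\
e_n \cap D & \alpha + \delta + \gamma/2 & \beta + \gamma/2 \\
e_n \cap I & \alpha + (\gamma + \delta)/2 & \beta + (\gamma + \delta)/2
\end{array} \tag{39}$$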
to the corresponding table of observed transition relative frequencies. For large $n$, the standard $\chi^2$ statistic will be distributed as $\chi^2$ with $6 - 3 = 3$ df (since three parameters are estimated) under the hypothesis that the data are a sample from the model.
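The weighted averaging of the S, D, and I matrices can be written out explicitly. The following is a sketch, not a formula taken from the chapter: it assumes that Matrix (5) has rows $(\alpha + \beta + \gamma/2,\ \gamma/2)$ and $(\alpha + \gamma/2,\ \beta + \gamma/2)$, as the verbal construction of the S, D, and I matrices implies, and it introduces the symbols $p_S$, $p_D$, and $p_I$ (with $p_S + p_D + p_I = 1$) for the probabilities of the three kinds of interpolated event.

```latex
\begin{aligned}
P(c_{n+1} \mid c_n)
  &= p_S\left(\alpha + \beta + \delta + \tfrac{\gamma}{2}\right)
   + p_D\left(\alpha + \beta + \tfrac{\gamma}{2}\right)
   + p_I\left(\alpha + \beta + \tfrac{\gamma + \delta}{2}\right) \\
  &= \alpha + \beta + \tfrac{\gamma}{2} + \delta\left(p_S + \tfrac{p_I}{2}\right), \\[4pt]
P(c_{n+1} \mid e_n)
  &= p_S\left(\alpha + \tfrac{\gamma}{2}\right)
   + p_D\left(\alpha + \delta + \tfrac{\gamma}{2}\right)
   + p_I\left(\alpha + \tfrac{\gamma + \delta}{2}\right) \\
  &= \alpha + \tfrac{\gamma}{2} + \delta\left(p_D + \tfrac{p_I}{2}\right).
\end{aligned}
```

On this reading the interpolation manipulation only shifts the two free transition probabilities of the chain, which is why the pooled sequence of correct responses and errors can still be treated as a two-state Markov chain.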
C. EXPERIMENT V: THE INTERPOLATED EVENTS EXPERIMENT

1. Method

The subjects were 40 4- and 5-year-old children attending the University Preschool Laboratories of the University of Iowa. The stimuli were Animal Rummy cards (Whitman Publishing Company). Thirty-eight of the following animals were used: dog, owl, squirrel, kitty, goose, turtle, lamb, mouse, fox, bunny, and chick. For each subject, four of these animals were randomly selected as stimuli and assigned as stimuli A, B, C, and D. Randomization was restricted such that both elements of the following pairs could not be included in the same set: goose and chick; kitty and bunny; squirrel and fox; dog and kitty. The first three of these restrictions were because the animal pictures in those pairs were considered too similar in appearance. The fourth restriction was included because of probable association value.

The child was escorted into the experimental room and seated at a table opposite the experimenter. The experimenter showed him an instance of each of the four stimulus cards, A, B, C, and D, asked him to name them, and then told him that they were the kinds of cards to be used in the game.

The stimulus cards for a given subject were arranged in two decks: an alternating deck and an interpolation deck. The two decks were on the table, face down, a few inches apart and directly in front of the child. The alternating deck contained 10 A and B alternating cards. The interpolation deck contained, for each subject, a different randomly ordered sequence of 64 interpolations.

The game was explained to the child as follows. “This is the naming deck (the interpolation deck was touched) and this is the guessing deck (the alternation deck was touched). The guessing deck has only A’s and B’s in it. The first one is an A (the first card was turned face up). The next one is a B (the next card was turned over). The next one is an A (the same).” Several alternating A’s and B’s were shown this way. The child was asked to guess a few. When the experimenter felt confident that the child knew the rule, he tested him by asking “If I have just shown you an A from this deck, what will be next? And if I have just shown you a B, what comes next?” In all cases the children gave satisfactory answers to these questions and there was no need for more practice. Next, the experimenter explained the interpolation deck. “This is the naming deck. It has all four kinds of cards in it (the four initial examples were picked up and fanned in front of the subject).
When I show you a card from this deck, you just tell me what it is, and when I say guess, you guess a card from this (alternation) deck.”

Each subject predicted an alternation sequence of 65 A and B stimulus cards. Following each card in this sequence except the last, he was shown a series of interpolation cards to be named aloud. The set of 32 interpolations following A cards was identical to the set following B cards, except for order. Of the 32, 16 were of length 1 and 16 were of length 3. Half of the interpolations of each length (8) contained relevant cards (A’s and B’s) and the other half contained irrelevant cards (C’s and D’s). Thus the length-1 interpolations in each set of 32 consisted of 4 A’s, 4 B’s, 4 C’s, and 4 D’s. The length-3 interpolations consisted of the eight possible AB triples (AAA, AAB, ..., BBB) and the eight possible CD triples (CCC, CCD, ..., DDD).

The sequence started with the child guessing the top card in the alternation deck. As soon as the child made his guess, the guessed card was turned over so he could see it and then placed face down on the bottom of the deck. The correct number of cards from the interpolation deck (one or three) was turned over one at a time and the child named each. The used cards from the interpolation deck were placed face down in back of the deck. The experimenter attempted to show the cards at a 2-second rate; however, there was much variability because of variability in response latency.

2. Results
Because the S, D, and I interpolation events occur on any given trial with respective noncontingent probabilities of 1/4, 1/4, and 1/2, the sequence of correct responses and errors is, according to the model, a sample from the same type of stochastic process assumed to have generated the data in the previous experiments. Therefore, the usual analyses may be again performed, this time as if the interpolation procedure had not occurred. The results of these analyses are shown in Table XXI, which gives the observed and predicted values for the mean performance curve in blocks of 13 trials, the 3-tuple analysis, the error runs analysis, and the $C_K$ analysis. It can be seen that the model again describes in detail the statistical properties of the data.

The most obvious difference between these results and those for the first four experiments is the relatively low level of performance in this experiment. As the model predicts, the mean performance curve is flat throughout the 65 trials, but the probability of a correct response is now at about .66, whereas in the previous experiments the performance level was near .80 or above. This result is not unexpected because the introduction of the interpolation schedule used in this experiment should have a net effect of degrading the overall performance level.
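The probabilities 1/4, 1/4, and 1/2 follow from the composition of the interpolation set described in the Method. The sketch below rebuilds one set of 32 interpolations and counts the categories. It rests on two assumptions that should be read as such: an interpolation is categorized by its last card only (the simplification adopted in the analysis of this experiment), and a relevant card counts as S when its name matches the child's prediction on that trial. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def interpolation_set():
    """One set of 32 interpolations: 16 single cards and 16 triples."""
    singles = [(card,) for card in "ABCD" for _ in range(4)]  # 4 each of A, B, C, D
    triples = list(product("AB", repeat=3)) + list(product("CD", repeat=3))
    return singles + triples


def category(prediction, interpolation):
    """Categorize by the last card only (assumed simplification)."""
    last = interpolation[-1]
    if last not in ("A", "B"):
        return "I"  # C and D cards are irrelevant
    return "S" if last == prediction else "D"


def category_probabilities(prediction):
    """Relative frequency of S, D, and I events for a given prediction."""
    items = interpolation_set()
    counts = {"S": 0, "D": 0, "I": 0}
    for item in items:
        counts[category(prediction, item)] += 1
    return {name: Fraction(count, len(items)) for name, count in counts.items()}
```

Whether the child predicted A or B, the counts come to 8 S, 8 D, and 16 I out of 32.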
TABLE XXI
STATISTICAL ANALYSES FOR EXPERIMENT V, IGNORING THE INTERPOLATION MANIPULATION

Mean performance curve in 13-trial blocks
Block        1       2       3       4       5
Obs.       .685    .667    .638    .669    .665
Pred.      .677    .666    .666    .666    .666
Pred.      .673    .663    .663    .663    .663

Relative frequency of 3-tuples
3-Tuple     ccc     cce     cec     cee     ecc     ece     eec     eee
Obs.       .337    .134    .119    .075    .131    .062    .074    .068
Pred.      .356    .126    .110    .077    .125    .061    .076    .070
Pred.      .332    .138    .113    .082    .137    .057    .082    .060

Mean numbers of runs of errors of lengths 1-5
Run length   1       2       3       4       5
Obs.       7.95    2.52    1.30     .40     .28
Pred.      7.25    2.74    1.15     .52     .25
Pred.      7.43    3.09    1.29     .54     .22

Mean number of joint occurrences of correct responses separated by K trials: K = 1-5
K            1       2       3       4       5
Obs.      30.08   28.72   27.82   27.88   27.15
Pred.     30.83   29.35   28.66   28.13   27.65
Pred.     30.07   28.01   27.37   26.90   26.46

Preliminary analyses of the effects of the interpolation procedure focused on the difference between the naming of one and naming of three interpolated stimulus cards. Study of the responses following different triples such as SSS versus DDD revealed no simple or obvious differences that could not be equally well accounted for by using just the last card of the triple to categorize the interpolation. The effects of the interpolation were, therefore, analyzed as if on each interpolation of three cards, only the last card of the triple had been interpolated. Table XXII shows the observed and predicted transition probabilities corresponding to the theoretical Matrix (39). Maximum likelihood estimates of $\alpha$, $\beta$, $\gamma$, and $\delta$ were found to be .286, .119, .385, and .210 for the group. Thus, the children were using $t_e$'s about 30% of the time, $t_r$'s about 10%, guessing about 40%, and using a memory trace of the interpolated event about 20% of the time. A $\chi^2$ test of the hypothesis that
the observed transition relative frequencies are compatible with the theoretical Matrix (39) yielded a nonsignificant $\chi^2$ of 4.61 on 3 df, which, considering the large N’s upon which the estimates are based, indicates a remarkable agreement of the data with the model.
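The fit can be checked approximately from the published figures. The sketch below is an illustration under stated assumptions, not the author's computation: it takes the six predicted probabilities to be those obtained by adding $\delta$ (or $\delta/2$ for I events) to the appropriate column of an assumed Matrix (5) with rows $(\alpha + \beta + \gamma/2,\ \gamma/2)$ and $(\alpha + \gamma/2,\ \beta + \gamma/2)$, it uses the rounded observed proportions and N's of Table XXII, and it uses the event probabilities 1/4, 1/4, and 1/2 to pool the three conditions. All function names are hypothetical.

```python
# Parameter estimates reported in the text for Experiment V.
ALPHA, BETA, GAMMA, DELTA = 0.286, 0.119, 0.385, 0.210

# Rows of Table XXII: (previous outcome, interpolated event, N, observed P(correct next)).
OBSERVED = [
    ("c", "S", 438, 0.808), ("c", "D", 420, 0.607), ("c", "I", 843, 0.698),
    ("e", "S", 223, 0.435), ("e", "D", 199, 0.663), ("e", "I", 437, 0.618),
]


def p_correct_next(prev, event, a=ALPHA, b=BETA, g=GAMMA, d=DELTA):
    """Predicted P(correct on trial n+1 | outcome on trial n, interpolated event)."""
    base = a + g / 2 + (b if prev == "c" else 0.0)  # assumed Matrix (5), correct column
    if event == "I":
        return base + d / 2                          # null-equivalent trace: a guess
    helps = (event == "S") == (prev == "c")          # S helps after c, D helps after e
    return base + (d if helps else 0.0)


def pearson_chi_square(rows):
    """Pearson chi-square over the six two-cell rows of the transition table."""
    total = 0.0
    for prev, event, n, obs in rows:
        pred = p_correct_next(prev, event)
        total += n * (obs - pred) ** 2 / (pred * (1.0 - pred))
    return total


def stationary_p_correct(weights):
    """Asymptotic P(correct) of the pooled two-state chain; weights = P(S), P(D), P(I)."""
    p_cc = sum(w * p_correct_next("c", ev) for ev, w in weights.items())
    p_ec = sum(w * p_correct_next("e", ev) for ev, w in weights.items())
    return p_ec / (p_ec + (1.0 - p_cc))
```

With these inputs, `pearson_chi_square(OBSERVED)` comes out close to the reported 4.61 (a small gap is expected because the table entries are rounded), and `stationary_p_correct({"S": 0.25, "D": 0.25, "I": 0.5})` is about .66, in line with the flat performance level described in the Results.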
D. DISCUSSION

Experiment V demonstrates the susceptibility of the memory store to information intrusions produced by the encoding of the interpolated events as a result of the overt naming response. The effects of these naming responses are exactly those that were suggested during the discussion of the data of the single subject in Section I, B. The naming of a stimulus event tends to enter a trace of that event into the memory, increasing the probability that the prediction of the next response will be based upon the informational properties of the named event in relationship to the rules of the generator.

TABLE XXII
ANALYSIS OF THE INTERPOLATION EFFECT

               Group predicted      Group observed
                 C        E           C        E          N
C and S        .808     .192        .808     .192        438
C and D        .597     .403        .607     .393        420
C and I        .702     .298        .698     .302        843
E and S        .478     .522        .435     .565        223
E and D        .688     .312        .663     .337        199
E and I        .583     .417        .618     .382        437
The fact that $t_i$'s and $t_r$'s are being used by the generator suggests that there is imperfect tagging of the trace with respect to its source or, perhaps equivalently, imperfect discrimination of the tags by the generator. The child uses the traces, at least to a certain extent, interchangeably. On the other hand, the fact that the probability of using a $t_i$ is still only about .20, even though the child must name the interpolated events, may indicate either a resistance to encoding the interpolation events or rejection of them at some later stage of processing. These are questions that require subtle experimental study and this sort of discussion is at most conjecture at this point. An attempt to train children to tag the traces should be one fruitful avenue of attack on the problem and would give some indication of the child’s capacity to sort information in memory.
VI. Extension to Markov Event Sequences

A. THE IMPLICATION OF A ONE-TRIAL MEMORY FOR PREDICTION OF MORE COMPLEX EVENT SEQUENCES

The theory proposed so far states that in predicting an alternation sequence the child uses stored information from just the previous trial. The short-term store is the locus of trial-to-trial effects, while any long-term effects are located in the set of generator rules. One implication of these assumptions is that as long as the generator rules stay fixed, the long-term properties of the event sequence are of no consequence. The subject’s response is as predictable as it can be solely on the basis of what his last response was and which event occurred.

The assumption of a one-trial memory is very strong. We know that 4- and 5-year-old children can predict a double alternation sequence (AABBAABB...). This fact strongly suggests, but, contrary to popular belief (e.g., Restle, 1961), does not require a two-trial memory (a memory with two slots). Memory depth (number of slots) can be replaced by response-trace diversification together with an expanded set of rules. For example, in predicting the double alternation sequence red-red-green-green-..., the child may actually output two different responses for each type of event. If the child responds with the iambs red-RED, green-GREEN, or the trochees RED-red, GREEN-green, differences in stress or volume may result in four response traces rather than just two. Then, for the iamb system, for example, the rules $r \to R$, $R \to g$, $g \to G$, and $G \to r$ would permit double alternation behavior using only one memory trace. It is of interest, then, to see how far the assumption of a one-trial memory can be carried reasonably. Also, even if memory effects from two or more trials back do occur, as they certainly do with adults, in young children these effects may be slight, so that a one-trial memory model may capture the bulk of the truth.
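The iamb rules can be run directly to see that a single stored trace is enough for double alternation. This is a minimal sketch; the dictionary `RULES`, the function name, and the use of upper and lower case to stand for stressed and unstressed responses are illustrative conventions only.

```python
# One-slot memory: the only stored item is the trace of the last response.
# Lowercase = unstressed response, uppercase = stressed response (iamb system).
RULES = {"r": "R", "R": "g", "g": "G", "G": "r"}


def predict_sequence(first_response, n_trials):
    """Generate n_trials responses, each determined by the previous one alone."""
    responses = [first_response]
    while len(responses) < n_trials:
        responses.append(RULES[responses[-1]])
    return responses


def colors(responses):
    """Drop the stress distinction, leaving only the predicted color."""
    return [response.lower() for response in responses]
```

Starting from `"r"`, eight trials give the responses r, R, g, G, r, R, g, G, that is, the colors red, red, green, green, red, red, green, green.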
The fact that in Experiment V the error probability was about .33 and yet the children did not abandon the alternation rule indicates that generator rules are rather resistant to change in the standard prediction task. Prediction errors do not seem to cause the child to abandon a rule very quickly (although this is perhaps not true during the first few trials during which rule selection often takes place). This, in turn, suggests that even if the sequence did not alternate on every trial, but just alternated fairly often, the alternation rule might be held. In this case, prediction of a sequence that alternated imperfectly would still depend only on the response and event of the previous trial. Incorporation into the model of the irregular nature of the event sequence should permit data analysis that would test the basic theoretical ideas in such an extension of the alternation prediction task. The model to be given in
Section VI,C will do this and will be applied in Section VI,D to data that were obtained from children predicting an event sequence which alternated on a random 75% of the trials. This constitutes one extension of the model to the so-called probability learning task (“so-called” because from the present point of view the subjects do not learn probabilities). In the following section, a new, alternative response axiom is introduced to permit application of the basic ideas to prediction of sequences that tend to repeat rather than alternate.
B. THE USE OF A REPETITION RULE
Thus far we have considered models in which Response Axiom R1, the consequences of which are shown in Fig. 3, has played a part because the tasks have all involved prediction of an alternation sequence. According to this axiom, the child is using an alternation rule with event and response traces. We now wish to introduce a different rule, a repetition rule, because we also wish to treat data obtained from children who were predicting an event sequence in which the events alternated only 25% of the time, i.e., an event followed itself 75% of the time. Whereas the alternation rule causes the trace of one response to lead to the occurrence of the other response and the trace of one event to lead to the prediction of the other event, the repetition rule is such that the trace of a response results in the occurrence of that same response on the next trial, i.e., a response perseveration or repetition, and the trace of an event results in the prediction that the same event will occur again. Thus, when a repetition rule is in use Response Axiom R1 (Section II) would be replaced by the following.
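RESPONSE AXIOM R2. For every $C_{n-1}$ in $X$, every $i$ ($i = 1, 2$), every $f$ ($f = 0, 1, 2$), every $g$ ($g = 1, 2$ if $f \neq 0$; $g = 1$ if $f = 0$), and every $n$ ($n = 1, 2, \ldots$),

$$P(A_{i,n} \mid C_{n-1} \cap t_{g,n}^{(f)}) = P(A_{i,n} \mid t_{g,n}^{(f)}) =
\begin{cases}
1 & \text{if } f = 1 \text{ or } 2 \text{ and } i = g \\
1/2 & \text{if } f = 0 \\
0 & \text{otherwise.}
\end{cases}$$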
C. TWO MODELS FOR PREDICTION OF MARKOV EVENT SEQUENCES

Two Markov event sequences (Bush & Mosteller, 1955, p. 124) are of interest here, and a model for prediction of each will be derived. For later notational convenience we shall let the symbol $\pi$ denote a probability that is always greater than 1/2. To define the two types of sequences we again use the notion of $C_n$, an $n$-cylinder set of the sample space $X$ defined in terms of at most the first $n$ trial outcomes. (The sample space $X$ for the two models to be considered here can be defined as was that defined in Section B.) Then, a tending-to-alternate (TA) event sequence is one such that for any $C_n$,

$$P(E_{k,n+1} \mid C_n \cap E_{h,n}) =
\begin{cases}
\pi & \text{if } k \neq h \\
1 - \pi & \text{if } k = h,
\end{cases}$$

and a tending-to-repeat (TR) event sequence is one such that for any $C_n$,

$$P(E_{k,n+1} \mid C_n \cap E_{h,n}) =
\begin{cases}
\pi & \text{if } k = h \\
1 - \pi & \text{if } k \neq h.
\end{cases}$$

Thus, for example, a TA event sequence with $\pi = .75$ would have a transition probability matrix

$$\begin{array}{c|cc}
 & E_{1,n+1} & E_{2,n+1} \\ \hline
E_{1,n} & .25 & .75 \\
E_{2,n} & .75 & .25
\end{array}$$

while the TR matrix for $\pi = .75$ would be

$$\begin{array}{c|cc}
 & E_{1,n+1} & E_{2,n+1} \\ \hline
E_{1,n} & .75 & .25 \\
E_{2,n} & .25 & .75
\end{array}$$
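A short generator makes the two kinds of event sequence concrete. This is a sketch for illustration; the function names, the coding of events as 1 and 2, and the choice of an equiprobable first event are assumptions made here.

```python
import random


def markov_events(kind, pi, n_trials, rng):
    """Generate a TA or TR binary event sequence.

    kind: "TA" (the event changes with probability pi) or
          "TR" (the event repeats with probability pi).
    """
    if kind not in ("TA", "TR"):
        raise ValueError("kind must be 'TA' or 'TR'")
    p_switch = pi if kind == "TA" else 1.0 - pi
    events = [rng.choice((1, 2))]  # assumed: first event equiprobable
    for _ in range(n_trials - 1):
        previous = events[-1]
        events.append(3 - previous if rng.random() < p_switch else previous)
    return events


def alternation_rate(events):
    """Proportion of trials on which the event differs from the preceding one."""
    changes = sum(a != b for a, b in zip(events, events[1:]))
    return changes / (len(events) - 1)
```

With $\pi = .75$, a long TA sequence alternates on roughly 75% of the trials and a long TR sequence on roughly 25%, while each event occurs about half the time in both.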
Using Axiom R1 with TA event sequences and Axiom R2 with TR sequences, we obtain the two tree diagrams shown in Figs. 12 and 13, as summaries corresponding to the role of Fig. 3 which summarized the first model for prediction of the single alternation sequence. By arguments similar to those in Section II it can be shown that the sequences of correct responses and errors for both the TA and the TR models are samples from a two-state first-order Markov chain with stationary transition probabilities. Let $\phi(x) = \pi x + (1 - \pi)(1 - x)$. Then for both the TA model and the TR model, the transition matrix for correct responses and errors is

$$\begin{array}{c|cc}
 & c_{n+1} & e_{n+1} \\ \hline
c_n & \phi(1 - \gamma/2) & \phi(\gamma/2) \\
e_n & \phi(\alpha + \gamma/2) & \phi(\beta + \gamma/2)
\end{array}$$

Thus, for example, $P(c_{n+1} \mid c_n) = \pi(1 - \gamma/2) + (1 - \pi)(\gamma/2)$. To estimate $\alpha$, $\beta$, and $\gamma$, we first note that $\hat\phi(\alpha + \gamma/2) = n_{ec}/n_e$, $\hat\phi(\beta + \gamma/2) = n_{ee}/n_e$, and $\hat\phi(\gamma/2) = n_{ce}/n_c$ are maximum likelihood estimates. It is then easily shown that

$$\hat\alpha = \frac{\hat\phi(\alpha + \gamma/2) - \hat\phi(\gamma/2)}{2\pi - 1} \tag{42}$$

$$\hat\beta = \frac{\hat\phi(\beta + \gamma/2) - \hat\phi(\gamma/2)}{2\pi - 1} \tag{43}$$

and $\hat\gamma = 1 - \hat\alpha - \hat\beta$ are the maximum likelihood estimates of $\alpha$, $\beta$, and $\gamma$. If either Eq. (42) or (43) yields an estimate that falls outside the permissible values for probabilities, modified estimates must be used. These modified estimates are given in Table XXIII for the various possible violations that may arise.
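The estimation step amounts to inverting $\phi$, which is linear in $x$. The sketch below is an illustration under that reading; the function names are hypothetical, and the estimates are returned unclipped, so the modified estimates of Table XXIII are not implemented. As an example input it uses the TA group transition matrix quoted in the Discussion of Experiment VI ($n_{ce}/n_c = .391$, $n_{ec}/n_e = .541$).

```python
def phi(x, pi):
    """phi(x) = pi * x + (1 - pi) * (1 - x)."""
    return pi * x + (1.0 - pi) * (1.0 - x)


def phi_inverse(y, pi):
    """Solve phi(x) = y for x; requires pi != 1/2."""
    return (y - (1.0 - pi)) / (2.0 * pi - 1.0)


def estimate_parameters(p_ce, p_ec, pi):
    """Unclipped estimates (alpha, beta, gamma) from two transition proportions.

    p_ce = n_ce / n_c (error after correct), p_ec = n_ec / n_e (correct after error).
    """
    half_gamma = phi_inverse(p_ce, pi)               # estimate of gamma / 2
    alpha = phi_inverse(p_ec, pi) - half_gamma       # (alpha + gamma/2) - gamma/2
    beta = phi_inverse(1.0 - p_ec, pi) - half_gamma  # (beta + gamma/2) - gamma/2
    return alpha, beta, 2.0 * half_gamma
```

For the TA proportions with $\pi = .75$ this sketch gives roughly $\alpha = .30$, $\beta = .14$, and $\gamma = .56$, which sum to 1. These values are computed here for illustration and are not estimates reported in the chapter.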
FIG. 12. A tree diagram summarizing the model for prediction of a tending-to-alternate sequence, using Axiom R1. (Columns of the diagram: trial n response-event pair; trial n encoding response; trial n + 1 memory trace; trial n + 1 response; trial n + 1 event.)
Given the estimates of the parameters and the fact that the sequence of correct responses and errors is theoretically a sample from a two-state Markov chain, the various statistical analyses of the data performed for the alternation prediction data can also be applied to the data from
prediction of TA and TR sequences. The formulas in Section II are no longer appropriate, however, but the more general statement of the various theorems presented by Bogartz (1966b) can be used.
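The process summarized in the tree diagrams can also be simulated, which gives a rough check that the TA and TR models lead to the same chain of correct responses and errors. The sketch is illustrative only: the function names are hypothetical, the first prediction is assumed to be a guess, and the parameter values $\alpha = .30$, $\beta = .136$ (so $\gamma = .564$) are chosen for the example rather than taken from the chapter.

```python
import random


def simulate_subject(kind, alpha, beta, pi, n_trials, rng):
    """Simulate one subject; returns a list of booleans (True = correct).

    kind "TA": TA events with the alternation rule (Axiom R1).
    kind "TR": TR events with the repetition rule (Axiom R2).
    """
    p_switch = pi if kind == "TA" else 1.0 - pi  # probability that the event changes
    event = rng.choice((1, 2))
    response = rng.choice((1, 2))                # assumed: first prediction is a guess
    outcomes = [response == event]
    for _ in range(n_trials - 1):
        u = rng.random()
        if u < alpha:
            trace = event                        # event trace transferred
        elif u < alpha + beta:
            trace = response                     # response trace transferred
        else:
            trace = None                         # null trace: the generator guesses
        if trace is None:
            response = rng.choice((1, 2))
        elif kind == "TA":
            response = 3 - trace                 # alternation rule
        else:
            response = trace                     # repetition rule
        event = 3 - event if rng.random() < p_switch else event
        outcomes.append(response == event)
    return outcomes


def proportion_correct(kind, alpha, beta, pi, n_subjects, n_trials, seed=0):
    """Pooled proportion of correct predictions over simulated subjects."""
    rng = random.Random(seed)
    correct = total = 0
    for _ in range(n_subjects):
        run = simulate_subject(kind, alpha, beta, pi, n_trials, rng)
        correct += sum(run)
        total += len(run)
    return correct / total
```

With these illustrative values and $\pi = .75$, both `proportion_correct("TA", 0.30, 0.136, 0.75, 400, 200)` and the corresponding TR call settle near .58.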
D. EXPERIMENT VI: PREDICTION OF TENDING-TO-ALTERNATE AND TENDING-TO-REPEAT SEQUENCES

The data to be treated here are from portions of a larger experiment reported in greater detail elsewhere (Bogartz, 1966a, Exp. II), but the models analysis has not been given previously.
FIG. 13. A tree diagram summarizing the model for prediction of a tending-to-repeat sequence, using Axiom R2. (Columns of the diagram: trial n response-event pair; trial n encoding response; trial n + 1 memory trace; trial n + 1 response; trial n + 1 event.)
TABLE XXIII
MODIFIED MAXIMUM LIKELIHOOD ESTIMATES OF $\alpha$, $\beta$, AND $\gamma$ FOR VARIOUS VIOLATIONS WHICH CAN ARISE USING Eqs. (42), (44), and (45)

Violation                                                              Modified estimates
I     …                                                                …
II    …                                                                …
III   $\hat\alpha \geq 0$, $\hat\beta \geq 0$, $\hat\alpha + \hat\beta > 1$    …
IV    $\hat\alpha < 0$, …                                              Try the estimates in rows I and II of this table. Choose max (I, II), the set maximizing the function $\sum n_{ij} \ln Q_{ij}$ ($i, j = c, e$), where $Q_{ij}$ is the entry in row $i$, column $j$ of the transition matrix after the modified estimates are inserted.
V     …, $\hat\beta > 1$, $\hat\alpha + \hat\beta > 1$                 Use max (I, III), the set from row I or row III of this table giving the larger log-likelihood function described in row IV.
VI    $\hat\alpha > 1$, …                                              Use max (II, III), the set from row II or row III of this table giving the larger log-likelihood function described in row IV.

If this quantity exceeds 1, $\beta = 1$; if it is less than zero, $\beta = 0$.
If this quantity exceeds 1, $\alpha = 1$; if it is less than zero, $\alpha = 0$.

1. Method

The subjects were 22 4- and 5-year-olds from the University Preschool Laboratories at the University of Iowa. The apparatus was essentially the same type of marble-dispensing unit used in Experiment I, and the children were instructed in the same fashion as in Experiment I that they were to predict the color of the next marble each time they heard a buzzer. The first 75 trials are of concern to us here, and in those trials, half of the subjects predicted a TA sequence with $\pi = .75$ and the other half predicted a TR sequence with $\pi = .75$. (Within each group a different sequence was used for each child.) Only black and white marbles were used during these trials, and the pacing of the trials was such that the buzzer sounded for .3 second every 8 seconds.

2. Results
Table XXIV shows the observed and predicted mean performance curves for the two groups. The variability of the observed about the expected values is somewhat larger than that observed in most of the other experiments. Using the results of the stat-child investigation for Experiment II (which had a comparable number of subjects and the same number of trials per block) as a gauge, the discrepancies appear within the range of expectation. An additional factor enters here which also tends to contribute to badness of fit. This is that the model is not only a model of the subjects but also a model of the randomly generated event sequences. Thus, the variance of observed values from predicted values is produced by the variability normal to the behavior of the event sequence as well as the variability normal to the subject behavior. This was not the case in the previous experiments in that the event sequences there were completely determined rather than probabilistic.

TABLE XXIV
OBSERVED AND PREDICTED MEAN PERFORMANCE CURVES IN BLOCKS OF 15 TRIALS FOR THE TA AND TR SEQUENCES

                         Block
Group            1       2       3       4       5
TA   Pred.     .565    .581    .581    .581    .581
     Pred.     .573    .594    .594    .594    .594
     Obs.      .521    .564    .600    .667    .539
TR   Pred.     .602    .586    .586    .586    .586
     Pred.     .612    .594    .594    .594    .594
     Obs.      .533    .642    .545    .679    .545

The remaining statistical analyses are presented in Table XXV. The results indicate that the extensions to models for prediction of the
two types of Markov event sequences have captured the essential features of the data.

TABLE XXV
OBSERVED AND PREDICTED VALUES FOR VARIOUS STATISTICS FROM THE TA AND TR GROUPS: EXPERIMENT VI

                                  TA                              TR
Statistic               Obs.     Pred.    Pred.         Obs.     Pred.    Pred.
3-Tuples
  ccc                   .209     .240     .217          .243     .251     .224
  cce                   .151     .136     .139          .127     .134     .135
  cec                   .126     .120     .124          .132     .116     .115
  cee                   .097     .096     .105          .091     .096     .101
  ecc                   .148     .138     .137          .130     .132     .141
  ece                   .080     .081     .088          …        .077     .085
  eec                   .097     .097     .103          …        .095     …
  eee                   .092     .094     .087          …        .098     …
Mean number of runs
of errors             17.545   16.592   17.065        16.636   15.723   17.095
Runs of errors of lengths 1-5
  r1                  10.182    9.284    9.344         8.818    8.948    9.188
  r2                   4.182    3.886    4.228         3.273    3.868    4.251
  r3                   1.818    1.724    1.913         1.727    1.763    1.966
  r4                    .636     .806     .865         1.000     .850     .909
  r5                    .182     .396     .391          .364     .434     .420
C(K), K = 1-5
  C(1)                26.364   27.753   26.354        27.273   28.956   26.582
  C(2)                25.455   26.461   24.952        27.364   27.840   24.904
  C(3)                25.273   25.792   24.472        27.000   27.180   24.318
  C(4)                24.727   25.318   24.115        26.273   26.663   23.931
  C(5)                24.445   24.913   23.775        25.909   26.205   23.580
E. DISCUSSION

For both types of sequences, the children are correct about 60% of the time. This is below the maximum possible value of 75%, but is also well above the chance rate of 50%. There is no way for them to use overall event relative frequency to attain better-than-chance performance since the two events are equiprobable. In the past, when subjects have attained better-than-chance accuracy with Markov sequences such as these, it has been inferred that they must be sensitive to the conditional probabilities characterizing the sequence. It is a short step, then, to the inference that some sort of conditional probability analysis must have been occurring. In the present context, such reasoning argues that the 4- and 5-year-old children in this experiment perform a conditional probability analysis on the events, learn these, and use them to make their predictions. Such an inference is certainly not necessary, and also seems to be not even sufficient to deal with the facts. It is not necessary simply because the model proposed above accounts for the data but does not postulate such a process. It is not sufficient in that the mean performance curves are at asymptote in the second block of 15 trials, and it is hard to imagine a contingency analyzer that would zero in so rapidly, i.e., on such a small sample.

The fact that “probability learning” can go on without the necessity of any probabilities being learned is important. It suggests that in other probability learning paradigms such as those with noncontingent events there may well be little or no learning of probabilities. The broader implication of this is that to regard the probability learning situation as a paradigm for studying ways in which children learn the everyday contingencies of life may well be a mistake. Or, on the other hand, perhaps it is a proper paradigm for such “learning,” but children in fact may deal with the contingent relationships in everyday experience as they do in the binary prediction task, namely, by fairly quickly deciding upon the appropriate one of a limited number of rules that might apply, and then sticking to the rule. The focus of interest in research into such matters then becomes the process of rule learning and the process of rule selection.

It is not clear how much weight to give to the correspondence between the two sets of data. The observed values of the various statistics for the two types of sequences are so close that one could do a reasonable job of predicting the behavior under one sequence (say TA) by using the observed values under the other (TR). The group parameter estimates were almost identical for the two groups because of the fact that the
two basic group transition matrices were

$$\begin{array}{c|cc}
 & c & e \\ \hline
c & .609 & .391 \\
e & .541 & .459
\end{array}$$

for TA, and … for TR.
The correspondence may be only coincidentally the result of the particular composition of the two groups of subjects. A less conservative position would argue that the basic theory together with the two models in fact predicts that the two sets of data should have the same group transition matrices and that the similarity of the two sets of data is added confirmation of the theory. This is true, but such a prediction would only be expected to be confirmed with large numbers of subjects in each group. Also, such a prediction would have to be qualified in accordance with the following technical considerations.

The model for TA prediction assumes Axiom R1 to hold, whereas the model for TR prediction assumes Axiom R2 holds. The models, as stated, also maintain that these axioms are in effect beginning with the first trial. It is obvious that this is impossible without pretraining of some sort. A subject assigned at random to one of the two types of sequences, TA or TR, cannot know in advance which rule to use. Observation of many subjects in this age range has led us to believe that in prediction of binary event sequences they almost never use a rule other than the one-trial repetition rule in Axiom R2 (or certain minor variations of it which cannot be discussed here) or the one-trial alternation rule in Axiom R1 (or its minor variations). They come into the experimental situation with such a rule, test it during the first few trials, perhaps shift to the other class of rules (from repetition to alternation or vice versa) and then tend to be relatively inflexible about changing rules. The models do not describe the rule decision period and, in fact, completely ignore it. The prediction that for large numbers of subjects the group behavior should be the same in both TA and TR sequence prediction would suppose that there are no important differences during the rule decision period.

But if rule decision occurs so rapidly, and is based on a cursory sampling of the dependencies, for example, perhaps the first three or four trials, one may wonder why there isn’t adoption of the wrong rule. In fact, there appears to be much adoption of the wrong rule when one
surveys the available evidence. There is much evidence indicating that preschoolers will rather inflexibly persist with an inappropriate alternation rule when predicting noncontingent event sequences (Bogartz, 1965; Craig & Myers, 1963). Also, Bogartz (1966a) showed that after 75 trials of predicting a TR sequence, subjects shifted to a single alternation sequence had considerable difficulty in recognizing the shift in event contingencies (also see Exp. I of that study).

We know almost nothing about the processes involved in the child’s decision as to which rule to use. We know almost nothing about the conditions under which greater flexibility of rule usage might develop, such as a learning-to-learn task in which the event contingencies change drastically from block to block or stimulus set to stimulus set. Thanks to years of interest in another problem, we know that preschool children usually can induce and use the double alternation rule, but we know very little about the level of rule complexity that the child can use. Bogartz (1966a, Exp. I) showed that induction or use or both of either an ABBABB... or ABBBABBB... rule was very difficult. This is perhaps a bit surprising in that from at least one point of view, an ABBABB... rule should be easier to induce and use than should a double alternation rule. It is an open question whether 4- and 5-year-olds, given either of these two rules, can effectively use them. The obvious experiment needs to be performed. The number of elements in a string such as ABBB that the child can hold in memory may be critical to the processes of rule induction, in which case the literature on digit-span performance may be informative. Also, rule induction studies should take into account whether or not the child is required to respond during the rule induction phase. Requiring him to respond will tend to produce response-trace interference; permitting him not to respond will tend to produce an overestimation of the time taken for rule induction.
VII. Noncontingent Event Sequences

A. NONCONTINGENT EVENT SEQUENCES: $\pi > .5$

The most common “probability learning” task is that involving noncontingent binary events. In this case, the sequence of events $E_1$ and $E_2$ is a sequence of independent trials in which $E_1$ occurs with a probability $\pi \geq .5$ and $E_2$ occurs with probability $1 - \pi$. (There are other variations involving blank trials in which neither event occurs and double trials in which both occur, but these will not concern us.) The transition matrix for the event sequence is then

$$\begin{array}{c|cc}
 & E_{1,n+1} & E_{2,n+1} \\ \hline
C_n \cap E_{1,n} & \pi & 1 - \pi \\
C_n \cap E_{2,n} & \pi & 1 - \pi
\end{array}$$
for every $C_n$. The case in which $\pi = .5$ requires a separate model so the first two models to be considered will apply only to the case $\pi > .5$.

B. INDIVIDUAL DIFFERENCES IN RULE SELECTION

In Section VI it was suggested that at least two rules, a repetition rule and an alternation rule, tend to be adopted by the children we have studied. As it happened, all of the children predicting the TR sequence in Experiment VI could be assumed to have been using a repetition rule and all of the children predicting the TA sequence could be assumed to have been using the alternation rule. This sort of uniformity of rule adoption is by no means typical. Bogartz (1965) found some children alternating even with an essentially noncontingent sequence with $\pi = .8$ (in which case the events tend to repeat on about 68% of the trials). Craig and Myers (1963) found, as we would expect, that kindergarten children alternated more to a sequence with $\pi = .6$ than to one with $\pi = .8$, the former having more event alternations than the latter. Also, of course, children can be expected to have initial preferences for rules they may use almost regardless of the probabilistic structure of the event sequence.

The major consequence of such individual differences in rule adoption is that tests of the theory will have to incorporate the different rule possibilities. It would be a mistake, for example, to apply a repetition rule model to data from a group of subjects some of whom were using an alternation rule. In the absence of any a priori reason for believing a given subject was using a particular rule (such information could come from analysis of pretraining data, instructions given to the child concerning the use of rules, and so on), the logic of deciding which rule and, therefore, which model applies, seems to be that of parameter estimation, although of a somewhat unusual sort. Further comment on this will now be postponed until the two models of interest are presented.
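The repetition rates quoted for noncontingent sequences follow from the independence of successive events. As a worked check (the notation $E_n$ for the event on trial $n$ is used here only for this illustration), an event repeats when the same event occurs on two successive independent trials:

```latex
\begin{aligned}
P(\text{event on trial } n+1 \text{ repeats event on trial } n) &= \pi^2 + (1 - \pi)^2, \\
\pi = .8 &: \quad .64 + .04 = .68, \\
\pi = .6 &: \quad .36 + .16 = .52 .
\end{aligned}
```

So a sequence with $\pi = .8$ repeats on about 68% of the trials and alternates on about 32%, while one with $\pi = .6$ alternates on about 48%, which is why more alternation is to be expected with $\pi = .6$.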
C. TWO MODELS FOR PREDICTION OF NONCONTINGENT EVENT SEQUENCES WITH π > .5
To derive the two models of concern here, we continue to apply the same basic ideas as before concerning the processes governing the child’s predictions but incorporate the new set of event probabilities. This immediately gives the tree diagram shown in Fig. 14, assuming a repetition rule to be in effect. Analysis of this stochastic structure reveals that the sequence of correct responses and errors is not a first-order Markov chain, as it has been in the models considered above, but that the sequence of A_1 and A_2 responses is such a Markov chain, with the stationary transition probability matrix referred to below as Matrix (45).
Short-Term Memory in Binary Prediction by Children
It is of interest here to note that the use of a repetition rule with α = 1.0 (use of only the event trace) is equivalent to the well-known win-stay, lose-switch strategy. The present formulation is more general in that it allows for interference by responses and distractions. A formulation of the present approach from the point of view of stimulus-sampling theory would also be possible, as the section on hypothesis models in Suppes and Atkinson (1960) suggests; however, their argument that such formal identity reduces the choice between languages to describe the theory to a matter of taste (p. 37) tends to blur the differences in suggestive properties of the different languages. For example, the present approach and the corresponding stimulus-sampling approach would have very different suggestions concerning the latency distributions in binary prediction experiments.
FIG. 14. A tree diagram summarizing the model for prediction of a noncontingent binary event sequence with Pr(E_1) = π, using a repetition rule. (Column headings of the diagram: trial n response-event pair; trial n encoding response; trial n + 1 memory trace; trial n + 1 response; trial n + 1 event.)
FIG. 15. A tree diagram summarizing the model for prediction of a noncontingent binary event sequence with Pr(E_1) = π, using an alternation rule. (Column headings as in Fig. 14.)
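Figure 15 shows the tree diagram when the alternation rule is assumed to be in effect. Analysis of this process reveals that the sequence of A_1 and A_2 responses is a first-order Markov chain with stationary transition probability matrix
\[
\begin{array}{c|cc}
 & A_{1,n+1} & A_{2,n+1} \\ \hline
A_{1,n} & \pi\,\gamma/2 + (1-\pi)(\alpha + \gamma/2) & \pi(1 - \gamma/2) + (1-\pi)(\beta + \gamma/2) \\
A_{2,n} & \pi(\beta + \gamma/2) + (1-\pi)(1 - \gamma/2) & \pi(\alpha + \gamma/2) + (1-\pi)\,\gamma/2.
\end{array}
\qquad (46)
\]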
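The two response-to-response matrices can be written out in code as a check on their structure. What follows is a sketch under stated assumptions rather than a transcription of Matrices (45) and (46): it assumes that the memory holds the event trace with probability α, the response trace with probability β, and a null trace with probability γ = 1 − α − β, and that a null trace leads to an unbiased guess. The function name and argument names are illustrative.

```python
def transition_matrix(alpha, beta, pi, rule):
    """Return [[P(A1|A1), P(A2|A1)], [P(A1|A2), P(A2|A2)]] for one rule.

    Assumptions (not quoted from the text): event trace with probability
    alpha, response trace with probability beta, null trace with probability
    gamma = 1 - alpha - beta, and guessing at 1/2 on a null trace.
    """
    gamma = 1.0 - alpha - beta
    if min(alpha, beta, gamma) < -1e-12:
        raise ValueError("alpha, beta and gamma must be non-negative")
    # Repetition rule: the prediction copies whatever the trace holds.
    stay_after_a1 = pi * (1 - gamma / 2) + (1 - pi) * (beta + gamma / 2)
    a1_after_a2 = pi * (alpha + gamma / 2) + (1 - pi) * (gamma / 2)
    repetition = [[stay_after_a1, 1 - stay_after_a1],
                  [a1_after_a2, 1 - a1_after_a2]]
    if rule == "repetition":
        return repetition
    if rule == "alternation":
        # Alternation rule: the prediction is the opposite of the trace,
        # so the two columns of each row trade places.
        return [[row[1], row[0]] for row in repetition]
    raise ValueError("rule must be 'repetition' or 'alternation'")


rep = transition_matrix(alpha=0.5, beta=0.3, pi=0.75, rule="repetition")
alt = transition_matrix(alpha=0.5, beta=0.3, pi=0.75, rule="alternation")
print(rep)
print(alt)
```

Under these assumptions each row sums to one, and at π = .5 the probability of repeating one's own response reduces to (1 + β)/2 under the repetition rule and (1 − β)/2 under the alternation rule. With α = 1 the repetition rule predicts the preceding event on every trial, which is the win-stay, lose-switch case.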
Since the data will have to be consulted in the absence of special information regarding which rule is in effect, the choice between Matrix (45) and Matrix (46) is essentially the estimation of a parameter governing the behavior of the generator, i.e., which rule the generator is using. To make this explicit, we can introduce the parameter θ, which takes the value 1 if the repetition rule is in effect and the value 0 if the alternation rule is in effect. The joint model may then be thought of as having the stationary transition matrix
\[ \left[\, \theta\, p_{ij,1} + (1 - \theta)\, p_{ij,2} \,\right], \]
where p_ij,1 is the element in the ith row and jth column of Matrix (45) and p_ij,2 is the element in the ith row, jth column of Matrix (46). Parametrization of what can be thought of as the state of the generator has a number of salutary effects. For one, the notion immediately occurs that θ may not be constant from trial to trial but in fact may vary. The interpretation is of course that the subject changes rules during the course of the session. This, in turn, indicates the direction to follow in the treatment of the transfer problem. The parameter θ may be associated with the states of some learning process governing the behavior of the generator. Perhaps an all-or-none rule-learning model would apply. A two-state model would have the child using either the alternation rule or the repetition rule, with θ = 0 or 1. A three-state model would perhaps take θ = 0, 1/2, or 1, where θ = 1/2 would be associated with an intermediate state of uncertainty wherein the child would not be sure which rule to use and would choose the rule at random, equiprobably. This will not be pursued further here.
The analysis of the data can proceed as with the previous models provided that instead of coding correct responses and errors as ones and zeroes, the A_1’s and A_2’s are so coded. The use of θ as a parameter of the generator is a recent development in the present approach (it first occurred during the writing of this paper) and our approach to data has not yet included it. Instead, decision between the two ways of characterizing the subject, repetition rule or alternation rule, has rested upon the use of a pseudo-χ² statistic involving 3-tuple frequencies and a comparative inspection of the fits to other statistics. The data reported here are based on the latter approach. Introduction of the parameter θ, however, provides an unambiguous, maximum likelihood method for deciding between the two rules. The entire estimation procedure then includes two stages. In the first stage, θ is set equal to 1 and the maximum likelihood estimates of the p_ij,1’s are obtained; then θ is set equal to 0 and the maximum likelihood estimates of the p_ij,2’s are obtained. In the second stage the likelihood function L_k is computed for k = 1, 2. The value of k giving the larger L_k determines which set of estimated p_ij,k’s should be used and therefore which value of θ is appropriate.
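The two-stage procedure can be sketched in code. The sketch is an illustration under stated assumptions, not the estimation method of the original: it assumes γ = 1 − α − β with unbiased guessing on a null trace, conditions on the first response, and replaces the closed-form estimates and their violation tables with a brute-force grid search over the admissible region α ≥ 0, β ≥ 0, α + β ≤ 1. All function names are hypothetical.

```python
import math


def transition_matrix(alpha, beta, pi, rule):
    """Assumed response-to-response transition matrix for one rule."""
    gamma = max(0.0, 1.0 - alpha - beta)            # null-trace probability
    stay = pi * (1 - gamma / 2) + (1 - pi) * (beta + gamma / 2)
    to_a1 = pi * (alpha + gamma / 2) + (1 - pi) * (gamma / 2)
    rows = [[stay, 1 - stay], [to_a1, 1 - to_a1]]   # repetition rule
    if rule == "alternation":                        # opposite prediction
        rows = [[r[1], r[0]] for r in rows]
    return rows


def transition_counts(responses):
    """n[i][j]: number of times response i+1 is followed by response j+1."""
    n = [[0, 0], [0, 0]]
    for prev, nxt in zip(responses, responses[1:]):
        n[prev - 1][nxt - 1] += 1
    return n


def log_likelihood(n, p):
    """Sum of n_ij * ln p_ij, conditioning on the first response."""
    total = 0.0
    for i in range(2):
        for j in range(2):
            if n[i][j] == 0:
                continue
            prob = min(max(p[i][j], 0.0), 1.0)      # guard rounding error
            if prob == 0.0:
                return -math.inf
            total += n[i][j] * math.log(prob)
    return total


def fit_rule(n, pi, rule, steps=100):
    """Stage 1: grid search over alpha >= 0, beta >= 0, alpha + beta <= 1."""
    best = (-math.inf, None, None)
    for a in range(steps + 1):
        for b in range(steps + 1 - a):
            alpha, beta = a / steps, b / steps
            ll = log_likelihood(n, transition_matrix(alpha, beta, pi, rule))
            if ll > best[0]:
                best = (ll, alpha, beta)
    return best


def choose_rule(responses, pi):
    """Stage 2: keep the rule (theta = 1 or 0) with the larger likelihood."""
    n = transition_counts(responses)
    fits = {1: fit_rule(n, pi, "repetition"),
            0: fit_rule(n, pi, "alternation")}
    theta = max(fits, key=lambda t: fits[t][0])
    return theta, fits[theta]


# Responses are coded 1 (A1) and 2 (A2); this protocol alternates strictly.
print(choose_rule([1, 2] * 30, pi=0.75)[0])
```

Under these assumptions each rule has two free parameters and a two-state chain has two free transition probabilities, so the unconstrained fits of the two rules coincide when π differs from .5; the comparison is decided by the admissibility constraints, which is why the search is confined to the admissible region.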
TABLE XXVI
MODIFIED MAXIMUM LIKELIHOOD ESTIMATES OF α, β, AND γ FOR VARIOUS VIOLATIONS WHICH CAN ARISE USING EQ. (49)
[Table body not recoverable. Columns: Violation; Modified estimates. Rows I to V list violations of the constraints on the estimates (among them an estimate below 0, an estimate above 1, and a sum of the two estimates above 1). Legible entries: "Try rows I and II. Choose maximum (I, II), the set maximizing Σ n_ij ln p_ij,1, where p_ij,1 is the entry in the ith row, jth column of Matrix (29) with the modified estimates inserted"; "Use max (II, III)"; and a rule that sets one estimate to 1 and the other to 0 if a computed value exceeds 1, and to 0 and 1, respectively, if it is less than zero.]
To obtain the maximum likelihood estimates of the p_ij,1’s, the maximum likelihood estimates of α, β, and γ are obtained using Matrix (45), and then substituted back into that matrix. Thus, using Matrix (45), the maximum likelihood estimates under the repetition rule are those given by Eq. (49).
Table XXVI gives the modified maximum likelihood estimates under the repetition rule for the various possible violations of the theoretical constraints that can occur when Eq. (49) is used.
TABLE XXVII
MODIFIED MAXIMUM LIKELIHOOD ESTIMATES OF α, β, AND γ FOR VARIOUS VIOLATIONS WHICH CAN ARISE USING EQ. (50)
[Table body not recoverable. Columns: Violation; Modified estimates. Rows I to V list violations of the constraints on the estimates (among them an estimate below 0, an estimate above 1, and a sum of the two estimates above 1). Legible entries: "Use max (I, II), the set maximizing Σ n_ij ln p_ij,2, where the p_ij,2’s are in Matrix (46)"; "Use max (II, III)"; and a rule that sets one estimate to 1 and the other to 0 if a computed value exceeds 1, and to 0 and 1, respectively, if it is less than zero.]
Under the alternation rule, i.e., θ = 0, Matrix (46) is used and the maximum likelihood estimates are
\[ \hat{\gamma} = 1 - \hat{\alpha} - \hat{\beta}. \]
Table XXVII gives the modified estimates under the alternation rule when Eq. (50) leads to violations.
D. THE OFFENBACH EXPERIMENT
Offenbach (1964) reported a study of probability learning in which the effects of reward and punishment were studied at two age levels, kindergarten and fourth grade, and subsequently made the raw data available. We shall be interested here only in the data for the 30 kindergarten children, whose ages ranged from 4 years and 5 months to 6 years and 5 months. The details of method are given in the cited report and it will suffice here to indicate that on each of 100 trials the child verbally predicted which of two color patches, red or blue, would occur on the next card. The color sequence was essentially a noncontingent 75:25 sequence, although runs of events of length greater than six were not allowed. Each subject received the same sequence. Of the 30 subjects, 24 were fitted satisfactorily by one or both of the two models above. The data for four of the subjects could be fit by either model fairly well and they have been included in both the repetition group and the alternation group. Thus, each group had 14 children in it, but only 24 children were fit. Table XXVIII shows the various observed and predicted sequential statistics for the subjects assumed to be using the repetition rule and for those assumed to be using the alternation rule. The fit appears reasonably good considering the fact that any peculiarities of the single event sequence used could affect each subject. The only obvious systematic deviation of the models from the data seems to be in the under-prediction of the C_k statistics for subjects using the alternation rule. It is not clear why this occurs. One conjecture that bears further examination is that the rule parameter θ is not constant, but varies at times during the session. An occasional shift to a repetition rule would perhaps produce this sort of departure. Further support for this conjecture occurs when the six subjects that were not fit are considered.
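The sequential statistics compared in Table XXVIII can be tabulated from a single response protocol along the following lines. This is a sketch under assumptions: responses are coded 1 (A_1) and 2 (A_2), 3-tuples are counted over overlapping windows, and the function names are illustrative; the exact counting conventions of the original analysis are not stated in this section.

```python
from collections import Counter
from itertools import groupby


def triple_proportions(responses):
    """Relative frequency of each overlapping 3-tuple of responses."""
    triples = Counter(zip(responses, responses[1:], responses[2:]))
    total = sum(triples.values())
    return {t: count / total for t, count in triples.items()}


def run_length_counts(responses, target=2):
    """Number of runs of `target` responses, tabulated by run length."""
    return Counter(len(list(group))
                   for value, group in groupby(responses)
                   if value == target)


def joint_occurrences(responses, k, target=1):
    """Number of trial pairs k apart on which `target` occurred both times."""
    return sum(1 for a, b in zip(responses, responses[k:])
               if a == target and b == target)


protocol = [1, 1, 2, 1, 2, 2, 1, 1]
print(triple_proportions(protocol))
print(run_length_counts(protocol))     # runs of A2 by length
print(joint_occurrences(protocol, 1))  # A1 on adjacent trials
```

Averaging such counts over the subjects in a group gives observed values of the kind shown in the table; the predicted values require the fitted model and are not computed here.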
Two subjects made a very large number of A, responses, one making 91 and the other making 94. This could have resulted from an initial color preference. In any event, they appear quite different from the
TABLE XXVIII
OBSERVED AND PREDICTED VALUES OF VARIOUS SEQUENTIAL STATISTICS FOR THE OFFENBACH EXPERIMENT

                        Repetition rule subjects      Alternation rule subjects
Statistic               Obs.     Pred.Av  Pred.G      Obs.     Pred.Av  Pred.G
3-Tuples
  A_1 A_1 A_1           .267     .295     .299        .057     .041     .037
  A_1 A_1 A_2           .157     .141     .148        .106     .089     .099
  A_1 A_2 A_1           .176     .140     .148        .282     .268     .265
  A_1 A_2 A_2           .067     .078     .073        .082     .097     .099
  A_2 A_1 A_1           .160     .142     .149        .106     .089     .099
  A_2 A_1 A_2           .087     .074     .077        .259     .275     .265
  A_2 A_2 A_1           .066     .078     .074        .083     .097     .099
  A_2 A_2 A_2           .020     .049     .037        .025     .046     .037
Runs of A_2 responses
  Total runs (R)        24.29    22.04    22.36       36.29    36.53    36.54
  r_1                   18.21    14.22    15.03       28.36    26.92    26.69
  r_2                   4.86     4.92     4.93        6.07     6.71     7.19
  r_3                   1.43     1.78     1.62        1.79     1.91     1.94
  r_4                   .14      .67      .53         .29      .62      .52
  r_5                   .07      .26      .17         .00      .22      .14
Joint occurrences of A_1’s k trials apart
  C_1                   42.00    43.18    44.18       16.07    12.84    13.46
  C_2                   43.43    42.62    43.72       33.21    30.24    29.59
  C_3                   44.50    42.17    43.28       22.57    19.87    21.94
  C_4                   44.00    41.73    42.83       29.93    25.80    25.03
  C_5                   44.86    41.30    42.38       25.21    21.67    23.28
remaining subjects. More interesting are the other four subjects. These subjects make too many alternations to be fit well by the repetition model and too many repetitions to be fit well by the alternation model. Examination of each of their protocols and its relationship to the event sequence suggests that these subjects are shifted from one rule to the other by the occurrence of a burst of either repetitions or alternations. Three or four alternations in a row shift the child to an alternation sequence; three or four repetitions of the same event seem to shift the child into a repetition run. These children seem quite different from the majority. They appear sensitive to several events back in the sequence. Their behavior is much more like adult probability learning behavior in their sensitivity to alternation runs (Anderson, 1960). Their shifts to repetition runs also offer one possible answer to a problem which Norman Anderson posed concerning adult probability learning. After finding that adults respond with alternation runs to alternation runs in the event structure, he suggested that since a run of alternations moves the mean performance curve toward the .50 level, there must be some compensating action which moves the curve toward 1.0 if the mean curve is to be a matching curve, i.e., to asymptote at about the π level, as is often found. One such compensatory response tendency would be to emit a run of repetitions of some length greater than the repetition run in the event sequence, as these four children seemed to do. The above interpretation of the four subjects involves a several-slot memory rather than a one-slot memory and points the way to a developmental model which would postulate increase of the available short-term storage locations, elaboration of the possible rules that could be used by the generator, how the number of rules increases, what the rules for rule selection are, and how they change with development.
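The rule-shifting account of the four unfitted subjects can be made concrete with a toy generator. The sketch below is a deliberately simplified, deterministic illustration and not a model proposed in the text: it assumes the child always holds the event trace, looks back over a fixed number of event transitions (the `burst` parameter, set to three here), switches to the alternation rule after a burst of alternations, and switches to the repetition rule after a burst of repetitions. Events are coded 1 and 2; all names are hypothetical.

```python
def predict_with_rule_shifts(events, burst=3, start_rule="repetition"):
    """Predictions for trials 2..len(events) from a rule-shifting generator.

    Assumed mechanics: a burst of `burst` successive event alternations
    switches the generator to the alternation rule; a burst of `burst`
    successive event repetitions switches it to the repetition rule.
    """
    rule = start_rule
    predictions = []
    for n in range(1, len(events)):
        history = events[:n]
        if len(history) > burst:
            recent = history[-(burst + 1):]
            changes = [a != b for a, b in zip(recent, recent[1:])]
            if all(changes):
                rule = "alternation"
            elif not any(changes):
                rule = "repetition"
        last = history[-1]
        # Repetition rule: predict the last event; alternation rule: the other.
        predictions.append(last if rule == "repetition" else 3 - last)
    return predictions


events = [1, 1, 1, 1, 2, 1, 2, 1, 2, 2]
print(predict_with_rule_shifts(events))
```

A probabilistic version would let the shift occur with some probability and would reintroduce response and null traces; the point of the sketch is only that such a generator needs access to several past events, in line with the several-slot memory interpretation.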
E. TWO MODELS FOR PREDICTION OF NONCONTINGENT BINARY SEQUENCES WITH π = .5
When π = .5 and the repetition rule is in use, the theory predicts that P(c_{n+1} | c_n) = P(c_{n+1} | e_n) = .5, P(A_{i,n+1} | A_{i,n}) = (1 + β)/2, and P(A_{i,n+1} | E_{i,n}) = (1 + α)/2. When π = .5 and the alternation rule is in effect, P(c_{n+1} | c_n) = P(c_{n+1} | e_n) = .5, P(A_{i,n+1} | A_{i,n}) = (1 − β)/2, and P(A_{i,n+1} | E_{i,n}) = (1 − α)/2. (A happy correspondence of a change in sign from + to − to the change in rule from repetition to alternation.) Defining an event repetition response (ER) to be the joint event E_{i,n} ∩ A_{i,n+1} (i = 1 or 2), and an event alternation response (EA) to be the joint event E_{i,n} ∩ A_{j,n+1} (i = 1 or 2, j ≠ i), and letting
\[
u = \begin{cases} (1 + \alpha)/2 & \text{if the repetition rule is in effect} \\ (1 - \alpha)/2 & \text{if the alternation rule is in effect,} \end{cases} \qquad (51)
\]
it is easy to show, using techniques introduced above, that when the initial probability of an A_1 is .5, the probabilities of the eight possible 3-tuples of ER’s and EA’s are
\[
\begin{aligned}
P(RRR) &= u^3, & P(RRA) &= u^2(1-u), & P(RAR) &= u^2(1-u), & P(RAA) &= u(1-u)^2, \\
P(ARR) &= u^2(1-u), & P(ARA) &= u(1-u)^2, & P(AAR) &= u(1-u)^2, & P(AAA) &= (1-u)^3,
\end{aligned}
\]
the expected total number of runs of ER’s in an N-trial experiment is
\[ E(R_{ER}) = (N-1)u - (N-2)u^2, \qquad (53) \]
and the expected number of runs of ER’s of length j in an N-trial experiment is, for j = 1,
\[ E(r_{1,ER}) = (N-1)u - 2(N-2)u^2 + (N-3)u^3. \qquad (54) \]
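The run expectations can be checked numerically. The product form of the 3-tuple probabilities means that successive ER/EA outcomes behave as independent trials with P(ER) = u, so an N-trial experiment yields N − 1 such outcomes. The sketch below enumerates every outcome sequence for a small N and compares the exact expectations with Eq. (53), with the length-1 expression of Eq. (54), and with a general length-j expression; that general form is a standard result for runs in independent trials supplied here as an assumption, not quoted from the source, and it holds for j ≤ N − 2. Function names are illustrative.

```python
from itertools import groupby, product


def expected_runs_exact(m, u, length=None):
    """Exact expected number of ER runs in m independent ER/EA outcomes,
    optionally restricted to runs of exactly `length`, by enumeration."""
    total = 0.0
    for seq in product((1, 0), repeat=m):           # 1 = ER, 0 = EA
        k = sum(seq)
        prob = u ** k * (1 - u) ** (m - k)
        runs = [len(list(g)) for value, g in groupby(seq) if value == 1]
        if length is not None:
            runs = [r for r in runs if r == length]
        total += prob * len(runs)
    return total


def total_runs_formula(n_trials, u):
    """Eq. (53): expected total number of ER runs in an N-trial experiment."""
    return (n_trials - 1) * u - (n_trials - 2) * u ** 2


def runs_of_length_formula(n_trials, u, j):
    """Assumed general form; reduces to the printed Eq. (54) when j = 1."""
    return ((n_trials - j) * u ** j
            - 2 * (n_trials - j - 1) * u ** (j + 1)
            + (n_trials - j - 2) * u ** (j + 2))


N, u = 9, 0.6
print(expected_runs_exact(N - 1, u), total_runs_formula(N, u))
for j in (1, 2, 3):
    print(j, expected_runs_exact(N - 1, u, j), runs_of_length_formula(N, u, j))
```

Enumeration is feasible only for short experiments (2^(N−1) sequences), but agreement at small N is enough to confirm the algebra before the formulas are applied to the 150-trial data.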
F. THE BOGARTZ STUDY
Bogartz (1965) reports an experiment (see Exp. II of that report for details of the method) in which 16 preschool children each predicted a different 50:50 random sequence of 150 black and white marbles at a rate of one prediction every 5 seconds. Bogartz found that the data were fit well by a linear operator model for experimenter-controlled events (Bush & Mosteller, 1955) of the form
\[
P_{n+1} = \begin{cases} \alpha_1 P_n + (1 - \alpha_1) & \text{if a black marble on trial } n \\ \alpha_2 P_n & \text{if a white marble on trial } n, \end{cases} \qquad (55)
\]
where P_n is the probability of predicting black (A_1) on trial n. A slightly greater than 50:50 frequency of A_1’s (.542) necessitated the choice of two rate parameters. Also involved in the testing of goodness of fit was the value V_1 = .542, the overall A_1 relative frequency, and an estimate of V_2, the average second raw moment of the response-probability distributions. Thus four parameters were estimated to fit the linear model. In addition to the results shown in Table XXIX giving the observed and predicted values for the ER-EA triples and the runs of ER’s, assuming a repetition rule for each subject, we can also derive for the repetition model the joint probabilities originally used to test the fit of the linear operator model. For π = .5 these are given by Eqs. (56)-(59).
TABLE XXIX
OBSERVED AND PREDICTED ER AND EA TRIPLES AND RUNS OF A_2 RESPONSES FOR THE BOGARTZ (1965) STUDY

Statistic                          Obs.     Pred.Av  Pred.G
Triples
  RRR                              .211     .198     .184
  RRA                              .134     .136     .139
  RAR                              .122     .136     .139
  RAA                              .101     .103     .106
  ARR                              .136     .136     .139
  ARA                              .088     .103     .106
  AAR                              .102     .103     .106
  AAA                              .105     .084     .080
Total number of runs of A_2’s
  R                                33.62    35.98    36.57
Runs of A_2’s of lengths 1-5
  r_1                              13.56    15.65    15.52
  r_2                              8.94     8.67     8.76
  r_3                              3.69     4.86     4.95
  r_4                              2.94     2.77     2.79
  r_5                              1.75     1.60     1.58
Table XXX shows the observed joint proportions corresponding to Eqs. (56-59), the predicted values for the linear model, and the predicted values for the present approach, assuming that for all 16 subjects the repetition rule is in effect. To predict the values in the first 14 rows, three parameters, V_1, α_1, and α_2, were estimated for the linear model and one parameter, α (a group α or the average of the individual α’s), was estimated for the repetition rule model. To predict the last four values, one more parameter, V_2, was estimated for the linear model and one more, β (a group β or an average of individual β’s), for the repetition rule model. The fit of the linear model seems only slightly better than that of the repetition rule model, both doing a good job. The repetition rule model requires estimation of only two parameters, whereas the linear model requires estimation of four. The question of which model, if either, “fits better” is a difficult one and will not be grappled with here. It should be noted that the use of V_1 by the linear model is a means of incorporating a slightly greater tendency to predict A_1 than A_2. The repetition rule model could be elaborated to allow for biased guessing rather than unbiased guessing and thereby provide for an improved fit. This, however, would sacrifice the simplicity of the unbiased guessing hypothesis and simplicity of derivations for only a slight gain in accuracy and was not considered worth the loss of simplicity.
TABLE XXX
OBSERVED AND PREDICTED VALUES OF VARIOUS EVENT COMBINATIONS AND RESPONSE-EVENT COMBINATIONS AND AN A_1 RESPONSE
[The labels of the Combination column are not recoverable; rows are numbered in their printed order.]

Row    Observed   Pred.Av   Pred.G   Pred.Linear
 1     .307       .287      .284     .298
 2     .235       .213      .214     .244
 3     .162       .143      .142     .161
 4     .122       .107      .107     .134
 5     .146       .143      .142     .137
 6     .113       .107      .107     .110
 7     .088       .077      .071     .086
 8     .060       .053      .053     .072
 9     .066       .077      .071     .074
10     .060       .053      .053     .060
11     .074       .077      .071     .075
12     .062       .053      .053     .062
13     .080       .077      .071     .063
14     .052       .053      .053     .049
15     .185       .165      .161     .162
16     .122       .122      .123     .116
17     .145       .128      .125     .133
18     .090       .085      .089     .111
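For comparison with the repetition rule model, the linear operator model of Eq. (55) can be simulated directly. The sketch assumes the standard two-operator form for experimenter-controlled events, P_{n+1} = α_1 P_n + (1 − α_1) after a black marble and P_{n+1} = α_2 P_n after a white marble; the parameter values, the starting probability, and the function names are illustrative only and are not the estimates reported for the study.

```python
import random


def linear_operator_update(p, black_on_trial, alpha1, alpha2):
    """One application of the two-operator rule; p is P(predict black)."""
    if black_on_trial:
        return alpha1 * p + (1.0 - alpha1)
    return alpha2 * p


def simulate_subject(n_trials, p0, alpha1, alpha2, seed=0):
    """Simulate one subject predicting a 50:50 random marble sequence.

    Responses and events are coded 1 (black, A1) and 2 (white, A2).
    """
    rng = random.Random(seed)
    p = p0
    predictions, events = [], []
    for _ in range(n_trials):
        predictions.append(1 if rng.random() < p else 2)
        black = rng.random() < 0.5
        events.append(1 if black else 2)
        p = linear_operator_update(p, black, alpha1, alpha2)
    return predictions, events


# Illustrative values: 150 trials as in the study, arbitrary rate parameters.
predictions, events = simulate_subject(150, p0=0.5, alpha1=0.8, alpha2=0.85)
print(predictions.count(1) / len(predictions))
```

Feeding such simulated protocols through the same sequential statistics used for the children would show which statistics distinguish the two models; that comparison is not carried out here.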
VIII. Conclusions and Directions
The general theoretical approach taken here generates models that describe well the data gathered in a variety of deterministic and probabilistic binary prediction experiments with young children and generates directional predictions concerning the effects of experimental variables such as intertrial interval variation and the naming of interpolated events. The data confirm these directional predictions as well as fitting the models. The emphasis on the memory effects of the subject’s own response, one of the aspects of the present approach that distinguishes it from most of the other approaches to binary prediction (which have attempted to base their predictions exclusively upon the event sequence properties and process assumptions about the way such properties are used), is supported by the data analyses. Space limitations prevent any extensive discussion of still other models that this approach generates which will be needed to handle the data of small subgroups of subjects that behave in atypical fashion in some tasks. I will just mention, as an example, that for some subjects predicting an alternation sequence, a model is needed that links the attender to the generator such that an alternation rule is in effect when the subject is paying attention, but when attention lags, a repetition rule using response traces seems to go into effect. A number of such models incorporating high- and low-attention levels have been developed and will be presented elsewhere.
Our present plans for experimental research involve five lines of study. We want to pursue study of the effects of intertrial interval duration through parametric studies in which the different intertrial intervals are scheduled according to a modified staircase or up-and-down psychophysical programming method (Smith, 1961; Bogartz, 1968). This will permit the subject to tailor the intervals to his own storage characteristics and also to trace out a function which is the memory analog of the traditional psychophysical function. A second line of study concerns trace confusion. Using minimal pairs of words (Gleason, 1961), that is, words which differ by only one phoneme, as compared with words differing by two, three, or more, it should be possible to study the precision of the short-term store in preserving small differences over time. Using visual stimuli such as pictures of a hat and a bat versus a hat and a ball, and assuming that the hat and ball are not visually less similar to each other than are the hat and bat, greater errors to the hat-bat pair than to the hat-ball pair in the alternation prediction task would suggest conversion of the visual inputs to auditory storage. Absence of such an effect would suggest that storage may instead be in a visual (image) store. Visual inputs with
phonemic similarities would control for perceptual confusions, thus allowing for implication of the memory as the source of confusion. In Experiment V, we saw that when young children are predicting the next animal picture in an alternation sequence, for example, bear-dog-bear-dog-..., if they are asked to name other pictures during the intertrial interval, then the naming of a picture corresponding to the subject’s own previous response enhances the effect attributed to response traces by the model, naming of a picture corresponding to the last predicted card enhances the effect attributed to event traces, and naming of an irrelevant picture card, e.g., giraffe or elephant, enhances the effect attributed to null traces. We have interpreted these results as meaning that with a certain probability the contents of the memory store are replaced by the encoded name of the interpolation picture. Recall that subjects named either three or one interpolated picture during the interval, with different combinations of bear and dog or of giraffe and elephant being presented when three namings were required, but one of the four when only one naming was required. The effects of three of the stimuli in combination were not clear and the expected effects revealed themselves only when the last of the three stimuli alone was used in the analysis. Although to some extent this supports the notion of a one-bin memory in alternation prediction, we want to pursue the matter by further manipulation of interpolated lists, and to attempt to develop a model for the input probabilities of each of the positions in the interpolated list that will account for the microstructure of the above-mentioned effect. Such an analysis might go further toward supporting the notion of a one-bin memory, and also has relevance to the next two lines of research. We also would like to study the flexibility of the generator by studying sequence-to-sequence transfer effects.
We have shown that while previous prediction of a simple repetition sequence, in which only one event occurs on each trial, does not interfere with later performance on a single alternation sequence, very poor performance on an alternation sequence is observed following prediction of a tending-to-repeat sequence (Bogartz, 1966a). We want to know under what conditions young children detect changes in the rules governing the event sequences and incorporate them into the generator. Is this gradual or all-or-none? Can it be influenced by verbal cueing, times out for rest, changes in the stimuli being used when a given sequential rule goes into effect, such as red-green for single alternation but blue-blue-yellow-yellow for a double alternation sequence? Will children be able to “color-code” the rules in their long-term memory, and enter them into the generator following the first color change so as to limit themselves to only one error? Can arousal by delivery of novel stimuli facilitate detection of rule changes? Furthermore, in
conjunction with these ideas we would like to know whether or not the child perhaps expands and contracts the number of bins functioning in his short-term store depending upon the rule in operation in the generator. If the child uses, for example, two bins in dealing with the double alternation sequence but only one when predicting the single alternation sequence, then we might be able to detect this by finding that the last two interpolations of an intertrial interval interpolation list play a part when subjects name interpolations between double alternation predictions, but that only the last one plays a role when single alternation sequences are being predicted. The last line of experimental work to be mentioned involves study of the rule capacity of the generator by manipulation of sequence complexity. We would like to know more about the kinds of rules that young children can use. How involved a rule can the child hold and use effectively? How can one characterize the complexity of rules? Some start toward such characterization of rules for adults in terms of mandatory and optional rules that may be employed in prediction of binary events has been made by Restle (1967). His notions concerning the possible applicability of the generative grammar approach (Chomsky, 1963) to rule utilization are suggestive, and the small number of stages of learning suggested by his analysis of sequence learning into the discovery of a hierarchical structure of rules organizing the entire sequence is in accord with the rapid discovery of familiar sequences by children. The discovery of rules of high complexity may require a recoding process such as counting. In this case, we expect that typical 3-, 4-, and 5-year-olds will be extremely limited in the discovery process. It is anticipated that the children would be provided with the rules by instruction and pretraining before their ability to use them was actually assessed.
This part of the investigation will be of interest in its own right since there is almost no information on how complex a sequence a child can acquire if he is carefully tutored in the properties of the sequence rather than left to his own devices to discover them. At present, the most challenging theoretical questions concern the assumptions that will be needed to extend the theory to earlier and later developmental stages. It is clear that the capabilities of the memory will have to be extended and some elaboration of the generator will be needed. The interaction of the attender with the generator as suggested above will have to be considered, as also will the possible interaction of the attender with the memory such that perhaps more memory locations are in use at higher attention levels than at lower levels.
REFERENCES Anderson, N. H. Effect of first-order conditional probability in a two-choiee learning situation. Journal of Experimental Psychology, 1960, 59, 73-93. Anderson, N. H. An evaluation of stimulus sampling theory: Comments on Professor Este’s paper. In A. W. Melton (Ed.), Categories of human learning. New York: Academic Press, 1964. Anderson, T. W., & Goodman, L. A. Statistical inference about Markov chains. Annals of Mathematical Statistics, 1957, 28, 89-110. Atkinson, R. C., Bower, G., & Crothers, E. J. An introduction to mathemtical learning theory. New York: Wiley, 1965. Atkinson, R. C., Sommer, G. R., & Sterman, M. B. Decision making by children as a function of amount of reinforcement. Psychological Reports, 1960,6, 299-306. Averbach, E., & Coriell, A. S. Short-termmemory in vision. Bell System Technical Journal, 1961, 40, 309-328. Blau, J. H. The combining of classes condition in learning theory. Technical Report No. 32, 1960, Applied Mathematics and Statistics Laboratory, Stanford University, Contract Nonr 225(17). Bogartz, R. S. Sequential dependencies in children’s probability learning. Journal of Experimental Psychology, 1965,70, 365-370. Bogartz, R. S. Variables influencing alternation prediction in preschool children : I. Previous recurrent, dependent, and repetitive sequences. Journal of Experimental Child PsychoEogy, 1966, 3, 40-55. (a) Bogartz, R. S. Theorems for a finite sequence from a two-state fist order Markov chain with stationary transition probabilities. P s y c h m t r i k a , 1966, 31, 383395. (b) Bogartz, R. S. Extension of a theory of predictive behavior to immediate recall by preschool children. In K. F. Riegel (Ed.), The Development of language functions. Technical Report No. 8, Language Development Program, Center for Human Growth and Development, University of Michigan, 1966. Pp. 1-14. (c) Bogartz, R. S. Test of a theory of predictive behavior in young children. Psychonomic Science, 1966,4, 433-434. (d) Bogartz, R. S. 
An asymptotic likelihood ratio rest for a Markov chain with Bernoullian or contingent input. Psychometrika, 1968, 33, 405-422. Bogartz, R. S., & Pcderson, D. R. Variables influencing alternation prediction in preschool children : 11. Redundant cue value and intertrial interval duration. Journal of Experimental Child Psychology, 1966,4, 311-216. Broadbent, D. E. Flow of information within the organism. Jourmal of Verbal Learning and Verbal Bchawior, 1963, 2, 34-39. Brunk, H. D. On the estimation of parameters restricted by inequalities. Annals of Mathematical Statistics, 1958, 29, 437-454. Bush, R. R. Sequential properties of linear models. I n R. R. Bush & W. K. Estes (Eds.),Studies in mathematical learning theory. Stanford: Stanford Univ. Press, 1959. Bush, R. R., & Mosteller, F.Stochastic models fwlearning. New York: Wiley, 1955. Bush, R. R., Mosteller, F., & Thompson, G. L. A formal structure for multiplechoice situations. In R. M. Thrall, C. H. Coombs,& R. L. Davis (Eds.), D e c i s h processes. New York: Wiley, 1954. Chomsky, N. Formal properties of grammars. I n R. D. Luce, R. R. Bush, & E. H. Galanter (Eds.), Handbook of mathematical psychology. Vol. 2. New York: Wiley, 1963. Conrad, R. An association between errors and errors due to acoustic masking of speech. Nature, 1962,193, 1314-1315.
390
Richard S. Bogartz
Craig, G. J., & Myers, J. L. A developmental study of sequential two-choice decision making. Child Developmnt, 1963,34,483-493. Edwards, W. Reward probability, amount, and information as determiners of sequential two-alternative decisions. Journal of Experimental Psychology, 1956, 52, 177-188. Estes, W. K. Probability learning. I n A. W. Melton (Ed.), Categories of human learning. New York: Academic Press, 1964. Estes, W. K., & Suppes, P. Foundations of statistical learning theory. I. The linear model for simple learning. Technical Report No. 16, 1957, Applied Mathematics and Statistics Laboratory, Stanford University, Contract Nonr 225( 17). Estes, W. K., & Suppes, P. Foundations of linear models. I n R. R. Bush & W. K. Estes (Eds.),Studies in mathemtical learning theory. Stanford, Calif. : Stanford University Press, 1959. Gleason, H. A., Jr. An introduction to descriptive linguistics. Rev. Ed. New York: Holt, 1961. Goodnow, J. J. A review of studies on probable events. Australian Journal of Psychology, 1958, 10, 1 1 1-125. Hake, H. W., & Hyman, R. Perception of the statistical structure of a random series of binary symbols. Journal of Experiwwntal Psychology, 1953, 45, 64-74. Hunt, E. B. Simulation and analytic models of memory. Journal of Verbal Learning and Verbal Behavior, 1963, 2, 49-59. Kemeny, J. G., & Snell, J. L. Finite Marlcov chains. Princeton, N.J.: Van Nostrand, 1960. Lloyd, K., Reid, L. S., & Feallock, J. B. Short term retention as a function of the average number of items presented. Journal of Experimental Psychology, 1960, 60, 210-207. Luce, R. D. Individual choice behavior. New York: Wiley, 1959. Mackworth, J. F. The visual image and the memory trace. Canadian Journal of Psychology, 1962,16,55-59. Melton, A. W. Implications of short-term memory for a general theory of memory. Journal of Verbal Learning and Verbal Behavior, 1963,2, 1-21. Offenbach, S. I. Studies of children’s probability learning: I. 
Effect of reward and punishment a t two age levels. Child Development, 1964,35,709-715. Postman, L. Short-term memory and incidental learning. I n A. W. Melton (Ed.), Categories of human learning. New York: Academic Press, 1964. Restle, F. Psychology of choice andjudgment. New York: Wiley, 1961. Restle, F. Grammatical analysis of the prediction of binary events. Journal of Verbal Learning and Verbal Behavior, 1967, 6, 17-25. Rouanet, H., & Rosenberg, S. Stochastic models for the response continuum in a determinate situation : comparisons and extensions. Journal of Mathematical Psychology, 1964, 1, 215-232. Siegel, S. Choice, strategy, and utility. New York: McGraw-Hill, 1964. Siegel, S., & Andrews, J. M. Magnitude of reinforcement and choice behavior in children. Journal of Experimental Psychology, 1962, 63, 337-341. Siegel, S., & Goldstein, D. A. Decision making behavior in a two-choice uncertain outcome situation. Journal of Experimental Psychology, 1959, 57, 37-42. Smith, J. E. K. Stimulus programming in psychophysics. Psychomtrika, 1961, 26, 27-33. Sperling, G. The information available in brief visual presentations. Psychological Monographs, 1960,74, ( 11 Whole No. 498). Stevenson, H. W., & Weir, M. W. Variables affecting children’s performance in a probability learning task. Journal of Experimental Psychology, 1959,57,403-412.
Stevenson, H. W., & Zigler, E. F. Probability learning in children. Journal of Experimental Psychology, 1958, 56, 185-192.
Suppes, P., & Atkinson, R. C. Markov learning models for multiperson interactions. Stanford, Calif.: Stanford University Press, 1960.
Weir, M. W. Developmental changes in problem-solving strategies. Psychological Review, 1964, 71, 473-490.
Weir, M. W. Age and memory as factors in problem solving. In K. F. Riegel (Ed.), The development of language functions. Technical Report No. 8, Language Development Program, Center for Human Growth and Development, University of Michigan, 1966. Pp. 15-33.
Wolfe, P. Foundations of nonlinear programming: Notes on linear programming and extensions. Part 65, August 1965, Memorandum RM-4669-PR, The RAND Corporation, Santa Monica, California.
Yntema, D. B., & Mueser, G. Remembering the present states of a number of variables. Journal of Experimental Psychology, 1960, 60, 18-22.
Zeaman, D., & House, B. J. The role of attention in retardate discrimination learning. In N. R. Ellis (Ed.), Handbook of mental deficiency. New York: McGraw-Hill, 1963. Pp. 159-223.
AUTHOR INDEX Numbers in italics refer to the pages on which the complete references are listed.
A Aaronson, D., 173, 197 Ammons, R. B., 242, 296 Andelman, L., 2, 41 Anderson, N. H., 305, 382, 389 Anderson, N. S., 62, 97, 135, 190, 197 Anderson, T. W., 330, 389 Andrews, J. M., 301, 390 Atkinson, R. C., 135, 161, 197, 207, 211, 212, 215-222, 224, 228, 229, 231-233, 235, 237, 238, 239, 301, 316, 375, 389, 391 Attneave, F., 48, 64, 67, 96 Austin, G. A., 102, 132 Averbach, E., 136, 197, 305, 389
B Baddeley, A. D., 146, 197 Bahrick, H. P., 96, 96 Bartlett, F. C., 61, 97 Beck, J., 53, 97 Beller, H. K., 47, 98 Bergstrom, J. A., 185, 197 Bernbach, H. A., 149, 197, 204, 211, 229, 231, 232, 236, 237, 238 Bevan, W., 67, 97 Bishop, C. H., 52, 53, 97 Blau, J. H., 300, 389 Bogartz, R. S., 301, 302, 311, 320, 330, 335, 343, 348, 367, 374, 383, 384, 386, 387, 389 Boies, S. J., 62, 74-78, 80, 81, 82, 84, 87, 88, 97, 99 Bouchee, B., 96, 96 Bourne, L. E., Jr., 101, 126, 132 Bower, G., 316, 389 Bower, G. H., 1, 2, 27, 41, 102, 103, 107, 117, 119, 121, 122, 124, 132, 134, 135, 148, 159, 197, 211, 237, 316, 389 Brelsford, J. W., Jr., 220, 221, 222, 237, 238 Broadbent, D. E., 305, 306, 389 Broadbent, L., 59, 97 Brooks, L. R., 96, 97 Brown, J., 166, 197 Brown, L. T., 52, 53, 99 Brown, R., 237, 238 Bruner, J. S., 1, 40, 44, 47, 97, 102, 132 Brunk, H. D., 327, 328, 389 Brunswik, E., 3, 41 Bunderson, C. V., 126, 132 Burke, C. J., 30, 40, 101, 132 Bush, R. R., 101, 132, 300, 302, 322, 323, 336, 364, 383, 389
C Chase, W. G., 49, 75, 84, 92, 97, 98 Chomsky, N., 388, 389 Chumbley, J. I., 107, 132 Cimbalo, R. S., 146, 197 Clark, W. H., 122, 125, 133 Cohen, B. H., 46, 97 Conrad, R., 46, 97, 146, 165, 173, 185, 186, 197, 306, 389 Cooper, E. H., 264, 296 Corballis, M. C., 189, 197, 198 Coriell, A. S., 136, 197, 305, 389 Cornsweet, J. C., 79, 99 Cornsweet, T., 79, 99 Cox, N., 51, 97 Craig, G. J., 301, 373, 374, 390 Crossman, E. R. F. W., 166, 198 Crothers, E. J., 211, 228, 229, 237, 238, 316, 389 Crowder, R. G., 217, 238
D Dallett, K., 96, 97 D’Andrea, L., 96, 97 DeSoto, C., 96, 97 Dinner, J. E., 249, 296 Ditrichs, R., 278, 296 Dodd, D. H., 126, 132 Dukes, W. F., 67, 97 Duncan, C. P., 245, 249, 296, 297
E Edmonds, E. M., 62, 97 Edwards, W., 300, 390 Egeth, H. E., 54, 97 Ehmann, E. D., 124, 126, 132 Eichelman, W. H., 50, 51, 52, 53, 62, 74, 75, 76, 77, 78, 80, 81, 84, 97, 99 Ekstrand, B. R., 293, 297
Emmerich, D., 123, 133 Erickson, J. R., 124, 126, 132 Estes, W. K., 30, 40, 101, 102, 132, 300, 312, 390 Evans, S. H., 62, 97
F Fant, C. G. M., 194, 198 Feallock, J. B., 306, 390 Feigenbaum, E. A., 136, 145, 161, 194, 196, 198, 199 Feldman, J., 136, 198 Feller, W., 218, 238 Fentress, J., 69, 98 Fiero, P., 195, 198 Fitts, P. M., 62, 97 Flavell, J. H., 44, 97 Fletcher, H., 153, 198 Fraser, D. C., 185, 198 Fraser, J., 278, 297 Frost, R., 70, 97
G Galanter, E., 46, 98 Gibson, E. J., 52, 53, 57, 67, 97 Glanzer, M., 122, 125, 133 Gleason, H. A., Jr., 386, 390 Glucksberg, S., 136, 198 Goggin, J., 264, 268, 297 Goldsmith, R., 57, 62, 63, 64, 77, 99 Goldstein, D. A., 300, 390 Goldstein, K., 44, 97 Goodman, L. A., 330, 389 Goodnow, J. J., 102, 132, 300, 390 Goodrich, K. P., 1, 40 Gottsdanker, R., 59, 97 Green, B. F., 196, 199 Green, E. J., 101, 133 Greenberg, R., 293, 296 Greeno, J. G., 229, 238 Gregg, L. W., 117, 122, 133, 175, 198 Guy, D. E., 126, 132
H Haber, R. N., 52, 97, 125, 133 Haberlandt, K., 14, 16, 17, 19, 21, 41 Hake, H. W., 305, 390 Halle, M., 194, 198 Handel, S., 96, 97 Hansen, D. N., 231, 232, 238 Harlow, H. F., 102, 133
Harris, G. J., 146, 184, 185, 198 Hawkins, H. L., 54, 55, 59, 98 Hayes, W. H., 52, 53, 99 Hebb, D. O., 53, 98 Hellyer, S., 193, 198 Hellyer, S., 223, 224, 226, 238 Hershenson, M., 52, 53, 98 Hille, B. A., 185, 197 Hillner, K., 218, 219, 238 Hinsey, W. C., 64, 98 Hintzman, D. L., 147, 188, 194, 198 Hochberg, J., 44, 46, 49, 50, 52, 53, 98 Holgate, V., 3, 41 Honig, W. K., 24, 32, 40 House, B. J., 2, 27, 41, 336, 391 Houston, J. P., 249, 296 Hubel, D. H., 53, 98 Hughes, C. L., 1, 40 Hull, C. L., 3, 26, 29, 30, 40, 101, 133 Humphrey, G., 45, 98 Hunt, E. B., 306, 307, 390 Huttenlocher, J., 122, 133 Hyman, R., 305, 390
I Irion, A. L., 242, 248, 296
J Jacobson, R., 194, 198 Jahnke, J. C., 186, 198 Jenkins, J. J., 66, 100 Jonckheere, A. R., 13, 40 Justesen, D. R., 126, 132
K Kamin, L. J., 2, 29, 30, 31, 33, 35, 40 Keele, S. W., 65, 66, 68, 69, 71, 74, 75, 98, 99 Keeney, T., 66, 100 Kemeny, J. G., 317, 319, 390 Kendler, H. H., 122, 133 Kendler, T. S., 122, 133 Keppel, G., 183, 198, 257, 258, 277, 278, 283, 290, 293, 294, 296, 297 Kessel, F., 66, 100 Kleinberg, J., 122, 133 Konick, A. F., 76, 77, 99, 195, 199 Konorski, J., 70, 73, 80, 83, 98 Krechevsky, I., 102, 103, 133 Krueger, W. C. F., 266, 296
L Land, V., 218, 219, 238 Landauer, T. K., 175, 198 Lashley, K. S., 2, 26, 40 Laughery, K. R., 136, 146, 172, 184, 185, 188, 195, 197, 198, 199 Lawrence, D. H., 2, 40 Lazar, G., 249, 296 Leitenberg, H., 103, 133 Leonard, J. A., 62, 97 Levine, M., 27, 40, 102, 103, 105, 116, 122, 123, 125, 133 Lindsay, J. M., 54, 98 Lindsay, R. K., 54, 98 Lloyd, K., 306, 390 Logan, F. A., 14, 16, 17, 19, 21, 41 LoLordo, V. M., 35, 41 London, M., 96, 97 Lovejoy, E. P., 2, 27, 40 Luce, R. D., 300, 390 Luria, A. R., 44, 98
M McGeoch, G. O., 264, 296 Mackintosh, N. J., 1, 2, 4, 26, 38, 40 Mackworth, J. F., 184, 199, 305, 390 McNeill, D., 237, 238 McNulty, J. A., 285, 297 Mandler, G., 46, 98 Mandler, J. M., 46, 98 Martin, R. B., 278, 296 Matter, J., 1, 40 Mealy, G. H., 196, 199 Melton, A. W., 183, 195, 199, 213, 238, 304, 390 Mewhort, D. J. K., 50, 98 Miller, G. A., 46, 98, 168, 194, 199 Miller, P., 122, 133 Mitchell, R. F., 44, 48, 49, 52, 57, 59, 60, 94, 99 Morton, J., 191, 199 Mosteller, F., 101, 132, 300, 302, 336, 364, 383, 389 Mueller, M. R., 62, 97 Mueser, G., 306, 391 Murdock, B. B., Jr., 184, 187, 195, 199, 208, 218, 235, 238, 247, 255, 296 Myers, J. L., 301, 373, 374, 390
N Neisser, U., 46, 47, 53, 70, 82, 95, 98, 99, 237, 238
Newell, A., 196, 199 Newton, J. M., 243, 296 Nicely, P. E., 194, 199 Nickerson, R. S., 54, 98 Norman, D. A., 135, 149, 173, 199, 200, 236, 237, 238, 239 North, A. J., 1, 40 Norton, T., 28, 41
O Offenbach, S. I., 380, 390 Oldfield, R. C., 61, 98 Olshavsky, R. W., 175, 198 Ozier, M., 285, 297
P Pantle, A. J., 264, 296 Papanek, M. L., 1, 40 Pederson, D. R., 302, 343, 348, 389 Perkins, C. C., Jr., 28, 40 Peterson, L. R., 183, 193, 195, 199, 216, 217, 218, 219, 238 Peterson, M. J., 193, 195, 199, 216, 217, 238 Phillips, J. L., 207, 212, 215, 216, 217, 218, 219, 220, 221, 224, 231, 233, 235, 239 Pick, A. P., 67, 98 Pinkus, A. L., 146, 172, 184, 185, 198, 199 Podell, H. A., 68, 98 Pollack, I., 45, 98, 185, 199 Posner, M. I., 44, 46, 48, 49, 52, 57-60, 62-66, 68, 69, 71, 74-78, 80, 81, 84, 85, 87, 89, 91, 92, 94, 97, 98, 99, 100, 183, 190, 195, 199, 217, 239 Postman, L., 183, 199, 243, 245, 246, 251, 257, 258, 260-262, 264, 268, 273-278, 283-286, 290, 293, 294, 296, 297, 310, 390 Pribram, K., 46, 98 Price, H. H., 61, 73, 99 Price, T., 14, 16, 17, 19, 21, 41
R Rappoport, M., 62, 97 Ratliff, F., 79, 99 Reeves, J. W., 46, 61, 99 Reid, L. S., 306, 390 Rescorla, R. A., 3, 11, 32, 33, 35, 40, 41 Restle, F., 27, 41, 101, 102, 122, 123, 124, 132, 133, 300, 388, 390 Ribot, T., 45, 73, 99 Richter, M. L., 103, 133
Riese, W., 61, 99 Riggs, L. A., 79, 99 Robinson, J. S., 52, 53, 99 Rock, I., 102, 133 Rockway, M. R., 249, 297 Rosenberg, J., 122, 133 Rosenberg, S., 349, 390 Ross, L. E., 1, 40 Rossman, E., 217, 239 Rouanet, H., 349, 390 Russell, D. G., 2, 40
S Saltzman, D., 218, 219, 238 Schiff, W., 52, 53, 97 Schulz, R. W., 247, 249, 256, 297 Schwartz, M., 260, 261, 262, 297 Schwenn, E., 243, 245, 246, 251, 297 Selfridge, O. G., 70, 82, 99 Seligman, M. E. P., 28, 41 Shepard, R. N., 70, 99 Shiffrin, R. M., 135, 161, 197, 207, 212, 215-221, 224, 231, 233, 235, 237, 237, 238, 239 Siegel, S., 300, 301, 390 Simon, H. A., 117, 122, 133, 161, 198 Simon, S., 278, 296 Smith, J., 52, 53, 97 Smith, J. E. K., 386, 390 Snell, J. L., 317, 319, 390 Sokolov, Y. N., 71, 99 Sommer, G. R., 301, 389 Spence, K. W., 1, 26, 30, 41, 103, 134 Sperling, G., 75, 91, 95, 99, 135, 136, 199, 237, 239, 305, 390 Stark, K., 278, 285, 297 Steinmeyer, C. H., 122, 133 Sterman, M. B., 301, 389 Sternberg, S., 44, 56, 59, 84, 92, 99 Stevens, S. S., 46, 99 Stevenson, H. W., 301, 390, 391 Strange, W., 66, 100 Suppes, P., 300, 312, 375, 390, 391 Sutherland, N. S., 2, 3, 4, 6, 26, 38, 41
T Talland, G., 80, 100 Taylor, R. L., 62, 74-78, 80, 81, 84, 85, 87, 89, 91, 92, 97, 99, 100 Teghtsoonian, M., 70, 99 Thomas, E., 28, 42 Thompson, G. L., 300, 389 Thune, L. E., 242, 243, 248, 297 Tighe, L. S., 122, 134 Tighe, T. J., 122, 134 Tolman, E. C., 3, 41 Tonge, F. M., 196, 199 Trabasso, T., 1, 2, 27, 41, 102, 103, 107, 117, 118, 121, 122, 124, 132, 134 Trask, F. P., 166, 200 Tulving, E., 265, 285, 297 Twedt, H. M., 273, 297
U Uhr, L., 61, 70, 100 Ulbricht, C., 146, 184, 185, 198 Underwood, B. J., 247, 249, 256, 273, 293, 296, 297
V Van Sant, C., 59, 97
W Wagner, A. R., 1, 2, 5, 14, 16, 17, 19, 21, 25, 26, 28, 29, 33, 38, 40, 41 Walker, E. L., 234, 235, 239 Waugh, N. C., 135, 199, 237, 239 Weiner, B., 234, 235, 239 Weinstein, M., 62, 97 Weir, M. W., 301, 390, 391 Weiss, J. M., 28, 41 Weizenbaum, J., 196, 200 Welton, K. E., 57, 62-64, 99 White, R. M., Jr., 126, 134 Wickelgren, W. A., 135, 146, 149, 153, 166, 173, 186, 188, 194, 195, 200, 236, 239 Wickens, D. D., 243, 296 Wiesel, T. N., 53, 98 Wilcox, S. G., 96, 97 Wolfe, P., 328, 391 Woodworth, R. S., 45, 48, 100
Y Yntema, D. B., 166, 200, 306, 391 Yoder, R. M., 122, 133
Z Zacks, R., 257, 258, 297 Zajkowski, M. M., 124, 126, 132 Zavortink, B., 290, 293, 294, 296 Zeaman, D., 2, 27, 41, 336, 391 Zigler, E. F., 301, 391
SUBJECT INDEX

A
Acquisition of transfer skills, 273-284
  practice in, 274-285
  rule-governed behavior, 273-274
Attention theory, 26-28

B
Blank-trials assumption, 103
Binary prediction task, 300-302

C
Continuity versus noncontinuity theory, 101-103
Continuous memory task, 219-223
Cue validity, 4-5
  experiments of, 5-20

F
Forgetting, loss of replicas and, 204-206

H
Hypothesis testing
  abstract versus specific theory, 122-126
  blank trials probe, 103
  dynamics of, 105-122
  eight-dimensional problems, 107-122
  four-dimensional problems, 105-107
  oops-error, 109

I
Intertrial interval
  model for within Ss variation and, 348-353
  single alternation task and, 341-348

L
Learning to learn
  associative skills in, 260-263
  response-integration skills in, 257-260
  versus warm-up, 242-248
Levels of stimulus processing, 44-45
  abstraction, 45
  generation, 46, 80-83
  recognition, 46
Long-term memory
  memory structures of, 144-145
  recoding in, 169-170

M
Markov event sequences, 363-373
  models for prediction of, 364-367
Modified continuity theory, 28-34
  experimental evaluation of, 34-38

N
Noncontingent event sequences, 373-386
  models for prediction of, 374-380, 382-386

P
Paired-associates
  memory for, 217-219
  repeated presentations and, 226-231
Perception
  role of familiarity in, 49-53
  schema formation and, 61-62
  stimulus examination and, 47
Proactive inhibition
  instructed “release” from, 235-236
  learning to learn and, 293-295

R
Recall
  effects of practice on, 285-295
  role of warm-up in, 250-256
Rehearsal processes, 231-236
Retroactive inhibition, 286-293

S
Schema formation
  evidence for, 62-67
  recognition and, 70-73
  role of variability in, 67-70
Serial-position effects
  experimental paradigm of, 207-208
  mathematical model of, 208-211
  payoffs and, 233-234
Short-term memory
  computer simulation model of, 138-174
    basic information units, 140-142
    forgetting mechanisms, 142-144
    memory processes, 149-173
    memory structure, 144-149
    types of errors, 173-174
  rehearsal processes in, 231-236
  retrieval in, 203-204
  serial-position effects in, 206-215
  simulation experiments of, 182-188
Single alternation task
  effects of interpolated events on, 357-363
  experiments on, 329-353
  interference effects in, 353-357
  intertrial interval and, 341-355
  model of, 312-329
  theory of, 307-311
Single-item recall, 210-217, 223-226
  incentives in, 234-235
Stimulus examination, visual matching in, 48
Stimulus matching
  analog operations, 50-61
  visual representation in memory, 74-84
Stimulus processing
  levels of, 44-45
  serial and parallel, 54-56
  units of, 53
  visual and code names in, 84-94
Stimulus selection
  attentional interpretation of, 26-28
  cue validity and, 4-5
  modified continuity theory of, 28-34
  research strategy and, 2

T
Transfer
  nonspecific
    acquisition of skills in, 273-284
    definition of, 241
    two-stage analysis of, 256-263
    warm-up versus learning to learn, 242-248
  specific, 241-242

V
Verbal memory, 201-202
  replica theory, 202-206
    forgetting, 204-206
    retrieval, 203-204
    storage, 202-203

W
Warm-up
  retention and, 248-250
  role of recall in, 250-256
  versus learning to learn, 242-248
Whole versus part learning, 263-273
  experimental investigation of, 266-272
  total-time hypothesis, 264

Z
Zero memory assumption, 102