A Sherlockian experiment
ULRIC NEISSER* JOHN A. HUPCEY
Cornell University
“‘It would be superfluous to drive us mad, my dear Watson’, said he. ‘A candid observer would certainly declare that we were so already before we embarked upon so wild an experiment.’” (Devil’s Foot)
Abstract
Members of a Sherlock Holmes society were presented with sentences taken from the Holmes stories. They were asked to identify the story and the immediate context from which the sentence was drawn. Concrete sentences relevant to the story’s themes proved to be the most effective cues; descriptions and proper names were ineffective.

*This paper was written while the senior author was a Fellow at the Center for Advanced Study in the Behavioral Sciences, Stanford, California. The support of the Center is gratefully acknowledged. Even more gratefully, we acknowledge the enthusiastic cooperation of the Cornell Baker Street Underground.
Cognition, 3(4), pp. 307-311

In most studies of memory, the subject encounters or learns the material during the course of the experiment itself. We know very little about memory for materials which subjects have mastered on their own time and for their own reasons. Moreover, despite a certain amount of research on memory for stories (e.g., Bartlett, 1932), we also know little about how genuine works of literature are remembered. With such material, will ‘concrete’ items serve as better cues than ‘abstract’ ones? Will the relevance of the item to the theme of the story make a difference? Are names and descriptions of persons effective as cues? The present experiment is offered as a first step toward filling these gaps.

Conan Doyle’s sixty stories about Sherlock Holmes and Dr. Watson have a powerful and continuing fascination for many readers. It is not unusual for Holmes’ fans to form small societies, which meet regularly for Sherlockian purposes. At such meetings the members may read or discuss some of the famous ‘analyses’ or ‘exposés’ of Holmes’ life and loves, argue about his methods, or compare favorite stories. A frequent pastime at these convocations takes the form of a test of memory. One member opens a volume of the stories at random and reads a sentence; the others must name the adventure from which it comes or even (ideally) go on with the text from memory. More complicated questions may also be asked. In what stories does Holmes’ brother Mycroft appear? Which ones involve animals? How many does Dr. Watson not narrate? The present study became possible because the junior author was a member of such a group, the Baker Street Underground of Cornell University.
Method
Five types of sentences were selected from the Holmes stories. NAME sentences contained little information other than the name of a protagonist: “Mr Holmes, I am the unhappy John Hector McFarlane” (Norwood Builder*); “Good evening, Mr. James Windibank” (Case of Identity). DESCRIPTION sentences provided personal descriptions of characters: “His tall, gaunt, craggy figure had a suggestion of hunger and rapacity” (Thor Bridge); “He was an elderly man with a thin projecting nose, a high bald forehead, and a huge grizzled moustache” (Empty House). ISOLATED ABSTRACT sentences were comments which did not carry any reference to the story or the setting in which they were made: “To let the brain work without sufficient material is like racing an engine” (Devil’s Foot); “Deceit, according to him, was an impossibility in the case of one trained to observation and analysis” (Study in Scarlet). ISOLATED CONCRETE sentences describe objects or concrete deductions which do not bear on the main theme of the story, as when Holmes is simply impressing Watson with his powers: “My eyes tell me that on the inside of your left shoe, just where the firelight strikes it, the leather has been scored by six almost parallel cuts” (Scandal in Bohemia); “Each of these mends, done as you observe with silver bands, must have cost more than the pipe did originally” (Yellow Face). RELEVANT CONCRETE sentences are specific observations integrally related to the story or the solution: “The gun was made to conceal” (Valley of Fear); “Were it mixed with any ordinary dish, the eater would undoubtedly detect it, and would probably eat no more” (Silver Blaze).

Ten sentences of each type were originally selected, and the first two subjects were run with all 50. This proved too burdensome, and the remaining eight subjects were tested with only about 27 sentences each, drawn randomly from those available. Thus the sentences were not all used equally often; presentation data appear in Table 1 below.
The subjects were asked four questions about each sentence. (1) From what story is it taken? (2) In what context does it occur? (3) What sentence occurs next in the story? (4) Which of these two alternative sentences (shown to the subject) occurs next in the story? The present report is based on the answers to the first two questions alone. The remaining data were discarded because adequate recall of the next sentence (question 3) almost never occurred, and correct recognition of the next sentence (question 4) was often based less on memory than on considerations of logic or of literary style.

*Redundant phrases in story titles, such as “The Adventure of the...”, have been omitted.
The subjects, all members of the Cornell Baker Street Underground, varied in their familiarity with the stories. By their own accounts, they had been interested in Holmes for two to ten years, and had read the entire works from one to twelve times. They were tested individually, and all sessions were tape-recorded.

Each key sentence was read aloud by the experimenter, and repeated as often as desired. The subject was asked to name the story from which it came; if he was wrong, he was allowed to make a second guess. He was scored ‘correct’ even without recalling the title itself, if he could convince the experimenter that he knew the story in question (perhaps by giving a brief outline of the plot). If he was unable to do this, he was scored ‘failed’. When a failure occurred, it became important to determine whether the subject had any recollection of the story whatsoever. The experimenter began to describe the main plot, carefully avoiding any reference to the incident involving the key sentence. He continued until either the subject began to pick up and continue the story line himself, or both of them became convinced that he didn’t know it. In the latter case, data from the sentence were not included in further analyses.

The subject was then asked to report the general context in which the given sentence occurred. He was prompted with such questions as “Who said it?” “Where were they?” and so on. No positive information was contained in the prompts. The experimenter had a checklist of the major points in the context of each sentence, enabling him to prompt and rate subjects consistently. He scored their responses on a four-point scale (3, excellent; 2, good; 1, some partial recall; 0, nothing), rechecking his rating from the tape later. After his recall of context had been determined, the subject was asked to report the next sentence from the story if he could, and was given a pair of alternative next sentences from which to select. (As noted above, these data will not be presented.) The experimenter then continued with the next randomly-chosen sentence, and so on for about an hour. The subject was also asked about his familiarity with the stories and his methods of recall.
Results
The major results of the study appear in Table 1. It is clear that different types of sentences produced very different results. In 54 presentations of DESCRIPTION sentences (ten different sentences, presented from two to nine times), the subjects identified the story on only seven occasions, even counting second guesses. In 64 presentations of RELEVANT CONCRETE sentences, however, the story was identified 46 times! Other sentence-types fell between these extremes. Similarly, good to excellent description of the context was achieved in only about a third of the NAME and DESCRIPTION sentences, but in over two-thirds of the RELEVANT CONCRETE.

Table 1. Numbers of recalls and failures for various types of sentences. Each sentence was presented to at least two of the ten subjects; most were presented to six or more.

Type of sentence            NAME       DESCRIPTION  ISOLATED ABSTRACT  ISOLATED CONCRETE  RELEVANT CONCRETE
Number of sentences         10         10           11                 10                 9
Total presentations         57         54           59                 63                 64
Title recall
  correct title or story    11 (19%)   6 (11%)      13 (22%)           17 (27%)           44 (69%)
  correct second guess      2 (4%)     1 (2%)       0 (0%)             2 (3%)             2 (3%)
  failed                    44 (77%)   47 (87%)     46 (78%)           44 (70%)           18 (28%)
Context recall
  excellent or good         20 (35%)   17 (31%)     17 (29%)           35 (56%)           46 (72%)
  some partial recall       6 (11%)    10 (19%)     20 (34%)           6 (10%)            11 (17%)
  failed                    31 (54%)   27 (50%)     22 (37%)           22 (35%)           7 (11%)

The data suggest that relevance to the story line was the most important variable. RELEVANT CONCRETE sentences produced strikingly more recall of titles, and substantially more recall of context, than ISOLATED CONCRETE ones. The latter, in turn, were somewhat better than ISOLATED ABSTRACT sentences, but this may be because they do at least fit into a tiny story of their own (often a Holmesian deduction) rather than because they are concrete. DESCRIPTIONS, while extremely concrete in (say) Paivio’s (1971) sense, were quite ineffective as cues.

The sentences within a category differed in their effectiveness. This variation was most striking in the ISOLATED categories, where some sentences led to correct recall by nearly every subject while others never produced a correct answer at all. This held true for recall of context as well as of title, and sentences easy on one task were also easy on the other. Some of the intersentence variation was probably due to the presence of a few particularly famous, often-quoted passages (“He never spoke of the softer passions, save with a gibe and a sneer”, Scandal in Bohemia).

Subjects who could identify the story almost always remembered at least a little bit of context as well. In 98 cases of correct title or story recall (counting second choices), only a single instance of failed context recall occurred. This observation confirms the existence of a link between context and title mentioned in many of the introspective reports. Given a sentence, most subjects would try to remember an incident in which it might have occurred, and work from there to the whole story and its title.

As one might expect, the subjects most familiar with the stories remembered most. The Spearman rank correlation between familiarity (the number of times each subject said he had read the stories) and the number of titles recalled was 0.72; that between familiarity and context recall was 0.66.
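The arithmetic behind these results can be rechecked with a short sketch. The title-recall counts below are transcribed from Table 1; the per-subject values at the end are purely hypothetical, since the paper reports only the resulting correlations and not the individual scores. The function names are our own, and the Spearman coefficient is computed in the usual way, as a Pearson correlation on average ranks.

```python
# Counts transcribed from Table 1, per sentence type:
# (correct title or story, correct second guess, failed)
TITLE_RECALL = {
    "NAME": (11, 2, 44),
    "DESCRIPTION": (6, 1, 47),
    "ISOLATED ABSTRACT": (13, 0, 46),
    "ISOLATED CONCRETE": (17, 2, 44),
    "RELEVANT CONCRETE": (44, 2, 18),
}


def percentages(counts):
    """Return each count as a whole-number percentage of the row total."""
    total = sum(counts)
    return [round(100 * c / total) for c in counts]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


for kind, counts in TITLE_RECALL.items():
    print(kind, sum(counts), percentages(counts))

# Stories identified on a first or second guess, summed over sentence types.
identified = sum(first + second for first, second, _ in TITLE_RECALL.values())
print("stories identified:", identified)

# HYPOTHETICAL per-subject data (not from the paper): number of readings of
# the stories and number of titles recalled, for ten imaginary subjects.
readings = [1, 2, 2, 3, 4, 5, 6, 8, 10, 12]
titles_recalled = [3, 5, 4, 6, 9, 7, 10, 12, 11, 15]
print("rho:", round(spearman(readings, titles_recalled), 2))
```

The row totals printed by the loop are the presentation counts in Table 1, and the summed identifications correspond to the 98 cases of correct title or story recall mentioned above.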
Conclusions
It appears that the links between individual sentences and the stories in which they occur are strongest for sentences directly relevant to the main theme. Coherence with the theme is critical. Descriptions of characters are much less effective, despite their concreteness. Further research will be necessary to determine whether these conclusions apply generally, or are a peculiarity of Holmes stories and Sherlockians. We will not speculate further at the present time. “It is a capital mistake to theorize before you have all the evidence. It biases the judgment” (Study in Scarlet).

References
Bartlett, F. C. (1932) Remembering. Cambridge, England: University Press.
Doyle, A. C. (1930) The Complete Sherlock Holmes. Garden City, New York: Doubleday (two volumes).
Paivio, A. (1971) Imagery and Verbal Processes. New York: Holt, Rinehart and Winston.
Résumé
Members of a Sherlock Holmes society were presented with sentences drawn from the Sherlock Holmes stories. They were asked to identify the story and the immediate context from which these sentences were taken. Concrete sentences pertinent to the very subject of the story proved to be the most effective cue for this task; descriptions and proper names, by contrast, had no effect.
Alternative conceptions of semantic theory*
ARNOLD L. GLASS KEITH J. HOLYOAK
Stanford University
Abstract
It is argued that theories of semantic memory have diverged in a manner that parallels current linguistic controversy concerning the representation of meaning. The feature-comparison model (Smith, Shoben & Rips, 1974) applies the linguistic theory of Lakoff (1972) to predict people’s reaction times to verify sentences, while the marker-search model, described here, uses the type of semantic representation outlined by Katz (1972) to explain a similar range of data. The two models are described and the evidence for each is reviewed. Available evidence supports the marker-search model, but disconfirms a major prediction of the feature-comparison model. It is argued that the feature-comparison model is in principle inadequate as a model of semantic representation, unless its conception of semantic components is substantially altered.

*The ordering of authors is haphazard. This paper has benefited from the extensive suggestions of Gordon H. Bower, Daniel Osherson and an anonymous reviewer. We are especially grateful to our good friends Edward E. Smith, Edward J. Shoben, and Lance J. Rips for the free exchange of data and ideas upon which this paper depended. This paper was completed while A. Glass held an N.S.F. graduate fellowship and K. Holyoak held a Stanford University fellowship, and it was supported by Grant MH13950-06 from the National Institute of Mental Health to Gordon H. Bower.
Cognition, 3(4), pp. 313-339

Philosophers and linguists have long discussed how the meaning of a word is represented in memory. In psychology, semantic memory research has approached this question by investigating possible mechanisms by which people use their knowledge about words to determine whether sentences are true or false. The dependent variable of major interest has been reaction time (RT) to verify simple propositions, such as A dog is an animal. In a recent review paper, Smith, Shoben, and Rips (1974) have proposed a model, called the feature-comparison model, to account for the bulk of the RT differences reported in the semantic memory literature. This model emerges from what they term a ‘set-theoretic’ tradition in semantic memory research (Meyer, 1970; Schaeffer & Wallace, 1970). It uses a semantic theory outlined by Lakoff (1972) as the basis for a psychological process model.

Set-theoretic models of semantic memory have been contrasted with network models, which represent the meaning of a word as a mapping between the word and a relational network (Collins & Quillian, 1969). However, the distinction between set-theoretic and network models has never been clearly drawn. Smith et al. describe set-theoretic models as those models in which concepts are represented by sets of elements. But network models can also define concepts componentially, so that at this general level the notations of set theory and of graph structures are largely interchangeable. In order to contrast the two types of models more clearly, we will describe two assumptions that can be used to distinguish set-theoretic from network models. First, set-theoretic models restrict themselves to a very simple formal representation. Each element in the set representing a concept is treated as an atomic unit. Such formal devices as redundancy rules (Katz, 1972) which would permit one element to dominate (and therefore entail) another are excluded from the representation. Semantic relations are defined in terms of operations such as set inclusion; e.g., it might be assumed that a person can verify that a dog is an animal by determining that the set of features defining dog contains the set of features defining animal. In contrast, network models can include the graph-theoretic equivalents of redundancy rules in order to mark entailments, while the equivalents of antonymous n-tuples (Katz, 1972) can be used to mark contradictions between semantic components.

A second assumption, one that is central to the feature-comparison model, is that the relation of category membership is in some sense a matter of degree. This assumption is identical to Lakoff’s (1972) hypothesis that absolute notions of truth and falsity should be replaced by a continuous truth dimension. This view would represent a sentence such as A bat is a bird as having some ‘intermediate’ truth value.
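The first contrast drawn above, set inclusion over atomic features versus domination links and antonymous n-tuples, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the feature names, the markers, the single antonymous pair, and the function names are hypothetical, and the code is not the notation of Katz (1972) or of Smith et al. (1974).

```python
# HYPOTHETICAL set-theoretic representation: each concept is a flat set of
# atomic features, with no feature entailing any other.
FEATURES = {
    "animal": {"living", "animate"},
    "dog": {"living", "animate", "four-legged", "barks"},
    "bird": {"living", "animate", "feathered", "winged"},
}


def set_verify(subject, predicate):
    """'An S is a P' holds when every feature of P is among the features of S."""
    return FEATURES[predicate] <= FEATURES[subject]


# HYPOTHETICAL network representation: each marker is dominated by (and so
# entails) at most one other marker, playing the role of a redundancy rule.
DOMINATED_BY = {"dog": "mammal", "mammal": "animal", "bird": "animal"}

# Each set plays the role of an antonymous n-tuple: its members contradict.
ANTONYMOUS = [{"mammal", "bird"}]


def entailed(marker):
    """Return the marker followed by every marker that dominates it."""
    chain = [marker]
    while chain[-1] in DOMINATED_BY:
        chain.append(DOMINATED_BY[chain[-1]])
    return chain


def network_verify(subject, predicate):
    """Return 'true', 'false', or None when this toy network stores neither
    an entailment nor a contradiction between the two concepts."""
    subject_chain = entailed(subject)
    if predicate in subject_chain:
        return "true"   # the predicate dominates the subject
    predicate_chain = entailed(predicate)
    for group in ANTONYMOUS:
        s_hits = group.intersection(subject_chain)
        p_hits = group.intersection(predicate_chain)
        if s_hits and p_hits and s_hits != p_hits:
            return "false"  # the two concepts entail contradictory markers
    return None


print(set_verify("dog", "animal"), set_verify("dog", "bird"))
print(network_verify("dog", "animal"), network_verify("dog", "bird"))
```

In the set version, rejecting A dog is a bird is simply a failed inclusion test; in the network version the same sentence is rejected because an explicit contradiction is found, which is an all-or-none outcome.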
In contrast, semantic relations in a network model are basically all-or-none: A component either dominates another, or it does not; and a component either contradicts another, or it does not. This type of representation therefore naturally leads to absolute rather than continuous notions of truth and falsity, as is advocated by Katz (1972). Under this view, a person might be uncertain as to the truth value of A bat is a bird, due to his ignorance or the sentence’s ambiguity; but nevertheless, an absolute dichotomy remains between truth and falsity. Clearly, these parallel debates in psychology and linguistics are related to the extent that the goals of the two fields converge. Accordingly, the issues which distinguish these two classes of models have implications for linguistics as well as for psychology.

The intent of the present paper is to analyze the empirical and theoretical justification for these two types of models. The first section of the paper provides a critical review of the major evidence for the feature-comparison model of Smith et al. We are focusing our attack on the feature-comparison model for several reasons. First, it formulates the set-theoretic assumptions that we wish to call into question more clearly than any previous proposal. Second, the model has been given a precise formulation that allows the possibility of disconfirmation. Third, as Smith et al. point out, the feature-comparison model has been more successful in accounting for available verification data than any
other set-theoretic model yet proposed. Accordingly, if we can show the Smith et al. model to be inadequate we will be rejecting not simply an arbitrary version of a set-theoretic model, but the most successful one yet devised.

The feature-comparison model invokes two separate mechanisms for verifying a sentence. In Section 2, by contrast, we present a network model that adapts the theory of Katz (1972) to psychological prediction, and which assumes a single basic mechanism for verifying sentences. In this alternative model the single underlying variable that determines both true and false RT is the time required for the person to access information that logically confirms or contradicts the truth of the presented sentences. Section 3 presents data that provide support for our proposal, while disconfirming a critical prediction of the feature-comparison model. The final section of the paper then examines broader theoretical issues concerning the nature of semantic representation that are raised by a comparison of the two models. We shall argue that a set-theoretic representation is in principle inadequate as a model of semantic memory.
1. The feature-comparison model: Review and critique
The feature-comparison model assumes that the meaning of a word is represented by a set of features, and that “some features will be more defining or essential aspects of a word’s meaning, while others will be more accidental or characteristic features” (Smith et al., 1974, p. 4). Each feature is thus stored along with a weight indicating its degree of ‘definingness’ for the concept in question.

The feature-comparison model posits two distinct serial stages that are used to verify sentences of the form An S is a P. In the first stage, the overall relatedness of the subject and predicate words is assessed in terms of all features (regardless of their definingness weights) of the two categories. If the overall relatedness of the subject and predicate words exceeds an upper criterion, a quick ‘true’ response is made. If their relatedness falls below a lower criterion, a quick ‘false’ response is made. Only if the overall relatedness falls between the upper and lower bounds is the second stage executed, resulting in a longer RT. This second stage separates the more defining features from the characteristic ones on the basis of feature weights, and compares only the more defining features of the subject and predicate. A ‘true’ decision is made in case all the defining features of the predicate are contained in the subject; otherwise, the decision is to respond ‘false’. The feature-comparison model does not specify any relationship between overall semantic relatedness and the duration of second-stage processing.

This model predicts that for true sentences, as the relatedness between subject and predicate increases, the percentage of quick stage-one ‘true’ responses will increase, resulting in faster mean RT for more related true sentences. But for false sentences, high relatedness will decrease the percentage of quick stage-one ‘false’ responses, resulting in slower mean false RT as relatedness increases.
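The two-stage decision procedure just described can be made concrete with a small deterministic sketch. It rests on several assumptions that are not part of the model as published: the concepts, features, and weights are invented; relatedness is computed as the proportion of shared features (the model itself does not specify a composition rule); and the criteria, the weight cutoff for ‘defining’ features, and the stage durations are arbitrary illustrative values.

```python
# HYPOTHETICAL concepts: feature -> definingness weight between 0 and 1.
CONCEPTS = {
    "bird": {"feathered": 1.0, "winged": 1.0, "flies": 0.6, "small": 0.3},
    "robin": {"feathered": 1.0, "winged": 1.0, "flies": 0.6, "small": 0.3,
              "red-breasted": 0.8},
    "penguin": {"feathered": 1.0, "winged": 1.0, "swims": 0.7, "large": 0.4,
                "flightless": 0.9},
    "bat": {"winged": 1.0, "furry": 1.0, "flies": 0.7, "small": 0.4},
    "typhoon": {"windy": 1.0, "wet": 0.8},
}

UPPER, LOWER = 0.7, 0.15               # assumed stage-one criteria
DEFINING = 0.9                         # assumed cutoff for defining features
STAGE_ONE_MS, STAGE_TWO_MS = 500, 200  # arbitrary stage durations


def relatedness(subject, predicate):
    """Assumed composition rule: shared features / all features, with the
    definingness weights ignored, as in stage one."""
    s, p = set(CONCEPTS[subject]), set(CONCEPTS[predicate])
    return len(s & p) / len(s | p)


def defining(concept):
    """Features whose weight reaches the assumed definingness cutoff."""
    return {f for f, w in CONCEPTS[concept].items() if w >= DEFINING}


def verify(subject, predicate):
    """Return (response, last stage executed, RT in msec) for 'An S is a P'."""
    x = relatedness(subject, predicate)
    if x > UPPER:
        return "true", 1, STAGE_ONE_MS    # quick stage-one 'true'
    if x < LOWER:
        return "false", 1, STAGE_ONE_MS   # quick stage-one 'false'
    # Stage two: compare only the defining features of the two concepts.
    answer = "true" if defining(predicate) <= defining(subject) else "false"
    return answer, 2, STAGE_ONE_MS + STAGE_TWO_MS


for s in ("robin", "penguin", "bat", "typhoon"):
    print(f"A {s} is a bird:", verify(s, "bird"))
```

With these invented values, the highly related true sentence (robin) and the unrelated false sentence (typhoon) are both settled in stage one, while the weakly related true sentence (penguin) and the highly related false sentence (bat) both require stage two. This is the pattern of fast and slow responses that the model predicts.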
We will consider whether the feature-comparison model’s assumption of two serial processing stages is justified. This may be done in light of a test derived from one proposed for stage models by Sternberg (1969); viz., are there two conceptually distinct variables, one derived from stage one and the other derived from stage two, that both affect RT as predicted by the feature-comparison model? Smith et al. identify two variables that might satisfy this criterion for the feature-comparison model: (1) Semantic relatedness, which should affect the outcome of stage one only; and (2) category ‘size’ (i.e., the number of features that define a particular category), which should affect the duration of stage two only. Let us examine the evidence concerning these variables.

Semantic relatedness
For true sentences the feature-comparison model predicts that high semantic relatedness will result in a greater probability of a correct stage-one response, and hence, lead to relatively fast mean RT. Smith et al. review several studies showing that high relatedness indeed speeds up correct classification of an instance as a member of the test category. However, such evidence is open to an alternative interpretation. The problem is that rated relatedness has proved in every case to be positively correlated with the frequency with which the instance is produced as an association to the category name, as measured by association norms such as those of Battig and Montague (1969) (see Rips, Shoben, & Smith, 1973; Rosch, 1973; Smith, 1967; Smith et al., 1974; Wilkins, 1971). For instance, the correlation between relatedness and production frequency (one standard measure of association strength) was 0.85 in the Rips et al. study. While this correlation is consistent with Smith et al.’s claim that production frequency reflects semantic relatedness, other evidence demonstrates that production frequency has an independent effect on RT. This evidence is provided by Experiment I of Smith et al., where the correlation between semantic relatedness and production frequency was only 0.49.
In that study, production frequency was clearly a better predictor of RT than were relatedness judgments. Smith et al. can argue, of course, that production frequency simply measures the underlying conceptual variable of relatedness more accurately than do ratings. However, other empirical results are difficult to reconcile with the notion that production frequency can be identified with relatedness. Loftus (1973) obtained measures of the production frequency (PF) of the category given the instance as a stimulus, as well as of the frequency of the instance given the category as a stimulus. She then varied whether the category or the instance was presented to the subject first in a verification task requiring determination of whether the instance was a member of the category. When the instance preceded the category (e.g., robin-bird), the instance-to-category PF determined RT; but when the category preceded the instance (e.g., bird-robin), the category-to-instance PF determined RT. In terms of the feature-comparison model, this result implies that for the same two words, relatedness differs depending on the presentation order. However, the model does not specify the exact composition rule that is to be used to compute overall similarity, and it is unclear whether or how such a rule can be made sensitive to word order. In other words, if production frequency is to be taken as a
measure of relatedness, then the notion of relatedness will have to be considerably complicated.

Other conceptual problems arise in trying to identify production frequency with relatedness. When a person is asked to rate the relatedness of a pair of words, he is being asked to do what Smith et al. assume is done during stage one of sentence verification, i.e., to compare the subject and predicate words and assess their degree of relatedness. Production tasks, on the other hand, involve the retrieval of one concept given another as a cue. While Smith et al. may assume that these tasks measure relatedness, there is no a priori reason to believe this to be true. In Section 3 we will argue that production frequency reflects a different conceptual variable, namely, the order in which information about word meanings is retrieved. Note that the results of Loftus, described above, have a straightforward interpretation in terms of retrieval. Suppose that verification in the Loftus paradigm requires that the person find a path between the two presented concepts (the category and the instance), beginning at whichever concept is presented first. Then the obtained effects of presentation order simply indicate that the frequency with which the second word is generated as a response to the first in a production task measures how quickly a path can be found from the first concept to the second during verification. Other evidence that production frequency is best conceptualized as a measure of the order of information retrieval is reviewed in Section 3.

Other experimental evidence, reviewed by Smith et al., generally supports their prediction that high relatedness increases false RT: Several studies have found that the RT to reject meaningful (high-related) false sentences (e.g., All grains are wheats) is longer than the RT to reject relatively anomalous sentences (e.g., All typhoons are wheats) (Kintsch, 1972; Meyer, 1970; Rips et al., 1973; Wilkins, 1971).
However, the issue is not yet closed. A study by Glass, Holyoak and O’Dell (1974) suggests that false RT is not monotonically related to overall relatedness of the subject and predicate terms. Contrary to the Smith et al. prediction, false sentences in which the subject and predicate were very closely related (e.g., Many arrows are dull) were rejected more quickly than relatively meaningful sentences in which the subject and predicate were less related (e.g., Many arrows are wide). However, minimally-related anomalous sentences (e.g., Many arrows are intelligent) were rejected most rapidly of all. This latter result is not inconsistent with the earlier findings, since previous studies compared a mixture of relatively meaningful sentences, differing in relatedness, to anomalous (very low-related) sentences. Further false RT data that are incompatible with the feature-comparison model are reviewed in Section 3 below.

Category size
Smith et al. also specified a variable which supposedly affected only stage two in their model, namely, category size: Since larger, more abstract categories logically have fewer defining features, fewer comparisons should be required in stage two in order to match
defining features of a large category with the features of the test instance. Accordingly, holding constant the probability that stage two processing occurs (by controlling semantic relatedness), an increase in category size should decrease decision time. This prediction has been tested by studies that have varied category size while attempting to hold relatedness constant (Landauer & Meyer, 1972; Wilkins, 1971). But contrary to the prediction of the feature-comparison model, both studies found that statements involving larger categories took longer to verify than statements involving smaller categories (the difference was 32 msec for false sentences and 17 msec for true sentences, respectively). Smith et al. criticize these studies, arguing that neither study reported tests of the obtained differences against item variability (Clark, 1973). It has not been demonstrated, therefore, that this category-size effect is reliable. However, the fact that these trends are opposite to the prediction of Smith et al. remains problematic.

Finally, Experiment I in Smith et al. tests their category size prediction while escaping methodological problems inherent in the earlier studies. That experiment varied category size and semantic relatedness (as measured by production frequency) independently. The true RT results showed that production frequency was a highly significant variable; but when the variance in RTs attributable to production frequency was eliminated (by an analysis of covariance) the residual effect of category size did not approach statistical significance. Apparently, the conclusion best supported by available results is that, contrary to the prediction of the feature-comparison model, category size in itself has no effect on true RT when production frequency is controlled.
Since category size is the sole variable so far proposed to affect stage-two processing, one must conclude that the two-stage model does not meet the evidential standards for multi-stage models proposed by Sternberg (1969).

Smith et al. also test the feature-comparison model by fitting a mathematical model of its major assumptions to the data of their second experiment. The mathematical model provided estimates of such parameters as the duration of stage two in relation to category size, and the subject’s criterion (based on relatedness) for making a response without stage two processing. They used two different estimation procedures, one based on relatedness ratings (with sixteen parameters) and one based on error rates (with ten parameters). When RT was predicted from the model on the basis of semantic relatedness ratings the obtained fit was extremely poor, with a correlation between predicted and obtained RT of only r(14) = 0.69. Furthermore, the fit provided by the more successful procedure seems to rest on the use of a general correlation between higher error rates and slower RTs. We tested this possibility by predicting RTs for Smith et al.’s data directly from the observed error rates. For this purpose we grouped both the 96 true items and the 96 false items into ten levels of error rates, so as to have at least five items at each level, and used linear regression to predict the mean RTs. For true items the correlation between predicted and observed RT was r(8) = 0.972, p < 0.01, and the root mean square deviation equalled 11.3 msec; while for false items the correlation was r(8) = 0.929, p < 0.01, with a mean deviation of 16.4 msec. Caution is necessary in
Alternative conceptions
of semantic memory
comparing these results with those of Smith et al., since we predicted two sets of ten mean RTs, while Smith et al., predicted a single set of 36 mean RTs. Nevertheless, such a comparison is suggestive. Smith et al., estimated ten parameters in order to predict RT, and obtained a correlation of r(24) = 0.945 between predicted and observed RT, with a mean deviation of 28.9 msec. Our predictions, each set of which is based on just two estimated parameters, are no less accurate than those obtained by Smith et al., using their more elaborate model and parameter-estimation procedure. Thus while it is true that the parameter estimates obtained by Smith et al., are consistent with the feature-comparison model*, their RT and error rate data are also consistent with the large class of models that predict a positive correlation between error rates and RT. While this correlation is indeed predicted by the feature-comparison model (see Smith et al., 1974) it is in fact a general empirical result commonly obtained not only in semantic-memory studies but in other RT studies as well (e.g., Clark & Chase, 1972; Meyer & Schvaneveldt, 1971; Posner, 1970). RT and error rate are generally taken to be convergent measures of item difficulty. Consequently, the burden of proof remains with Smith et al., to demonstrate that this relationship reflects processes specific to semantic decision-making, rather than more general response strategies typically used by subjects in RT experiments.**

*For instance, Smith et al., found that the parameter estimate for stage-two duration was longer for small categories (280 msec) than for large ones (161 msec), as the feature-comparison model predicts. However, it is possible that this result was artifactual, since category size was confounded with category discriminability in their experiment. For the small categories of fruit and vegetable, subjects rated the vegetable instances used as more closely related to fruit than to vegetable (!), while they rated the fruit instances used as nearly as close to vegetable as to fruit. Clearly, Ss had problems discriminating between what the Es called ‘true’ and ‘false’ instances for these categories; consequently, Ss’ RTs were slower for these ‘small’ categories than for statements about instances of the large categories, animal and plant. However, Smith et al., did not introduce a parameter to account for category discriminability. This difference between the categories therefore had to be reflected by some other parameter. The most likely candidate for this role is the estimate of stage-two duration, since different parameters were estimated for the stage-two duration of large and small categories. Accordingly, the increase in RT for small categories resulting from the difficult discrimination between ‘true’ and ‘false’ instances may have been reflected in the parameter estimates by a longer estimate of stage-two duration for small categories. (We thank L. Glass for this suggestion.)

**Smith et al., also found that error RTs are faster than correct RTs, as their model predicts. However, it is not clear whether this is a general effect, as we have found no consistent relationship between error and correct RTs in data of our own. Even if the effect is general, it may be accounted for by a plausible strategy of how response speed might be traded off against accuracy. Subjects may tend to gradually speed up their responses over trials until they make an error. The occurrence of an error may then make the subject momentarily more cautious, and hence slower and more accurate (Rabbitt, 1966). Then he may begin a gradual decrease in RT until the next error. Such a cyclic pattern of response times would result in faster RTs for errors than for correct responses. Furthermore, since difficult sentences require more processing time for a correct decision, and thus are more likely to produce an error if the subject tries to respond quickly, this strategy would also produce a positive correlation between error rate and correct RT across conditions.

It should also be noted that the data to which Smith et al., fit their model are drawn from an experiment of a rather problematic design. The subjects’ task was to decide whether an instance was a member of a target category (e.g., bird), but all distractor
Arnold L. Glass and Keith J. Holyoak
instances were drawn from a single non-target category (e.g., insect). For this example, the subject could therefore logically decide to respond ‘true’ not only by verifying that a given instance (e.g., canary) was a bird, but also by verifying that it was not an insect. Conversely, he could respond ‘false’ either after he verified that the instance (e.g., termite) was not a bird, or after he verified that it was an insect. The obtained pattern of RTs suggests that subjects in fact used all these possible decision strategies. True RT was found to depend not only on the relatedness of the instance to the target category (e.g., canary to bird), as the feature-comparison model would predict, but also on the relatedness of the instance to the non-target category (e.g., canary to insect). Both of these variables influenced false RT as well. Smith et al., offer only a post hoc explanation of these unexpected effects, suggesting that some subjects used different strategies than others, or that subjects varied their strategies from trial to trial (see their Footnote 9). It is clear, however, that these results were not predicted by the feature-comparison model, nor reflected in the parameters of the mathematical model. Nor is it clear that the pattern of results to which Smith et al., fit their model would generalize to a situation in which distractor items were drawn from a variety of categories.

In summary, the feature-comparison model predicts that the time used to make semantic decisions will be determined by two variables: semantic relatedness and category size. However, these predictions are not unambiguously supported by available data. Production frequency appears to predict true RT more accurately than rated relatedness, while category size does not predict RT at all. In addition to these empirical difficulties, there are a number of further conceptual problems with the feature-comparison model.
A major problem is that the model does not appear to be specified in terms of explicit mechanisms. For example, what mechanisms might plausibly allow a holistic comparison of the type that is postulated to occur in stage one? How are subject and predicate features matched so that a hypothetical overall relatedness estimate can be calculated swiftly? This process must presumably take place without identifying feature dimensions, or else it would seem that a more reasonable and parsimonious strategy would be to compare defining features of the subject and predicate immediately, as is done in stage two of the hypothesized process. Until these questions have been answered, it is impossible to tell how the first stage of the model could ever be executed.
2. An alternative approach: Ordered marker-search
Semantic representation
We will now describe a marker structure (Katz, 1972) in sufficient detail to account for the verification of sentences quantified by all or some (e.g., All canaries are birds, Some birds are canaries). These types of sentences are logically equivalent to the categorization tasks to which the feature-comparison model has been explicitly applied (e.g., deciding that canary is an instance of the category bird, or that the category bird contains the instance canary). No attempt will be made to describe how relationships other than category membership could be represented, such as possession (has) or ability (can). Thus the only verb which can be represented in the structure to be described is the present tense of the copula (is, are). Discussion of the extendability of the two models will be postponed until the final section of this paper.

Figure 1. Hypothetical marker structure, illustrating word-to-marker and marker-to-marker associations.

A portion of such a marker structure is diagrammed in Figure 1. In this figure labels representing words are in capital letters, while labels of markers are in small letters. Thus BIRD represents a word, while (avian) stands for a marker. Where possible, markers are labeled with adjectives to emphasize that they are best thought of as properties (e.g., (animate)), rather than as categories or exemplars. For many markers no appropriate English adjective with which to label them exists; these are labeled with nouns. Thus the marker labeled (canary) stands for an abstract concept roughly equivalent to “possessing the essential properties of a canary”. As Figure 1 illustrates, we assume that most common words are directly associated with a single marker in the attribute structure. In each case we will refer to this marker as the ‘defining’ marker for the particular word. For example, (avian) is the defining marker for bird. In the case of words with multiple meanings (e.g., bank), each sense of
the word will access a different defining marker. Note that no single word is associated with the marker labeled (pet canary). This illustrates our assumption that only a subset of the markers in memory are directly associated with words. But we assume that all markers can potentially be accessed during the search procedure used to verify sentences.

The marker-search theory has two basic structural assumptions. First, we assume that markers are interrelated in such a way that any marker stands for, or dominates, a set of further markers associated with it. In Figure 1 these associations are represented by arrows. For example, the arrows pointing from (avian) to (animate) and (feathered) indicate that (avian) stands for the set {(animate), (feathered)}. The property of containment represented by the arrows is transitive. Thus in Figure 1, since the marker (robin) implies - that is, dominates - the marker (avian), it must also dominate (animate) and (feathered). In other words, (robin) stands for the set of markers {(avian), (animate), (feathered)}. Also, by definition any marker dominates itself. The arrows in Figure 1 are labeled in order to illustrate the second basic structural assumption of the model - that information about contradictions is represented in memory through the associations between markers. In the network structure a contradiction arises whenever two arrows labeled with the same Greek letter meet at a common node. In the figure arrows which connect animal species to (animate) are labeled with an α, while the arrow connecting (pet) to (animate) is labeled with a β. Thus a contradiction arises at (animate) between (avian) (α) and (canine) (α), but not between (avian) and (pet) (β). This notational system simply represents our intuitive knowledge that a bird cannot be a dog, but might possibly be a pet. Note that no significance is attached to the labels used (α, β, etc.).
The essential point is simply that intersections are of two types - contradictory and non-contradictory. It should also be noted that in Figure 1 the marker (robin) dominates (avian) while (collie) dominates (canine). It therefore follows from the transitive properties of the arrows that a contradiction of the sentence A canary is a collie also arises at (animate). This treatment of redundancy rules and antonymy is based on the proposals of Katz (1972). The present notation is a variant of that used by Katz, who introduces superscripted markers to form antonymous n-tuples. The notation we are using is similar to that suggested by Bierwisch (1969), and has the advantage of capturing the close relationship between redundancy rules and contradictions. The different logical relationships between concepts that are denoted by the quantifiers all and some (All S are P and Some S are P) can be specified in terms of the relationships between the defining markers of the subject and predicate (see Table 1). An All-statement is true if the defining subject marker dominates the defining predicate marker. For example, since in Figure 1 an arrow points from (avian) to (animate) it follows that All birds are animals is true. Similarly, since (avian) also implies (feathered), All birds are feathered is true as well. The truth conditions for some are slightly different. A Some-statement is true if there exists a marker from which arrows lead to both the defining subject marker and the defining predicate marker. The sentence Some animals are feathered is therefore true, since (avian) satisfies the above criterion.
Table 1. Decision criteria for true and false sentences quantified by All or Some

Quantifier: All
  True sentences
    Criterion: Defining subject marker dominates defining predicate marker.
    Example: All birds are animals.
  False sentences
    Criterion: Defining subject marker contradicts defining predicate marker.
    Example: Contradictory. All birds are dogs.
    Criterion: A marker which dominates defining subject marker contradicts defining predicate marker.
    Example: Counterexample. All birds are canaries.

Quantifier: Some
  True sentences
    Criterion: A marker dominates both defining subject marker and defining predicate marker.
    Examples: Some birds are canaries. Some birds are pets.
  False sentences
    Criterion: Defining subject marker contradicts defining predicate marker.
    Example: Contradictory. Some birds are dogs.
As discussed above, falsity based on contradiction can also be explicitly defined in terms of the relations among arrows. Both All- and Some-statements are false if a contradiction exists between the defining subject marker and the defining predicate marker; i.e., if similarly-labeled arrows from the two defining markers meet at a common node. For example, we have already seen that (avian) contradicts (canine); consequently, All/Some birds are dogs is false. In addition, an All-statement can be falsified by a slightly different type of contradiction. An All-statement is also false if there exists a marker which dominates the defining subject marker, but contradicts the defining predicate marker. All birds are canaries is therefore false because (robin) satisfies the above criterion, and thus serves as a counterexample to the sentence - it stands for a bird that is not a canary. Notice that this type of false All-statement would be true if the quantifier were some (e.g., Some birds are canaries). This kind of false All-statement will be referred to as a “Counterexample” sentence. The distinction between Counterexample sentences and those falsified by a direct contradiction (Contradictory sentences) will play a critical role in predicting the time taken to reject false sentences.

An interesting property of this representation is that it allows the person to be uncertain about the truth of certain sentences. For example, the sparse network of Figure 1 does not contain the information that some dogs are pets, since there is no marker equivalent to (pet dog) which dominates both (dog) and (pet). On the other hand, (dog) and (pet) have a non-contradictory intersection at (animate), indicating that a dog might be a pet. So if this memory were probed with the sentence Some dogs are pets, neither the criteria for a ‘true’ nor for a ‘false’ decision could be satisfied, and the appropriate response would be something like “It’s possible, but I’m not sure”.
A real subject might give a similar answer to a sentence about some really obscure fact such as Some anteaters are pets.
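The decision criteria of Table 1 can be made concrete with a minimal sketch, assuming a small network loosely modeled on the description of Figure 1. The arrow list, the label strings and the function names below are illustrative assumptions, not part of the original proposal; only the criteria themselves (dominance, labeled contradiction, counterexample, and the undecided case) are taken from the text.

```python
from itertools import product

# Hypothetical arrows (source marker, target marker, label), loosely following
# the description of Figure 1. Two arrows with the same label that meet at the
# same target mark a contradiction between their sources.
ARROWS = [
    ("pet canary", "canary", "beta"),
    ("pet canary", "pet", "alpha"),
    ("canary", "avian", "alpha"),
    ("robin", "avian", "alpha"),
    ("collie", "canine", "alpha"),
    ("avian", "animate", "alpha"),
    ("canine", "animate", "alpha"),
    ("pet", "animate", "beta"),
    ("avian", "feathered", "alpha"),
]
MARKERS = {m for src, dst, _ in ARROWS for m in (src, dst)}

# Word-to-marker associations (each word's 'defining' marker).
WORDS = {"bird": "avian", "animal": "animate", "dog": "canine",
         "canary": "canary", "robin": "robin", "pet": "pet"}


def dominated(marker):
    """Set of markers that `marker` stands for (reflexive and transitive)."""
    seen, stack = {marker}, [marker]
    while stack:
        node = stack.pop()
        for src, dst, _ in ARROWS:
            if src == node and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen


def dominates(a, b):
    return b in dominated(a)


def contradicts(a, b):
    """True if same-labeled arrows from markers under a and b meet at a node."""
    down_a, down_b = dominated(a), dominated(b)
    for (s1, t1, l1), (s2, t2, l2) in product(ARROWS, repeat=2):
        if s1 in down_a and s2 in down_b and s1 != s2 and t1 == t2 and l1 == l2:
            return True
    return False


def verify(quantifier, subject, predicate):
    """Apply the Table 1 criteria for 'all' and 'some' sentences."""
    s, p = WORDS[subject], WORDS[predicate]
    if quantifier == "all" and dominates(s, p):
        return "true"
    if quantifier == "some" and any(
            dominates(m, s) and dominates(m, p) for m in MARKERS):
        return "true"
    if contradicts(s, p):
        return "false: contradictory"
    if quantifier == "all" and any(
            dominates(m, s) and contradicts(m, p) for m in MARKERS):
        return "false: counterexample"
    return "undecided"


print(verify("all", "bird", "animal"))   # true
print(verify("some", "bird", "pet"))     # true, via (pet canary)
print(verify("all", "bird", "canary"))   # false: counterexample, via (robin)
print(verify("some", "dog", "pet"))      # undecided
```

In this sketch the undecided outcome for Some dogs are pets falls out of the sparse network in the same way as in the text: no marker dominates both (canine) and (pet), and their arrows into (animate) carry different labels.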
We should point out that our discussion of structural representation has left unresolved a number of serious problems associated with marker theory. We have nothing to say about what elements in the marker system are primitive, except to agree with Katz (1972) that this question cannot be decided a priori, but only after considerable empirical study. Other important issues that have been ignored concern how a marker system could be acquired developmentally. For instance, how are redundancy relations generated between markers, and what determines whether an intersection between two relations will be marked as contradictory? While the evidence we will discuss below is solely from studies of sentence verification by adults, the adequacy of any representational proposal cannot be firmly established without studies of how a system of information can be acquired.

Searching the marker structure
According to Katz (1972), the set of markers that form the dictionary entry for a word is unordered. Accordingly, while the hierarchical structure in Figure 1 serves to represent
the logical relations between markers, it is not necessary to assume that it indicates the order in which the markers are accessed to verify a sentence. If this assumption could be made, it would provide a strong empirical and formal constraint on network representations (Collins & Quillian, 1969). However, there is now ample evidence that people sometimes access relatively abstract markers more quickly than those that are less abstract. For instance, people can decide that scotch is a drink more quickly than they can decide that it is a liquor (Smith et al., 1974). Examples such as these demonstrate that the order in which markers are accessed is not always hierarchical. However, the fact that the marker set is unordered in Katz’s theory does not mean that the markers are accessed in some random fashion. It is possible to specify a performance model based on ordered, though non-hierarchical, search procedures. This possibility can be realized by modifying the structural representation in Figure 1 to include additional redundancy-rule pathways. For example, we will allow the possibility of a direct link between (canary) and (animate), as well as the illustrated pathway from (canary) to (avian) to (animate). To establish an ordering between any two alternative pathways, we can assume either that one of the pathways has a higher probability of being searched first, or that both pathways are searched in parallel with one requiring less time to traverse. Postulating additional redundancy rules immediately raises a serious question: How is the model to be constrained? Abandoning a strictly hierarchical representation leaves the model without any formal constraint that would prevent each node from being directly connected to every node that it dominates. It would be possible to fit the pattern of RTs from any sentence verification task post hoc, simply by adding to the structural representation whatever additional connections are required.
The price of this freedom, of course, is that the model would be rendered entirely vacuous. But while no formal constraints on the representation prevent this outcome, it may be possible to find empirical constraints that will yield specific, testable predictions. This is the strategy that is followed in the studies to be reviewed below. First an empirical measure of search order, independent of RT, is identified. Since the same representation is presumably used in all tasks that depend on the kind of semantic information we have described, the results from one such task should predict performance on the other. Secondly, verification of different sentences will sometimes depend on the same semantic information. The relative speed with which a particular bit of information can be accessed should therefore determine the RT to verify a number of different sentences. In general, the complexity of the representation that will be required to account for sentence verification is an empirical issue. If it eventually becomes necessary to postulate unlimited redundant connections in the network, the marker-search model will be rejected as unworkable. But if sufficiently strong empirical constraints can be maintained, the model will have explanatory value.

The marker-search model takes the order in which markers are searched to be the underlying variable determining differences in semantic decision time. In the studies discussed below, we assume that the search procedure is subject to the following constraints:

(1) During sentence verification the markers accessed by a word include not only those markers dominated by the defining marker of the word, but also those which dominate the defining marker. For example, consider the word bird in Figure 1, which is defined by the marker (avian). We assume that the marker search-set for the word bird will include both (animate) (which is dominated by (avian)), and (canary) (which dominates (avian)). This assumption is necessary to explain how the various types of true and false sentences summarized in Table 1 can be verified. Decisions about true All-statements and Contradictory false sentences are based on a marker dominated by the defining subject marker; decisions about true Some-statements and Counterexample All-statements, on the other hand, are based on a marker dominating the defining subject marker.

(2) The order in which the markers accessed by a word are searched is independent of the particular quantifier or the truth value of the sentence in which the word appears. Thus the markers of building will be accessed in the same order in sentences such as Some buildings are houses and All buildings are houses.

(3) The search will self-terminate as soon as either the criterion for a ‘true’ or for a ‘false’ response (as discussed above) is satisfied. For example, a ‘true’ response will be made to the sentence All birds are animals as soon as (avian) is found to dominate (animate); while a ‘false’ response will be made to the sentence All birds are dogs as soon as (avian) is found to contradict (canine).

It would presumably be possible to construct a number of different explicit search mechanisms consistent with the above constraints. These might differ in features such as the degree to which search is serial or parallel.
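One of the many mechanisms consistent with constraints (1) to (3) can be sketched as a strictly serial scan, with the number of markers examined standing in for decision time. Everything specific here is an assumption made for illustration: the search order for bird, the hand-coded table saying which marker settles which sentence, and the use of a step count as a proxy for RT.

```python
# Hypothetical search order for the word "bird", first-accessed marker first.
# Per constraint (1) it mixes markers dominated by (avian), such as (animate),
# with markers that dominate it, such as (canary) and (robin).
SEARCH_ORDER = {"bird": ["animate", "canary", "feathered", "robin"]}

# Hand-coded stand-in for the Table 1 criteria: which response a marker
# licenses for a given quantifier and predicate word once it is reached.
EVIDENCE = {
    ("all", "animal", "animate"): "true",    # (avian) dominates (animate)
    ("all", "dog", "animate"): "false",      # contradiction with (canine)
    ("some", "dog", "animate"): "false",
    ("some", "canary", "canary"): "true",    # (canary) dominates (avian)
    ("some", "robin", "robin"): "true",
    ("all", "robin", "canary"): "false",     # (canary) is a counterexample
    ("all", "canary", "robin"): "false",     # (robin) is a counterexample
}


def verify(quantifier, subject, predicate):
    """Scan the subject's markers in a fixed order and stop at the first
    marker that satisfies a 'true' or 'false' criterion (constraint 3).
    The order does not depend on the quantifier (constraint 2)."""
    order = SEARCH_ORDER[subject]
    for step, marker in enumerate(order, start=1):
        response = EVIDENCE.get((quantifier, predicate, marker))
        if response is not None:
            return response, step
    return "undecided", len(order)


print(verify("all", "bird", "animal"))   # ('true', 1)
print(verify("all", "bird", "robin"))    # ('false', 2)
print(verify("all", "bird", "canary"))   # ('false', 4)
```

Under these assumed values, All birds are robins is rejected sooner than All birds are canaries because the counterexample (canary) happens to be reached earlier than (robin); a parallel mechanism with different traversal times could produce the same ordering.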
However, the predictions outlined below will not discriminate between the various possible search models based on the semantic representation and search constraints described.

3. Evidence for the marker-search model
In this section we review the evidence for the search model provided by recent experimental results. The model takes the order in which markers are searched to be the variable that determines differences in semantic decision time. In order to test this model, it was necessary to develop a measure, independent of RT, of the order of marker search. Such a measure was proposed by Glass et al. (1974). They asked subjects to provide true one-word completions for incomplete sentences of the form All/Some S are ____, and tabulated the frequency with which different words were given as predicates. This constrained association technique is similar to the way production frequency norms are collected from subjects who are asked to produce different instances as responses to a category name (Battig & Montague, 1969). Glass et al., assumed that the frequency with which a word appeared as a completion reflected the probability with which its corresponding defining marker was accessed from the defining subject marker. Clearly production frequency can be at best an imperfect measure of search order, since we have assumed that many markers will not correspond to single common English
words. This problem is particularly acute in the case of anomalous false sentences, in which the subject and predicate words generally differ at the level of abstract markers such as (living) versus (non-living) (e.g., All birds are chairs). These sentences are typically rejected relatively quickly (Kintsch, 1972; Wilkins, 1971); but since the abstract markers on the basis of which they could be rejected seldom define common English words, production frequency is not a valid measure of the speed with which such markers are accessed. The problem of predicting RT to reject anomalous sentences is discussed more fully in Holyoak and Glass (1975). In the present paper we will only discuss sentences for which production-frequency measures make clear RT predictions; i.e., true sentences, Counterexample sentences, and Contradictory sentences in which the subject and predicate words differ with respect to relatively specific markers (e.g., All birds are reptiles).
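The tabulation behind production-frequency norms is simple enough to sketch. The snippet below assumes one completion per respondent for a single sentence frame; the responses are invented for illustration and are not data from Glass et al. (1974), and the function name is hypothetical.

```python
from collections import Counter


def production_frequency(completions):
    """Proportion of respondents giving each completion, most frequent first.

    Assumes `completions` holds exactly one response per respondent.
    """
    counts = Counter(word.strip().lower() for word in completions)
    total = len(completions)
    return {word: n / total for word, n in counts.most_common()}


# Invented responses to the frame "All birds are ____".
responses = ["animals", "animals", "feathered", "animals",
             "winged", "Animals", "feathered", "animals"]

norms = production_frequency(responses)
print(norms)  # {'animals': 0.625, 'feathered': 0.25, 'winged': 0.125}
```

On the marker-search account, the ranking in such a table is read as an estimate of the order in which the corresponding defining markers are accessed from the subject marker, with the limitation noted above that markers without a common English word never appear in it.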
The marker-search model predicts that true sentences with high production frequency (PF) will be verified more quickly than sentences with lower PF. Glass et al. tested this prediction for sentences with five different quantifiers (All, Many, Some, Few and No), and both noun and modifier predicates (e.g., All birds are animals, All birds are winged). In each case the corresponding PF norms successfully predicted RT. These results extended the findings of Loftus (1973) and Wilkins (1971), who also found that high PF leads to fast true verification RT. The Glass et al., findings have since been replicated by Glass and Holyoak (1974) and Holyoak and Glass (1975). We have also seen that in studies where semantic relatedness was a successful predictor of true RT, it was confounded with PF (Rips et al., 1972; Rosch, 1973). Furthermore, in the one case in which the effects of the two variables have been compared, PF was a much better predictor than was relatedness (Smith et al., 1974). Accordingly, the available data concerning true RT are consistent with the model.

Generation of false sentence completions
While previous experiments were primarily concerned with using PF (or relatedness) to predict true RT, the results obtained by Glass et al., for false Many-statements (discussed earlier) indicated that it should also be possible to use such norms to predict RT to reject false sentences. These predictions, to be outlined below, were tested by Holyoak and Glass (1975). As a necessary initial step, Holyoak and Glass collected false PF norms for sentences quantified by all or some. They had 32 Stanford undergraduates generate false completions for sentences of the form All S are ____ and Some S are ____. The resulting false PF norms were compared with true PF norms compiled in previous work (Glass & Holyoak, 1974; Glass et al., 1974). Several striking relationships between true and false sentence completions provided evidence for the marker-search model. Referring to Figure 1, let us consider strategies that people might use to generate false completions of a sentence. One plausible strategy would be to first access a marker dominated by the defining
subject marker, and then use it to compute a contradiction as a response. For example, when presented with All/Some birds are ____, the person might follow the arrows from (avian) to (animate), and then from (animate) to (canine). This procedure would generate the false completion dog, producing a Contradictory sentence, as defined in Table 1. If this strategy were actually used by subjects, and search order is independent of the quantifier (as the model assumes), there should be a close relationship between the frequency with which All/Some birds are dogs is produced as a false sentence, and the frequency with which All birds are animals is produced as a true sentence. Specifically, we expected that each high-PF true completion of All S are ____ from the norms of Glass and Holyoak (1974) and Glass et al. (1974) would determine some high-PF Contradictory completion of both All S are ____ and Some S are ____. As the examples given in the top of Table 2 illustrate, this prediction was confirmed. Fourteen of the 16 highest frequency true All-statements (produced by between 35% and 78% of the respondents) corresponded to high-PF Contradictory sentences (produced by from 19% to 56% of respondents).

Table 2. Examples of relationships between high-frequency true and false sentences

Contradictory falses
  True                              False
  All birds are animals.            All/Some birds are dogs.
  All chairs are furniture.         All/Some chairs are tables.
  All women are humans/females.     All/Some women are males.
  All diamonds are stones.          All/Some diamonds are emeralds.

Counterexample falses
  True                              False
  Some flowers are roses.           All flowers are roses.
  Some prisoners are men.           All prisoners are men.
  Some books are novels.            All books are novels.
  Some teachers are professors.     All teachers are professors.

Referring again to Figure 1, let us illustrate a second strategy that subjects might use to generate false All-statements. This strategy depends on our assumption that markers that dominate the defining subject marker are also accessed by the subject word. The subject can therefore go directly from the defining subject marker to a marker that dominates it, and then use this marker to produce a false completion. For instance, given the fragment All birds are ____, the person can simply follow the arrow from the (avian) to the (canary) marker, and respond with the word canary. This procedure
would produce Counterexample sentences, such as those shown in Table 2. Note that exactly the same procedure would produce a true sentence if the quantifier were some (e.g., Some birds are canaries). Since the model assumes that search order is independent of the quantifier, our second prediction was that each high-PF true Some-statement (from the earlier norms of Glass and Holyoak and Glass et al.) would correspond to a high-PF Counterexample All-statement. This prediction was confirmed. The bottom of Table 2 lists four examples of the 22 highest frequency true Some-statements from the earlier norms (given by from 22% to 89% of respondents). Each of these 22 true Some-statements (e.g., Some flowers are roses) corresponded to a high-frequency false Counterexample sentence (e.g., All flowers are roses), given by from 16% to 53% of respondents. A comparison of true and false sentence completions therefore supported the assumption that contradictions can be found in memory. People appear to access the same markers in producing false as well as true sentence completions, except that they contradict the marker in generating false completions.

Rejection of false sentences
The central evidence that discriminates between the marker search and feature-comparison models concerns RT to reject false sentences (Holyoak & Glass, in preparation). The marker-search model predicts that disconfirmation of meaningful false sentences (i.e., Contradictory and Counterexample sentences) requires discovery of a contradiction. Consequently, the sooner the person can access a marker that brings out a contradiction between the subject and predicate, the quicker such sentences will be rejected. For both Contradictory and Counterexample sentences, the order in which contradictions are discovered should be predicted by our production frequency norms; however, the variable that determines false RT should be quite different for these two kinds of false sentences. Contradictory sentences (e.g., All/Some birds are dogs) contain predicates that directly contradict the subject (see Table 1). For such a sentence, its production frequency was taken as an index of the speed of accessing the marker (e.g., (animate)) that produces a contradiction between the subject and predicate. For Contradictory sentences, then, false statements that have high frequency in the norms should be rejected more quickly than false statements given with low frequency in the norms. In agreement with this prediction, Holyoak and Glass found that high-PF Contradictory sentences quantified by all or some were rejected significantly more quickly than low-PF Contradictory sentences (1319 versus 1468 msec). This result extended the previous findings for false Many-statements obtained by Glass et al. (1974). Furthermore, Holyoak and Glass had subjects rate the relatedness of the subject and predicate words for each of their false sentences, and found that their high-PF sentences were rated as significantly more related than their low-PF sentences.
Since the feature-comparison model of Smith et al., (1974) predicts that high-related false sentences will be rejected slower than less related false sentences, these RT results are opposite to the prediction of the feature-comparison model.
In Counterexample sentences (e.g., All birds are robins), the predicate does not directly contradict the subject. The marker-search model therefore predicts that in order to reject this type of sentence, the person must discover some marker (representing an exemplar) that dominates the defining subject marker (e.g., (canary)) and that contradicts the predicate. Accordingly, the RT should be fastest for those sentences for which a disconfirming counterexample was produced most frequently as a true Some-completion. Specifically, the RT to reject a Counterexample sentence such as All birds are robins should be faster the higher the frequency with which the most common counterexample (e.g., ‘canary’) was given as a true completion of the sentence Some birds are ___. Since there is no direct contradiction between the subject and predicate for this type of sentence, the production frequency of the sentence itself (bird to robin) should have no appreciable effect upon the time to reject it. To test this prediction, Holyoak and Glass selected Counterexample sentences in which the PF of the most common counterexample was varied orthogonally with the PF of the sentence itself. As predicted, sentences with high-PF counterexamples were rejected significantly more quickly than sentences with low-PF counterexamples (1397 versus 1506 msec), while the PF of the sentence itself had no significant effect on RT. Since PF and semantic relatedness were again positively correlated, the feature-comparison model is unable to account for these results.

To summarize, experimental evidence supporting the marker-search model comes from several sources. The model accounts for observed semantic relationships between true and false sentence completions. It predicts the strong correlation between production frequency and RT to verify true sentences. Most strikingly, the model successfully predicts the RT to reject meaningful false sentences (both Contradictory and Counterexample sentences).
The same data for false sentences disconfirm a major prediction of the feature-comparison model, namely, that false sentences with subject and predicate words closely related in meaning are necessarily slow to be rejected.
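The two rejection routes described above can be made concrete with a minimal sketch. It is only an illustration under stated assumptions, not the procedure used by Holyoak and Glass: the marker names, the table of incompatible markers, the exemplar ordering, and the function `reject_all` are all hypothetical. The step count stands in for search time, so that an earlier-listed (higher-PF) counterexample yields a faster rejection.

```python
# Toy illustration only: marker names, incompatibilities, and orderings
# are assumptions, not data from the production-frequency norms.
INCOMPATIBLE = {
    frozenset(("avian", "canine")),   # bird vs. dog: direct contradiction
    frozenset(("canary", "robin")),   # two distinct exemplars of bird
}
# Exemplars of a category, ordered by hypothetical production frequency
# as true Some-completions (most frequent first).
EXEMPLARS = {"avian": ["canary", "robin"]}


def contradicts(a, b):
    """True if the two markers are listed as mutually exclusive."""
    return frozenset((a, b)) in INCOMPATIBLE


def reject_all(subject, predicate, exemplars=EXEMPLARS):
    """Try to disconfirm 'All <subject> are <predicate>'.

    Returns (kind, witness, steps), or None if no contradiction is found.
    """
    # Contradictory sentence: subject and predicate clash directly.
    if contradicts(subject, predicate):
        return ("contradictory", None, 1)
    # Counterexample sentence: scan exemplars in order for one that
    # contradicts the predicate.
    for steps, exemplar in enumerate(exemplars.get(subject, []), start=1):
        if contradicts(exemplar, predicate):
            return ("counterexample", exemplar, steps)
    return None


print(reject_all("avian", "canine"))  # All birds are dogs
print(reject_all("avian", "robin"))   # All birds are robins
```

In this sketch, moving the disconfirming exemplar later in the list raises the step count while leaving the sentence itself unchanged, which mirrors the claim that the PF of the counterexample, not of the sentence, governs rejection time.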
4. Issues in semantic representation

In this section we examine some theoretical issues that can be highlighted by a comparison of the feature-comparison and marker-search models. These issues center on the distinctions between set-theoretic and network models outlined earlier, and particularly on the differing conceptions of the representation of word meaning which emerge from the two approaches.

Semantic redundancy rules and category size
The marker-search model postulates abstract markers representing the entire set of elementary components associated with the definition of a word (e.g., the marker (avian)
would dominate all the markers that define the word bird). The markers that define the subject and predicate can then be matched directly during sentence verification; e.g., to verify All canaries are birds the system need only discover that a set-inclusion relation holds between (canary) and (avian). But should it be necessary, more elementary components can also be recovered by means of associations representing semantic redundancy rules (Bierwisch, 1969; Katz, 1972). Thus, since a set-inclusion relation holds between (canary) and (avian), and also between (avian) and (animate), the sentence All canaries are animals can also be verified. While the latter sentence requires a longer search, the final match that allows a decision is again between just two markers ((canary) and (animate)). Accordingly, the ‘size’ of a category, in terms of the number of elementary components associated with the defining marker of a word, is not a relevant variable in predicting RT.

In contrast, the feature-comparison model does not postulate markers that represent sets of meaning components. The features of such a model are not structured hierarchically. This theoretical distinction yields a very different description of the verification process than that given by the marker-search model. In the second stage of the feature-comparison model, verification of All canaries are birds would require matching the set of defining features of bird (e.g., ‘feathered’, ‘egg-laying’, ‘breathing’, ‘solid’, etc.) with the defining features of canary. If the predicate were animal, its feature list would be shorter, so that second-stage processing should require less time. But as indicated in the first section, there is no evidence that this prediction holds.
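The chain of set-inclusion relations just described can be sketched as follows. This is a minimal illustration, assuming a toy hierarchy in which each marker is included in at most one other marker; the table `INCLUDED_IN` and the function `verify_all` are hypothetical names. The returned count is the length of the search, which in this sketch depends on the number of links followed and not on how many elementary components a category has.

```python
# Assumed toy hierarchy: each marker points to the marker that includes it.
INCLUDED_IN = {"canary": "avian", "avian": "animate"}


def verify_all(subject, predicate, included_in=INCLUDED_IN):
    """Follow set-inclusion links upward from the subject marker.

    Returns the number of links traversed when the predicate marker is
    reached, or None if it is never reached.
    """
    steps = 0
    current = subject
    seen = set()  # guards against accidental cycles in the table
    while current in included_in and current not in seen:
        seen.add(current)
        current = included_in[current]
        steps += 1
        if current == predicate:
            return steps
    return None


print(verify_all("canary", "avian"))    # All canaries are birds
print(verify_all("canary", "animate"))  # All canaries are animals
```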
The introduction of redundancy rules as a psychological construct thus serves at least two functions: It simplifies the search process presumed to occur during verification, and it explains the negative finding that sentences with relatively abstract predicate categories are not verified any more quickly than sentences with less abstract predicates, as long as production frequency is controlled.

The defining/characteristic distinction
A related issue concerns Smith et al.’s proposal that features can be weighted by their degree of ‘definingness’, ranging from clearly defining to simply characteristic features. They cite Lakoff’s (1972) analysis of hedges as major linguistic support for this defining versus characteristic distinction among features. In particular, three hedges are alleged to clearly differentiate the types of features shared by the subject and predicate of certain sentences. According to their analysis, the sentence A robin is a true bird is acceptable because robin shares both the defining and characteristic features of bird, since the features characteristic of a category are those that define common instances. The sentence Technically speaking, a chicken is a bird is acceptable because chicken shares the defining but not the characteristic features of bird. Finally, the sentence Loosely speaking, a bat is a bird is acceptable because bat shares the characteristic features of bird, but not the defining ones. If the hedge in any of these examples is replaced with one
of the other hedges (e.g., Technically speaking, a bat is a bird), the resulting sentence is less acceptable, since the subject and predicate do not share the type of features specified by the hedge. Smith et al. conclude that this linguistic result is best explained by assuming that features differ in the degree to which they define words.

However, one may explain such hedges in another way. It seems that certain common English words have at least two definitions. One is a popular definition, often learned early in life, such as the fact that a bird is a small flying animal with wings. The other is a technical definition, first agreed upon for some specific purpose by scientists or lawyers, and eventually picked up by dictionary writers and imposed upon the general public. The biological definition of a bird is an example of a technical definition. When an instance fulfills the requirements of both the technical and popular definitions, it may be said to be ‘a true’ member of the category (as robin is for bird). When an instance fulfills only the technical definition (e.g., chicken for bird), then we say ‘technically speaking’; when only the popular definition is satisfied (e.g., bat for bird), we say ‘loosely speaking’.

This explanation of hedges can account for the unacceptability of certain sentences that create problems for Smith et al.’s defining/characteristic explanation. For instance, if we use a category as the subject of a sentence, and a ‘true’ instance as the predicate, the subject will still share characteristic features with the predicate. Accordingly, the hedge ‘loosely speaking’ should apply, producing such examples as Loosely speaking, a bird is a robin. But such sentences are unacceptable. A similar difficulty occurs with two ‘true’ instances of a category, which surely share many characteristic features.
The Smith et al. analysis therefore predicts that a sentence such as Loosely speaking, a robin is a canary should be acceptable, but again it is not. The defining/characteristic explanation of hedges requires some additional assumption to explain these cases.* In contrast, the unacceptability of these latter sentences follows directly from our popular/technical explanation, since in neither of these cases does the predicate appear to have two definitions, nor does the subject satisfy any definition of the predicate. Furthermore, one suspects that the as yet unspecified characteristic and defining features on which the hedges ‘technically’ and ‘loosely’ are supposedly based can never be specified. Do people know that the sentence Technically speaking, a whale is a mammal is true because a defining feature for mammal is shared by whale? Or do they know it is true simply because they know that there exists a technical definition of mammal that includes whale, even though they don’t know what it is? In this case, the technical definition actually becomes definition by enumeration.

*One such additional assumption would be that the predicate category must always be more general than the subject category. However, it is not clear that an adequate metric of generality can be specified in terms of the feature-comparison model. One might suppose that more general categories are those with fewer defining features. But this tack allows comparisons of generality only when the categories are logically nested. But two instances, such as canary and robin, are not nested; consequently, the ‘number of defining features’ metric does not specify how we know that canary is not more general than robin. There is no apparent a priori reason to suppose that all instances have an equal number of defining features.
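The popular/technical account of the three hedges amounts to a small decision rule, sketched below. The sketch treats each definition as a list of instances (definition by enumeration, as suggested above for whale and mammal); the tables `POPULAR` and `TECHNICAL`, their contents, and the function `hedge` are illustrative assumptions only.

```python
# Hypothetical enumerations standing in for the two definitions of 'bird'.
POPULAR = {"bird": {"robin", "bat"}}
TECHNICAL = {"bird": {"robin", "chicken"}}


def hedge(instance, category):
    """Return the hedge licensed for '<instance> is a <category>', if any."""
    popular = instance in POPULAR.get(category, set())
    technical = instance in TECHNICAL.get(category, set())
    if popular and technical:
        return "a true"
    if technical:
        return "technically speaking"
    if popular:
        return "loosely speaking"
    return None  # the subject satisfies no definition of the predicate


print(hedge("robin", "bird"))
print(hedge("chicken", "bird"))
print(hedge("bat", "bird"))
print(hedge("robin", "canary"))
```

Under these assumptions no hedge is returned for Loosely speaking, a bird is a robin or Loosely speaking, a robin is a canary, because the subject satisfies neither definition of the predicate.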
An explanation of these hedges in terms of popular and technical definitions has certain testable implications. For instance, we would not expect people in primitive ‘non-technical’ cultures to make a distinction equivalent to the difference between ‘loosely speaking’ and ‘technically speaking’. Also, since children presumably acquire popular definitions of words earlier than technical definitions, young children should classify an instance as a ‘correct’ member of a category if and only if it satisfies the popular definition. For example, young children presumably would classify a bat as a bird, and a whale as a fish.

The status of network models
Smith et al. cite the evidence from hedges as a general source of difficulty for network models of semantic memory, of which the marker-search model is an example. They do point out that a network model could incorporate the notion of popular and technical definitions of words by including separate markers for the two definitions. But Smith et al. object to this tack as unparsimonious, presumably because it leads to a proliferation of markers. But it is not clear that this solution is less parsimonious than that offered by the feature-comparison model, which is to assume that a weight indicating ‘definingness’ is stored with every feature-category pair in memory. That is, given a model based on meaning components, is it more parsimonious to add a finite number of extra components (e.g., popular and technical definitions) or a completely new theoretical mechanism (e.g., definingness weights)?

We should make clear that not every use of a hedge requires a network model to postulate an additional definition for a word. Thus Smith et al. suggest that the acceptability of the sentence A decoy is a fake bird requires a network model to postulate an additional definition of bird representing ‘pseudo-instances’. But in terms of the semantic theory of Katz (1972) the definition of fake would be represented by markers corresponding to ‘intended to appear as’. Through syntactic and semantic amalgamation rules, a fake bird would thus be an object intended to appear as a bird. Any word with defining markers that contain the markers of this compound (such as decoy) would be correctly classified as a fake bird. This treatment of the meaning of a noun plus modifier could be incorporated into a network model. In contrast, Smith, Rips and Shoben (1974) claim that fake is used when the subject and predicate share only characteristic features. However, this proposal is clearly incorrect, as is demonstrated by the unacceptability of the sentence A bat is a fake bird.
While a bat, just like a decoy, both looks like a bird and isn’t a bird, its resemblance is essentially accidental rather than intentional. Consequently, this sentence is unacceptable. It is not clear, however, how the feature-comparison model could distinguish between the acceptability of these two examples simply on the basis of feature overlap. A second criticism that Smith et al. direct at network models is the finding that typical instances are categorized more quickly than atypical ones (Rips et al., 1973;
Rosch, 1973; Smith et al., 1974). Smith et al. discuss two ways in which network models could incorporate this finding. It may be that intermediate nodes are interposed between the markers representing an atypical instance and the category (e.g., (chicken) might first imply (domestic bird), which then in turn implies (avian)). Also, a network model could account for the effect of typicality on RT simply by assuming that the defining marker of the category is higher on the search list for typical instances than for atypical instances. Both these possibilities seem plausible. As Smith et al. point out, typicality effects pose difficulties only for those network models that assume that search order somehow mirrors the logical structure of the concepts (Collins & Quillian, 1969). But since the marker-search model does not make this assumption, it avoids these difficulties. To illustrate, consider how the marker-search model handles one typicality result obtained by Smith et al., which they view as particularly troublesome for network models. To use their example, they found that robin is more typical than chicken for the superordinate bird, whereas chicken is more typical than robin for the superordinate animal; moreover, this interaction is reflected in verification RT. This result can be represented very simply in terms of the marker-search model. The markers (chicken) and (robin) can both have direct associations to both (avian) and (animate). If search is serial, in each case there must be a trade-off in terms of whether the association to (avian) or to (animate) is searched first. Whichever has priority, the other must be accessed more slowly. Production frequency norms would presumably indicate that for (chicken) the association to (animate) is searched relatively early, whereas for (robin) the association to (avian) has priority.
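The serial-search trade-off can be sketched in a few lines. The search lists below are assumed for illustration (no production frequency norms are reproduced here), and `SEARCH_ORDER` and `search_position` are hypothetical names; the position at which a category marker is reached stands in for verification time.

```python
# Assumed search lists: each instance marker has direct associations to
# both category markers, examined one at a time in the listed order.
SEARCH_ORDER = {
    "chicken": ["animate", "avian"],
    "robin": ["avian", "animate"],
}


def search_position(instance, category, order=SEARCH_ORDER):
    """1-based position at which the category marker is reached, or None."""
    try:
        return order[instance].index(category) + 1
    except (KeyError, ValueError):
        return None


for instance in ("robin", "chicken"):
    for category in ("avian", "animate"):
        print(instance, category, search_position(instance, category))
```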
Consequently, people will be relatively quick to verify both A chicken is an animal and A robin is a bird, but slow to verify A chicken is a bird and A robin is an animal. Not only is this result consistent with the marker-search model, it in fact provides evidence for the assumption that search is reliably ordered. The basic issue, therefore, is whether the effects of typicality are best explained by a model based on continuous variation in degree of category membership, or by a model based on the discrete nodes of a network structure. The linguistic and experimental results cited by Smith et al. do not discriminate between these two conceptions of semantic memory.

Can the feature-comparison model be extended?
In order to handle the RT results reviewed in the previous section, the feature-comparison model could incorporate ordered search strategies of the type we have described as an elaboration of the second processing stage. ‘False’ decisions could be based on the discovery of a contradiction, defined in terms of a relationship between features. But it would then appear that the first stage of the revised model would either be superfluous or redundant, unless RT data can be found that could not be explained by this new second-stage process alone. If the processing mechanisms of the feature-comparison model were to become more clearly specified, it might therefore become more similar to a network model.
It is important to evaluate current theories not only in terms of how well they handle available data, but with respect to their prospects for extension to the wider range of phenomena with which cognitive psychology must eventually deal. So far semantic-memory research has been almost exclusively concerned with meaning-comparison tasks involving just two words, sometimes with variations in the quantifier. The feature-comparison model can account for some of this data reasonably well. But if we take as our goal the description of the psychological representation of meaning, our theories must eventually analyze verbs, and allow us to represent the meaning of sentences of considerable complexity. Will it be possible to extend the feature-comparison model to accomplish this task? There is reason to doubt it.

To understand the apparent theoretical deficiencies of the feature-comparison model, it is helpful to distinguish between two aspects of meaning, which in linguistics are termed ‘reference’ and ‘sense’ (Frege, 1952). ‘Reference’ refers to the relationships between words and objects or events in the world, while ‘sense’ refers to the relationships of words to other words - that is, to their meanings within the linguistic system. Both of these aspects of meaning represent important problems for psychology, and in many respects they are clearly interrelated (e.g., in regard to the acquisition of word meanings). Nevertheless, the conceptual distinction between reference and sense is an important one for theories of semantic memory. The intuition underlying the feature-comparison model - that some instances are more ‘typical’ members of a category than others - essentially concerns referential meaning. However, the verification tasks that have provided the data base for semantic-memory models require the subject to rapidly compare word meanings - that is, to evaluate the ‘sense relations’ between words.
The main structural assumption of the feature-comparison model, that word meanings are represented by sets of features weighted on their definingness, constitutes a hypothesis concerning the representation of sense relations. The feature-comparison model has thus been directed at the problems of both reference and sense, although the distinction has not been kept clear. The marker-search model, on the other hand, is directly concerned only with sense relations. In the framework of the latter model, a major theoretical problem is to specify how a system of semantic markers can be mapped onto the perceptual system to account for people’s basic ability to use words to refer to objects and events in the world. However, in the present paper we are solely concerned with evaluating semantic-memory models as accounts of the representation of sense relations. We will argue that the feature-comparison model lacks the theoretical power to represent a variety of concepts. As a set-theoretic model, the feature-comparison model treats features as if they were independent ‘units’ of meaning. The decision process in both of the two stages in the feature-comparison model can be conceptualized as the ‘summing up’ of information from a set of independent comparisons between pairs of features. The principal theoretical device of this model - the weighting of features with respect to their definingness - thus makes it disturbingly similar to a class of weighted-feature models
(perceptrons) that Minsky and Papert (1969) have proved to be in principle inadequate as theories of pattern recognition. The basic problem with such models is that they are unable to recognize visual properties such as connectedness, which depend on the relationships between features. If concepts in natural language can also be shown to be relational, this would suggest that perceptron-type models will also prove inadequate in the domain of language.

With respect to the representation of sense relations, a set-theoretic feature theory faces two hurdles - first, the problem of specifying a set of independent features; and second, the necessity to demonstrate that these features are sufficient to define concepts in natural language. It seems possible that these hurdles will prove insurmountable. A simple set-theoretic model can represent the relation of containment, as expressed in a sentence such as A canary is a bird, by assuming that the set of features defining the predicate is a subset of the features defining the subject. But how could it represent the relation of possession, which is expressed by a verb such as has (e.g., The man has a car)? This sentence does not refer to an entity that combines the features of man and car; nor does the sentence imply that man and car share some features. It does not seem possible to represent the abstract relationship expressed by has (or many other verbs, such as can and does) by combining sets of features using the simple Boolean operations to which set-theoretic models have restricted themselves. If the relation of possession cannot be represented by a set-theoretic feature model, then such models will also be unable to account for category membership, the problem which the feature-comparison model allegedly addressed directly. For example, instances of money are defined by their function as a medium of exchange.
If we then analyze the concept ‘exchange’, we find it refers to an event in which possession of one object is superseded by possession of another. Then if we analyze the concept of ‘possession’, we find that it is similar or identical to the relationship expressed by the verb has, which it appears cannot be represented by a set-theoretic feature model. It follows that there can be no set of independent defining features for the category money; rather, membership in this, and many other categories (e.g., toy, government, pet, game) is defined by a relational decision rule. Such rules cannot be represented in terms of sets of independent features. Consequently, it appears that set-theoretic feature models, such as the feature-comparison model, are not sufficiently powerful to account even for category membership.

It might be argued that any theory of sense relations will ultimately be reducible to a theory of reference. Indeed, the initial plausibility of the feature-comparison model is largely dependent on the fact that it has been mainly applied to categories with clear perceptual referents, so that the postulated features have referential meaning. For example, one might suppose that many of the defining features of canary can literally be seen (e.g., ‘yellow’, ‘small’, ‘flies’, ‘sings’, etc.). However, Smith et al. have not answered the classic objections to the assumption that sense relations can be explained by a theory of reference (Frege, 1952; Katz, 1972). Accordingly, we presume that these objections stand against the feature-comparison model if it is considered to be such a theory.
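The contrast between containment and possession drawn above can be illustrated with a short sketch. It demonstrates only one symptom of the difficulty, namely that Boolean combinations of feature sets are themselves feature sets and carry no record of which term possesses which; it does not establish the general claim. The feature lists for man and car and the tuple encoding of has are assumptions introduced for the example.

```python
# Feature sets; the man and car features are assumed for illustration.
CANARY = {"feathered", "egg-laying", "breathing", "yellow", "sings"}
BIRD = {"feathered", "egg-laying", "breathing"}
MAN = {"human", "adult", "male"}
CAR = {"vehicle", "wheeled", "motorized"}


def contains(subject_features, predicate_features):
    """Containment: the predicate's features are a subset of the subject's."""
    return predicate_features <= subject_features


print(contains(CANARY, BIRD))  # A canary is a bird
print(contains(BIRD, CANARY))  # A bird is a canary

# Union and intersection are order-blind, so neither can tell
# 'The man has a car' apart from 'The car has a man'.
print((MAN | CAR) == (CAR | MAN), (MAN & CAR) == (CAR & MAN))

# One hypothetical alternative: an ordered, labeled relation between terms.
print(("has", "man", "car") == ("has", "car", "man"))
```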
It might also be asked, of course, whether network models can be extended to represent semantic relations other than containment (e.g., possession). While this is an open question, there is reason to believe they may. For instance, several network theorists (Anderson & Bower, 1973; Rumelhart, Lindsay & Norman, 1972; Schank, 1972) have used labeled relational arrows in order to represent different formal relationships between components. In linguistics, Katz (1972) has presented detailed proposals for representing the meanings of complex concepts in forms equivalent to graph structures. Thus while no network theory has yet been developed to deal with the full complexity of meaning, it at least may have the potential. Given the theoretical deficiencies of current set-theoretic models, it seems likely that if such models are elaborated they will become equivalent to a network representation. But at the same time, it is clear that network models also face numerous serious difficulties in accounting for meaning, some of which have been mentioned in passing. The central issue is how network representations can be constrained. The distinction between set-theoretic and network models is, at heart, a difference in the conceptual resources that the two allow. Set-theoretic models are conceptually meagre and bareboned, and as a result seem unable to capture the variety of tricks the human memory seems capable of performing. Network models face quite a different problem, for there are far fewer constraints on the kinds of representations they allow. The marker-search model has attempted to establish a few empirical constraints for a tiny fragment of English. While this attempt has met with some success, problems remain even here. If any such model is to be extended to other areas of language, it is essential that additional constraints be found. This paper has focussed directly on the relative merits of two particular classes of psychological models of semantic memory. 
Nevertheless, we believe this discussion has significance for linguistics and philosophy as well, since it provides evidence that is inconsistent with some theories in these domains while demonstrating an empirical basis for others. If no psychological process model that can account for sentence verification results is consistent with the claim that truth is a continuous dimension, then a linguistic competence model based on that claim is unlikely to be justified. On the other hand, the psychological evidence that supports the marker-search model provides a broader empirical basis for the existence of those constructs (markers) that Katz (1972) needs to construct his definition of analyticity. The choice between models thus reflects not only narrow concerns within psychology, but also basic disagreements about the nature of human understanding.

REFERENCES

Anderson, J. R., & Bower, G. H. (1973) Human Associative Memory. Washington, V. H. Winston & Sons.
Bierwisch, M. (1969) On certain problems of semantic representation. Found. Lang., 5, 153-184.
Clark, H. H. (1973) The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. J. verb. Learn. verb. Beh., 12, 335-359.
Clark, H. H., & Chase, W. G. (1972) On the process of comparing sentences against pictures. Cog. Psychol., 3, 472-517.
Collins, A. M., & Quillian, M. R. (1969) Retrieval time from semantic memory. J. verb. Learn. verb. Beh., 8, 240-248.
Frege, G. (1952) On sense and reference. In Geach, P., & Black, M. (Eds.), Translations from the Philosophical Writings of Gottlob Frege. Oxford, Basil Blackwell & Mott.
Glass, A. L., & Holyoak, K. J. (1974) The effect of some and all on reaction time for semantic decisions. Mem. Cog., 2, 436-440.
Glass, A. L., Holyoak, K. J., & O’Dell, C. (1974) Production frequency and the verification of quantified statements. J. verb. Learn. verb. Beh., 13, 231-254.
Holyoak, K. J., & Glass, A. L. (In press) The role of contradictions and counterexamples in the rejection of false sentences. J. verb. Learn. verb. Beh.
Katz, J. J. (1972) Semantic Theory. New York, Harper & Row.
Kintsch, W. (1972) Notes on the structure of semantic memory. In Tulving, E., & Donaldson, W. (Eds.), Organization of Memory. New York, Academic Press.
Lakoff, G. (1972) Hedges: A study in meaning criteria and the logic of fuzzy concepts. Papers from the Eighth Regional Meeting, Chicago Linguistics Society. Chicago, University of Chicago Linguistics Department.
Landauer, T. K., & Meyer, D. E. (1972) Category size and semantic memory retrieval. J. verb. Learn. verb. Beh., 11, 539-549.
Loftus, E. F. (1973) Category dominance, instance dominance, and categorization time. J. exp. Psychol., 97, 70-94.
Meyer, D. E. (1970) On the representation and retrieval of stored semantic information. Cog. Psychol., 1, 242-300.
Meyer, D. E., & Schvaneveldt, R. W. (1971) Facilitation in recognizing pairs of words: evidence of a dependence between retrieval operations. J. exp. Psychol., 90, 227-234.
Minsky, M., & Papert, S. (1969) Perceptrons. Cambridge, The M.I.T. Press.
Posner, M. I. (1970) On the relationship between letter names and superordinate categories. Q. J. exp. Psychol., 22, 219-287.
Rabbitt, P. M. A. (1966) Errors and error correction in choice response tasks. J. exp. Psychol., 71, 264-272.
Rips, L. J., Shoben, E. J., & Smith, E. E. (1973) Semantic distance and the verification of semantic relations. J. verb. Learn. verb. Beh., 12, 1-20.
Rosch, E. R. (1973) On the internal structure of perceptual and semantic categories. In T. M. Moore (Ed.), Cognitive Development and Acquisition of Language. New York, Academic Press.
Rumelhart, D. E., Lindsay, P. H., & Norman, D. A. (1972) A process model for long-term memory. In E. Tulving & W. Donaldson (Eds.), Organization of Memory. New York, Academic Press.
Schaeffer, B., & Wallace, R. (1970) The comparison of word meanings. J. exp. Psychol., 86, 144-152.
Schank, R. C. (1972) Conceptual dependency: a theory of natural language understanding. Cog. Psychol., 3, 552-631.
Smith, E. E. (1967) Effects of familiarity on stimulus recognition and categorization. J. exp. Psychol., 74, 324-332.
Smith, E. E., Rips, L. J., & Shoben, E. J. (In press) Semantic memory and psychological semantics. In G. H. Bower (Ed.), The Psychology of Learning and Motivation, Vol. 8. New York, Academic Press.
Smith, E. E., Shoben, E. J., & Rips, L. J. (1974) Structure and process in semantic memory: a featural model for semantic decisions. Psychol. Rev., 81, 214-241.
Sternberg, S. (1969) The discovery of processing stages: extensions of Donders’ method. In W. G. Koster (Ed.), Attention and Performance II. Amsterdam, North-Holland Publishing Company.
Wilkins, A. T. (1971) Conjoint frequency, category size, and categorization time. J. verb. Learn. verb. Beh., 10, 382-385.
Résumé

It is argued that theories of semantic memory diverge along lines parallel to the controversies concerning the representation of meaning. The feature-comparison model (Smith, Shoben and Rips, 1974) applies the linguistic theory of Lakoff (1972) to predict reaction time in sentence verification, whereas the marker-search model described here uses the type of semantic representation defined by Katz (1972) to explain similar data. The two models are described and their scope is reviewed. The marker-search model is well supported, whereas a major prediction of the feature-comparison model is disconfirmed. It is argued that the feature-comparison model is inadequate to account for semantic representation as long as its conception of semantic components remains unchanged.
3

Conservation accidents

JAMES McGARRIGLE
MARGARET DONALDSON
University of Edinburgh

Abstract

Eighty children aged between 4 years 2 months and 6 years 3 months were tested on length and number conservation, both when the transformation occurred because of a direct action by the experimenter and when it happened ‘accidentally’ as the by-product of an activity directed towards a different goal. Fifty children conserved when the transformation was ‘accidental’, whereas only 13 were successful when it was intentional. These results are interpreted as evidence that characteristics of the experimenter’s behaviour, in particular his actions towards the task materials, can influence children’s interpretation of utterances by suggesting the experimenter is thinking about a different attribute from that specified linguistically. It is suggested that traditional procedures may underestimate children’s cognitive abilities.
Introduction
A substantial body of research has grown up around Piaget’s conservation tasks (Piaget, 1952; Piaget and Inhelder, 1969). Piaget’s findings have been replicated by many investigators using standardized procedures based on Piaget’s original method (Elkind, 1961; Dodwell, 1960; Hood, 1962; Smedslund, 1964). However, a considerable amount of evidence has accumulated suggesting that children may have the knowledge necessary for conservation long before they succeed in the traditional conservation task (e.g., Frank, 1964; Gelman, 1972; Rose and Blank, 1974). These studies have usually involved ingenious methods for circumventing those features of the conservation task which the authors thought were particularly problematic for the child. For example, those suggesting attentional/perceptual difficulties have used screening procedures (Frank, 1964) or trained children to attend to the relevant attributes (Gelman, 1969). Those postulating linguistic difficulties have used pretraining in the use of the relational terms of the task (Gruen, 1965) or developed conservation games not involving the relational terms (Gelman, 1972). Each of these diverse procedures gave indications that children can conserve at an earlier age than is suggested by traditional methods.
Cognition, 3(4), pp. 341-350
This paper is addressed to the paradox presented by these conflicting results - results which show that the child can, in some contexts, demonstrate his knowledge about the invariance of certain attributes of objects while at the same time he fails to exhibit such knowledge in the typical Piagetian situation. The traditional method of assessing conservation is scrutinized and one potentially important feature of the situation is identified. It is suggested that this feature, which is irrelevant to the logical requirements of the conservation task, contributes substantially to the child’s difficulty in the classic situation.
Consider the usual procedure for assessing conservation of number. Two lines of counters, equal in number, are arranged in one-to-one correspondence in front of the child, and a question about their relative numerosity is presented (Q1, e.g., are there more counters here or more counters here or are they both the same number?). The child makes the judgement (J1) that they are the same and the experimenter then rearranges one of the rows and repeats the original question (Q2). The nonconserving child typically changes his judgement in favour of the longer row (J2). It is customary to associate the child’s changing choice from J1 to J2 with what appears to be the only other feature of the situation that has changed - the perceptual configuration of the row of counters - so that the explanation of the child’s behaviour focusses on his susceptibility to perceptual influences. Suppose however that, despite their formal or surface identity, Q1 and Q2 are given differing interpretations by the child. This is not as unlikely as it seems on first reflection, and it is not difficult to imagine how two identical questions might be used to interrogate different aspects of a static array. For example, some of the counters might be covered in spots.
In this situation the original conservation question (Q1 and Q2) could be used to enquire about the relative numerosity of just the counters with spots, provided that preceding contextual information, either linguistic or non-linguistic, made it clear that the questioner was now interested in this aspect of the array. The point of this unlikely-sounding example is that we do not know the sources of ‘contextual information’ which could lead the child to vary his interpretation of an unchanging question. One conspicuous event that intervenes between J1 and the presentation of Q2 is the experimenter’s simple action of changing the length of one of the rows of counters. Could this provide the contextual information which leads the child to misinterpret Q2? Recent theoretical proposals in the area of language acquisition indicate the paramount importance, for the development of language, of the actions of the participants in early interactions. These suggestions draw upon the communication of intent model proposed by philosophers of language in which an important distinction is made between speaker’s meaning and utterance meaning (Grice, 1968). Macnamara (1972) has suggested that children learn language on the basis of their independent hypotheses about speaker’s meaning, derived from their intercourse with the world. Ryan (1973) and Bruner (1975) have argued that mother-child interactions provide contexts of mutual action in which intentions, initially communicated non-linguistically, come to be mapped on to their linguistic means of expression. They suggest a central role for non-linguistic means of
collaboratively directing attention as a basis for the referential function of language. Although the detail of these processes remains to be specified, this approach clearly indicates the importance of the intentional structure of the speaker’s non-linguistic behaviour for the child’s interpretation of utterances. Since language acquisition continues at least until the period of concrete operations (Palermo and Molfese, 1973) then it is important to consider the possible effect of the intentional activity of the experimenter on the child’s interpretation of the language of the conservation task. The structure of the conservation task seems to involve a significant deviation from normal adult-child interactions. In the conservation task the adult’s non-linguistic behaviour is uncoupled from the linguistic element, in that the non-linguistic component is irrelevant to the interpretation of the utterance. Moreover the non-linguistic behaviour is highly relevant for an utterance of a different type - one concerned with the length of the row rather than number. It could be that the experimenter’s simple direct action of changing the length of the row leads the child to infer an intention on the experimenter’s part to talk about what he has just been doing. It is as if the experimenter refers behaviourally to length although he continues to talk about number. If the young child’s procedures for interpreting language initially depend heavily on the non-linguistic component of the interaction, it would not be surprising if he ignored normal word-referent relationships and interpreted the question on the basis of the experimenter’s intentions as evidenced in his behaviour. If this proposal is correct, it follows that it should be possible to vary the incidence of conservation success by manipulating the intentional structure of the circumstances leading to the transformation. One way to do this is to make the transformation appear accidental. 
It cannot of course be genuinely accidental. The notion of ‘deliberate accident’ is self-contradictory, so one cannot build true accident into an experimental design. However, one can attempt to ensure that an event will not appear to the subject to have been produced with calculated intent by the experimenter. One can make it seem to be a by-product of some other activity and to that extent fortuitous. The method employed here exploits the child’s willingness to attribute agency to inanimate objects. A small teddy bear with malicious intentions towards the task materials is operated by the experimenter to effect the transformation. The child makes his judgements before and after the transformation. In one condition, the transformation results from a direct intentional action of the experimenter like that used in the classic method. In the other condition the same transformation happens ‘accidentally’ in the course of the teddy bear’s activity which is directed towards the goal of ‘spoiling the game’. Since in both conditions the child is faced with the usual misleading perceptual array then, if orthodox accounts of conservation performance are correct, both methods of assessment should yield similar results. If, on the other hand, the intentional character of the experimenter’s actions can govern the child’s interpretation of the question, then better performance is predicted where the transformation occurs without this intentional background.
Method
Subjects:
Eighty children, of mean age 5 years and 4 months (range 4 years 2 months to 6 years 3 months) took part in the study. Forty of these were attending Edinburgh nursery schools and their mean age was 4 years 10 months (range 4 years 2 months to 5 years 4 months). The other 40 were attending a local State primary school and their mean age was 5 years 10 months (range 5 years 3 months to 6 years 3 months). There was an equal number of boys and girls in each of the samples.
Design:
Each child received number and length conservation problems under conditions in which the transformation was effected intentionally (Intentional Transformation, IT), and under conditions in which it occurred accidentally (Accidental Transformation, AT). There were two situations involving the conservation of number (equal and unequal) and two involving length conservation (equal and unequal). Children were assigned to one of two groups, with care taken to balance the sex and age composition of the groups. Group I completed all the judgements involving AT before encountering any in the IT condition; Group II received the IT condition first. Within these groups, one half of the children received the number situations first and the other half received the length situations first. The equal and unequal situations appeared first equally often. The order of presentation of situations was held constant across the IT and AT conditions.
Materials and Situations:
The task materials were counters of 1%” diameter, some lengths of string, and a small teddy bear (height 3”) with a box large enough to conceal him.
In the number equal situation, four red and four white counters were arranged in one-to-one correspondence into two rows of equal length. The transformation, for both IT and AT conditions, involved moving the counters of one row until they were touching each other, thereby shortening that row. Before and after the transformation the child was asked: “Is there more here or more here or are they both the same number?” while the experimenter pointed along each of the rows at the appropriate points in the utterance. For the number unequal situation, rows of four and five counters were used, and the question was: “Which is the one with more - this one or this one?”
For the length equal situation, two 10” lengths of red and black string were used. These were placed beside each other, with their end-points coinciding, and the child was asked: “Is this one longer than this one or are they both the same length?” while the experimenter touched each of the pieces of string in turn.* One piece of string was then transformed by bending it into a crescent shape, and the question was repeated. For the length unequal situation one of the 10” strings and an 8” length of white string were used and the question was: “Which is the long one - this one or this one?”

*It has been objected, perhaps with some reason, that it was unwise to touch the string, since the children might regard even this act as somehow transforming the material. It seems unlikely that they did so, however, in view of the large differences between the IT and AT conditions which were nonetheless obtained.
Procedure
At the beginning of the session only the cardboard box containing teddy was present on the table. E lifted teddy out of the box, and showed him to the child, explaining that teddy was very naughty and that he was liable to escape from his box from time to time and try to ‘mess up the toys’ and ‘spoil the game’. E then arranged the materials for the child’s first situation. During the IT condition, the teddy bear did not appear. The IT procedure was modelled on the traditional method used in conservation studies. After the child’s first judgement, E said “Now watch...” and rearranged the materials with a single direct action, and then repeated the question.
In the AT condition, following the initial judgement, E pretended to experience surprise and alarm as he moved teddy towards the materials, making such remarks as: “It’s naughty teddy!” “Oh! look out, he’s going to spoil the game.” At this point E quickly moved teddy over the string or the row of counters, making sure that they were appropriately disarranged in apparently haphazard fashion. Then he allowed the child to return teddy to his box. On four occasions during the AT procedure the child, having removed teddy, restored the string or counters to their original state before E could elicit the conservation judgement. When this happened that situation was begun again using a modified procedure.* Instead of chasing teddy back to his box, the child was asked to keep teddy prisoner, that is, to hold him in his hands. This ensured that the transformation remained unaltered until the judgement was elicited. The transformation, for both AT and IT conditions, was always carried out on the material furthest from the child. This procedure was adopted to allow teddy more opportunity for carrying out his mischief before the child could intervene.
Each child saw both red and white counters undergo transformation within any condition (i.e., in one situation the red row was transformed, in the other the white), and similarly for the red and black string. When each situation was encountered again under the second condition (IT or AT), the child witnessed the same material being transformed as on the first occasion. The order in which E pointed to the materials in conjunction with the question was varied systematically. E gave no feedback concerning the child’s performance.
The method employed here for assessing conservation does not involve the eliciting of justifications. This strategy was adopted for two reasons. In the first place, Brainerd (1973) has argued that, even from the point of view of Piagetian theory, the justification criterion is too strict. More important for present purposes, the attempt to elicit justifications would have involved the child and E in further complex interaction, the characteristics of which could have influenced the child’s subsequent behaviour in a number of ways.

*It might have been better simply to stop testing in these cases. But since they were so few in number, the pattern of the results is not substantially affected.
Results
Tables 1 and 2 show that correct responses were far more frequent when the transformation occurred accidentally than when it was effected in the traditional manner. Nearly three-quarters of responses were correct in the AT condition, whereas only one-third were correct in the IT condition. If the criterion for successful conservation is set at four out of four judgements correct, then 50 of the 80 children are diagnosed as conservers under the AT method. In contrast, when these same 80 children made their judgements in the IT condition, only 13 achieved the criterion of successful conservation. This conclusion remains essentially unaltered if a laxer criterion is employed.
Table 1. Total of correct responses given by Groups I and II for each situation under AT and IT conditions

                       Group I (n = 40)     Group II (n = 40)
Situations             AT   →   IT          IT   →   AT
A Number Equal         32       14          19       22
B Number Unequal       36       13          18       22
C Length Equal         34        8          15       24
D Length Unequal       31        9          12       23
Totals                139       44          64       91

Table 2. Percentages of correct responses for Groups I and II under AT and IT conditions

                            Group I               Group II
                            AT    →   IT          IT    →   AT
Nursery School Children     80.0      31.25       22.5      46.25
Primary School Children     93.7      48.8        32.5      67.5
Overall                     86.9      40.0        27.5      56.9

Mean AT Performance         71.9
Mean IT Performance         33.7
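The pooled percentages quoted in the text can be checked against the column totals of Table 1. The following is a minimal sketch, assuming each column total is out of 160 responses (40 children judging 4 situations); the variable and function names are illustrative and not part of the original report.

```python
# Column totals of correct responses taken from Table 1.
# Assumption: each total is out of 40 children x 4 situations = 160 responses.
AT_TOTALS = [139, 91]   # the two AT columns (Groups I and II)
IT_TOTALS = [44, 64]    # the two IT columns
RESPONSES_PER_GROUP = 40 * 4


def percent_correct(totals, per_group=RESPONSES_PER_GROUP):
    """Pooled percentage of correct responses across the given columns."""
    return 100.0 * sum(totals) / (per_group * len(totals))


at_pct = percent_correct(AT_TOTALS)
it_pct = percent_correct(IT_TOTALS)
print(f"AT: {at_pct:.2f}%  IT: {it_pct:.2f}%")
```

Under these assumptions the pooled values agree with the "nearly three-quarters" and "one-third" figures given in the Results, and with the mean AT and IT rows of Table 2 to rounding.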
The performance in the AT condition of Group II, which had received the IT condition first, was significantly poorer than that of Group I (χ² = 4.85, p < 0.05). Hence it may be more appropriate to compare only the initial four judgements of each group. On this analysis 30 of the 40 children of Group I successfully conserved (AT condition) whereas only 5 of Group II did so (IT condition) (χ² = 29.2, p < 0.001).
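The second comparison (30 of 40 conservers against 5 of 40) can be reproduced from the counts given. The report does not say which form of the chi-square test was used, so the sketch below is an assumption: it computes the 2 x 2 statistic with and without Yates' continuity correction, and the corrected value comes out close to the reported 29.2. The function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square for the 2 x 2 table [[a, b], [c, d]].

    With yates=True, applies Yates' continuity correction
    (an assumed choice; the source does not name the test variant).
    """
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denominator


# Rows: Group I (AT first), Group II (IT first).
# Columns: conserved on all four initial judgements, did not conserve.
corrected = chi_square_2x2(30, 10, 5, 35, yates=True)
uncorrected = chi_square_2x2(30, 10, 5, 35, yates=False)
print(f"corrected: {corrected:.2f}, uncorrected: {uncorrected:.2f}")
```

The counts behind the first statistic (χ² = 4.85) are not given in the text, so that value is not recomputed here.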
Discussion
These results give clear indications that traditional procedures for assessing conservation seriously underestimate the child’s knowledge. Most of these four- and five-year olds achieved the criterion for successful conservation of length and number when the transformation was accidental, and yet most of the same children failed when the transformation was effected in the traditional manner. Indeed the results suggest that the performance in the AT condition of Group II subjects was depressed because of their prior experience with IT, so that even the AT procedure may have underestimated their knowledge of conservation. The verbalisations of those Group II subjects who were unsuccessful in the AT condition suggest they ignored the teddy’s stated intentions and interpreted his activity as being directed towards changing the length of the row of counters, presumably because they had just witnessed the experimenter exhibit such activity. For example one child in the length equal situation chose the untransformed string as being longer after the teddy’s activity “because you bent that one doing it and it’s the shortest one”.
The present approach differs from earlier ones which have emphasized language comprehension difficulties in that it considers the possibility that extralinguistic features of the testing situation, in particular the non-verbal behaviour of the experimenter, can influence the child’s interpretation of the language. Several investigators have suggested there is a failure to understand key words in the question, and Griffiths, Shantz and Sigel (1967) have argued for the use of pretests which assess the comprehension of the relational terms in non-conservation contexts. However it is possible to understand a word in one context while failing to do so in others.
The framework adopted here has allowed the identification of one feature of the testing situation which has precisely this effect: the intentional character of the experimenter’s behaviour. It appears that this feature of interactions is implicated in the young child’s normal procedures for interpreting language. In the early stages of language acquisition, the child interprets the meaning of behavioural events to arrive at a notion of speaker’s meaning and this knowledge is utilized to make sense of the language around him. Eventually the child acquires a semblance of linguistic meaning, in that he can respect certain properties of the language where the non-linguistic components of the speaker’s activities do not conflict with the utterance. During this phase the intentional nature of the speaker’s activities, where this is at variance with the utterance, can govern what the child thinks is being talked about, so that his understanding of such concepts as number and length can be obscured.
Hence when the experimenter has effected an intentional action which changes the length of a row of counters, the child behaves as if the experimenter is asking a question about length rather than number. When the length of the row changes, but without the experimenter appearing to have intended it, the child has no conflicting behavioural evidence relevant to his interpretation of the question, and so can correctly answer the experimenter’s question on the basis of number. Similarly, when the experimenter carefully arranges one of the pieces of string into a crescent shape, the child compares the length of the untransformed string with one of the properties of the new figure, usually the distance between the sides of the crescent. In contrast, when the crescent shape appears as a result of the teddy’s activity, the child is able to base his comparisons on the appropriate attribute.
Issues closely related to these have recently been discussed by Donaldson and McGarrigle (1974). They presented evidence that children, in judging which of two sets of toy cars had ‘more’, were systematically influenced by the presence of garages around the cars. The irrelevant attribute, fullness of the garage, was used as a criterion for response by about one-third of the children when the garages were present; when they were absent a different criterion was employed. Donaldson and McGarrigle suggested that these and related findings were evidence of the operation of non-linguistic rules which function in particular contexts to provide a specific interpretation of utterances when the child’s linguistic knowledge by itself cannot do so. These rules, known as local rules, specify a hierarchy of attributes of the referent which are used in interpreting utterances.
The results of the present experiment indicate that characteristics of the speaker’s actions can also influence the child’s interpretation of utterances, suggesting the existence of a different kind of local rule, one which is sensitive to the intentions of the speaker.
It is difficult to find alternative explanations for the present findings.* It is clear that accounts which emphasize perceptual effects are inadequate by themselves, since in both IT and AT conditions the child faced the usual misleading configuration of materials. Gelman (1969) has argued that attentional deficiencies could explain conservation failure. She proposed that young children define quantity multi-dimensionally so that a variety of irrelevant attributes are attended to, and suggested that the change in value of an irrelevant attribute brought about by the transformation increased the likelihood of attention to that attribute. Gelman used discrimination learning set training to overcome this tendency, and showed that five-year olds could, after extensive training, succeed in conservation tasks. However it is not easy to reconcile the results reported here with the attentional analysis, since in the AT procedure the children were able, without any training whatsoever, to continue responding on the basis of the criterial attribute even though an irrelevant attribute had changed. Nonetheless the framework adopted in the present paper does have implications for attentional processes. Rather than postulating an attentional deficit, we are suggesting that the child’s normal procedures for interpreting language encourage special sensitivity to certain characteristics of the speaker’s behaviour.
If the explanation of the findings reported here is correct then there are important implications for those procedures which combine questions with intentional manipulative activity by an adult as a means of examining cognitive abilities. Far greater attention must be given to the features of the interactional setting in which the child’s knowledge is assessed before any conclusions about the child’s competence can be drawn. Further evidence is needed concerning the precise characteristics of speaker’s intentional actions and how these influence the child’s interpretation of utterances. It is possible that the achievements of the concrete operational stage are as much a reflection of the child’s increasing independence from features of the interactional setting as they are evidence of the development of a logical competence.

*The interval between the first and second judgements for each situation was approximately 30 seconds in the AT condition, whereas it was only 3 - 5 seconds in the IT condition. A separate study on a small number of children found no evidence that the longer time interval between J1 and J2 could account for the improved performance in the AT procedure.
REFERENCES
Brainerd, C. J. (1973) Judgements and explanations as criteria for the presence of cognitive structures. Psychol. Bull., 79, 172-179.
Bruner, J. S. (1975) The ontogenesis of speech acts. J. Child Language, 2, 1-19.
Dodwell, P. C. (1961) Children’s understanding of number concepts: Characteristics of an individual and of a group test. Can. J. Psychol., 15, 191-205.
Elkind, D. (1961) The development of quantitative thinking: A systematic replication of Piaget’s studies. J. gen. Psychol., 98, 37-46.
Frank, cited in Bruner, J. S., Olver, R. O., and Greenfield, P. M. (1966) Studies in Cognitive Growth, Wiley, New York, p. 193.
Donaldson, M., & McGarrigle, J. (1974) Some clues to the nature of semantic development. J. Child Language, 1, 185-194.
Gelman, R. (1969) Conservation acquisition: A problem of learning to attend to relevant attributes. J. Exper. child Psychol., 7, 67-87.
Gelman, R. (1972) Logical capacity of very young children: Number invariance rules. Child Devel., 43, 75-90.
Grice, H. P. (1968) Utterer’s meaning, sentence-meaning, and word-meaning. Foundations of Language, 4, 225-242.
Griffiths, J. A., Shantz, C. A., & Sigel, I. E. (1967) A methodological problem in conservation studies: The use of relational terms. Child Devel., 38, 841-848.
Gruen, G. E. (1965) Experiences affecting the development of number conservation in children. Child Devel., 36, 963-979.
Hood, H. B. (1962) An experimental study of Piaget’s theory of the development of number in children. Brit. J. Psychol., 53, 273-286.
Macnamara, J. (1972) Cognitive basis of language learning in infants. Psychol. Rev., 79, 1-13.
Palermo, D. S., & Molfese, D. L. (1972) Language acquisition from age five onward. Psychol. Bull., 78, 409-428.
Piaget, J. (1952) The child’s conception of number. London: Routledge and Kegan Paul.
Piaget, J., & Inhelder, B. (1969) The psychology of the child. New York: Basic Books.
Rose, S. A., & Blank, M. (1974) The potency of context in children’s cognition: An illustration through conservation. Child Devel., 45, 499-502.
Ryan, J. (1973) Interpretation and imitation in early language development. In R. Hinde & J. S. Hinde (Eds) Constraints on learning - Limitations and predispositions. London: Academic Press.
Smedslund, J. (1964) Concrete reasoning: A study of intellectual development. Monogr. Soc. Res. Child Devel., 29.
Résumé
Eighty children aged between 4;2 and 6;3 years were tested on conservation of length and number under the following two conditions: when the transformation is the result of a direct action by the experimenter, and when the transformation follows indirectly from an action with a different goal (‘accidental’ transformation). Fifty children conserved with an ‘accidental’ transformation, whereas only 13 responded correctly with an intentional transformation. These results show the influence of the experimenter’s behaviour, and in particular of his actions on the materials, on the child’s interpretation of the task. In particular, the experimenter’s behaviour can suggest to the child that the relevant characteristics are not those he expresses linguistically. It may therefore be thought that classical procedures tend to underestimate the child’s cognitive abilities.
4
On the psychological reality of a natural rule of syllable structure*
SANFORD A. SCHANE
BERNARD TRANEL
HARLAN LANE
University of California at San Diego
Abstract
If natural rules in phonology, such as the rule which deletes a word final consonant before a consonant, are frequently found in unrelated languages, it must be because they tap universal features of production and/or perception. The present experiment employed a learning task to see whether naive subjects have a predisposition for the natural rule as opposed to its converse (consonant deletion before a vowel). The Ss first learned four novel words (nouns) - two beginning with a consonant, two with a vowel - as paired associates to English ‘translations’. Then three novel adjectives were combined with each of the four nouns, following the natural rule for one group of Ss, the unnatural rule for the other. The twelve phrases were cued by their English translations and the S had to respond to each with the phonologically correct sequence of adjective and noun; confirmation followed each response. The Ss learning the unnatural corpus had a strong tendency to give natural responses, whereas the converse was not true. Consequently they made many more errors en route to mastery than their natural counterparts, even when the operative rule was displayed on the first trial by presenting in turn each adjective with its four following nouns. It appears that our Ss had implicit knowledge of the natural rule, even though it does not operate to any significant extent in English.

*This experiment was conducted during the summer of 1969 at the Center for Research on Language and Language Behavior, University of Michigan, under contract with the Language Development Section, U.S. Office of Education. We are indebted to the distinguished phonetician, Professor J. C. Catford, who read our corpus aloud for tape recording. The data analysis was carried out at the University of California, San Diego, with support from the Academic Senate Committee on Research. We are grateful for the assistance of Jo-Ann Flora and Allana Elovson. Bernard Tranel is now at the Department of Romance Languages, University of California, Riverside and Harlan Lane at the Department of Psychology, Northeastern University. Reprints: Dr. S. Schane, Department of Linguistics, University of California, San Diego, La Jolla CA 92037.

Cognition, 3(4), pp. 351-358

Most phonologists maintain that the consonant-vowel-consonant-vowel syllable structure (CVCV) is the most fundamental in language (Jakobson, Fant and Halle, 1956). In English, a word such as ‘banana’ exemplifies this pattern while, on the other hand, ‘streak’ does not, since it begins with three consonants and ends with one. The claim is
further made that a greater number of consonants in a cluster leads to a more elaborate (or more marked) syllable structure (Chomsky and Halle, 1968). Various facts have been adduced in support of the claim that the CVCV syllable type is basic. (1) All languages have this particular syllable structure as one of the permitted ones (Jakobson, Fant, and Halle, 1956). (2) The CVCV pattern is the first to emerge in child language (Jakobson, 1968). (3) Various phonological processes (or rules) found in language convert more complex syllable structures to the CVCV one (Schane, 1972). It is with one of these processes that we shall be most concerned. Maori, spoken in New Zealand, and French both contain rules leading to the CVCV pattern, although these two languages have no genetic relationship. The following forms illustrate this process for Maori.

Active verb    Gerund
afi            afit-aga     ‘embrace’
hopu           hopuk-aga    ‘catch’
aru            arum-aga     ‘follow’
Speaking historically, the final consonant of the verb stems (afit-, hopuk-, arum-) was deleted in the active, where the consonant was not followed by a vowel. In French, there are alternations between the presence and absence of word-final consonants, depending on whether the following word begins with a vowel or a consonant. These alternations, which traditional French grammars call the presence or absence of ‘liaison’, are especially apparent with preposed adjectives.

petit ami    ‘little friend’     peti garsõ    ‘little boy’
groz õkl     ‘big uncle’         gro per       ‘big father’
lõk ete      ‘long summer’       lõ prẽtã      ‘long spring’
The adjectives terminate in a consonant in their underlying representations - petit, groz, lõk - and there is a rule deleting these consonants whenever the following word begins with a consonant (Schane, 1968). The consonant deletion rules of Maori and of French are examples of what phonologists today call ‘natural’ rules. In both languages, the result of the rules is a simplification in syllable structure between words. On the other hand, a rule deleting word-final consonants before vowels, but not before consonants, would be classified as ‘unnatural’. Such a hypothetical rule, applied to the French data, would yield either clusters of consonants - e.g. *[petit garsõ], or else sequences of vowels - e.g. *[peti ami]; these would be considered less desirable syllable structures. Although there is a fair amount of linguistic data - based on the typology of languages, acquisition, historical change, and phonological alternations - to support the linguist’s claim that a certain phonological process is natural and that natural rules predominate over their unnatural counterparts, we are interested in seeing whether another type of corroboration is forthcoming when appropriate performances are demanded of subjects
in psychological experiments. In general, if we were to take two groups of Ss and expose one group to linguistic data exemplifying the natural process and the other group to data exemplifying the unnatural process, would the natural group have less difficulty in discovering and using their rule? The outcome should be independent of the S’s native language and, in fact, it is important to find a natural rule which is not part of his language. Otherwise, he surely will be more adept at processing exemplars of the natural rule, if only because he is already familiar with it. The natural rule which deletes a consonant before another consonant exists in French, but not in English, and therefore ought to be a good rule to use on English Ss who do not know French. It should be pointed out, however, that the process exists marginally in English in the distribution of the indefinite article a/an or in the ‘r-less’ dialects, where the final r is pronounced before a vowel but not before a consonant. However, since this process does not permeate large parts of the English phonological system, it is probably safe to ignore these few instances. In order to present subjects with corpuses that exemplify the operation of either a natural or an unnatural rule, an artificial language was constructed using four nouns and three adjectives. The nouns were: sipu ‘dog’, paši ‘house’, oga ‘book’, ibi ‘man’; the adjectives were: tupak ‘small’, amuf ‘white’, gomet ‘old’. One group, the natural group, heard adjective-noun combinations where the final consonant of the adjective is deleted before a noun beginning with a consonant, e.g., tupak oga, tupa sipu; amuf ibi, amu paši. The other group, the unnatural group, heard adjective-noun combinations in which the final consonant of the adjective is deleted before a following vowel, e.g., tupa oga, tupak sipu; amu ibi, amuf paši.
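The two corpuses just described differ only in where the adjective's final consonant is dropped, so they can be generated mechanically. The following Python sketch is an illustration only, not the authors' procedure: the function names, the five-vowel test for "begins with a vowel", and the romanized spellings (with the noun for ‘house’ written paši) are assumptions made for the example.

```python
# Illustrative sketch: build the natural and unnatural adjective-noun corpuses.
# Assumption: a word "begins with a vowel" if its first letter is a, e, i, o or u.
VOWELS = set("aeiou")

NOUNS = ["sipu", "paši", "oga", "ibi"]     # dog, house, book, man
ADJECTIVES = ["tupak", "amuf", "gomet"]    # small, white, old


def starts_with_vowel(word):
    return word[0] in VOWELS


def phrase(adjective, noun, natural):
    """Return the adjective-noun phrase under the natural or unnatural rule."""
    before_vowel = starts_with_vowel(noun)
    # Natural rule: drop the final consonant before a consonant-initial noun.
    # Unnatural rule: drop the final consonant before a vowel-initial noun.
    delete = (not before_vowel) if natural else before_vowel
    form = adjective[:-1] if delete else adjective
    return f"{form} {noun}"


def corpus(natural):
    """All 12 phrases (3 adjectives x 4 nouns) for one version of the language."""
    return [phrase(a, n, natural) for a in ADJECTIVES for n in NOUNS]


if __name__ == "__main__":
    print("natural:  ", corpus(natural=True))
    print("unnatural:", corpus(natural=False))
```

Each version of the language contains the same 12 adjective-noun pairings; only the alternation is reversed, which is what makes the two learning tasks comparable.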
Suppose speakers behave in the way described by the natural rule but undertake to learn the unnatural version of the artificial language. They are similar to students of a second language in some respects. Students make many errors in those cases where the phonological rules of their native language are quite different from those in the second language, and these errors often reflect precisely the contrast between the rules they tend to follow and those they must learn to follow. With training, however, students can often master the required performance. Consider the student who has learned, for example, the words for ‘book’ and ‘dog’, oga and sipu, and who must learn that ‘small book’ and ‘small dog’ are in this unnatural corpus, tupa oga and tupak sipu, respectively. He is liable to recall these phrases erroneously, when he recalls them at all, as tupak oga and tupa sipu, if in fact the natural rule has psychological reality and influences his performance on this task. Nevertheless, he will ultimately learn the correct forms. The student learning the natural language, conversely, has a large initial advantage, since his initial disposition facilitates rather than interferes with the new task; even when he is unsure of a phrase, he will tend to guess the correct alternation. Of course, the corpuses could prove to be equally difficult for an English speaker to master, since neither deletion rule is generally operative in his language. In this experiment, we ask whether a natural corpus is easier to learn than an unnatural one, and we pose a supplementary question: If Ss are induced to discover the rules operative in the corpus, will they overcome their disadvantage, if any, in learning the unnatural version?
Method

Subjects

Thirty-one undergraduates at the University of Michigan served individually as Ss. All were native speakers of a Midwestern dialect of American English and hence did not speak one of the ‘r-less’ dialects mentioned earlier. Twenty-four knew no French, although most had studied at least some other foreign language. To the best of our knowledge, the rule of consonant deletion did not function in any appreciable way in these other languages. Seven Ss knew French well (i.e., a minimum of four years at the high school level). Since the consonant deletion rule operates in French, we wanted to see what effect knowledge of that rule had on the performance of these seven Ss. The important group, of course, is the one with no French. The subject was comfortably seated in an audiometric room and told that he would be engaged in a learning task and would be paid according to his rate of learning. In front of him were an error counter (remotely controlled by E) and a microphone. He was told that every time he made an error a red light would flash and the error would be entered on the counter.

Materials and Procedure

All of the word lists in the experiment were tape-recorded (Ampex 300) by a phonetician who read the utterances as if they were English. In fact, they all conformed to English phonetics (with the possible exception of the final vowel in the elided version of tupak): they contained initial primary stress, aspirated stops, etc. The “nouns” were: [s iy pʰ uw], [pʰ a š iy], [ow g a], [iy b iy]. The “adjectives” were: [tʰ uw pʰ a k], [a m uw f], [g ow m ey t]. Two-word phrases were always read with juncture at the word boundary (amuf#ibi, not amu#fibi): liaison might have selectively hindered the natural group, since junctures would no longer correspond to word divisions. Copies of the master tape were played to S (Ampex 351-2), who listened with binaural headphones (Grason Stadler PDR-8). At no time did S see an orthographic representation of the utterances.

Part A: Ten different random lists of the words sipu, paši, oga and ibi were tape-recorded. All Ss were exposed to the same lists given in the same order. The four nouns were presented with their English translations as paired associates. The subject heard the English word first, followed by a pause of two sec, during which time he was to recall its translation in the artificial language. He then heard the appropriate response and had to repeat it after the model. The set of four nouns was presented over and over, each time in a different order. Criterion was reached after two series without errors. Learning nouns in isolation served two purposes. The results were used to match Ss when assigning them
to treatment groups in Part B. Moreover, it insured that S would not have to be concerned, in the later parts of the experiment, with the placement of word boundaries between the adjective and the noun, since he would know the form of the noun in isolation. Part B: Twelve adjective-noun combinations (three adjectives combined with each of the four nouns of Part A) were presented. On the basis of performance on Part A, Ss were placed in one of four groups: Natural Systematic (NS), Natural Random (NR), Unnatural Systematic (US), Unnatural Random (UR). N and U groups heard the natural and the unnatural corpuses, respectively. Each of the four groups heard three different random lists of 12 adjective-noun combinations, after which the set of three lists was repeated until criterion was reached. For just the systematic groups, however, each adjective was presented with all four nouns before the next adjective was introduced in the first series of 12 items. For this first systematic series, the order of presentation for a given adjective was as follows: form without consonant deleted, form with consonant deleted, form without consonant deleted, form with consonant deleted. Subsequent series were the same as for the random groups. The procedure for learning the corpus was the same as that used for learning the nouns in isolation, except that criterion was reached after one series of 12 items with no more than one error. Any response that conformed to the appropriate syllable structure was counted as correct. For example, if S should have produced gomet ibi but instead said gomek ibi or even amuf ibi, these responses were considered correct as long as the correct noun was given, since he had the adjective terminating in a consonant and the appropriate noun beginning with a vowel. Part C: 18 new nouns were presented, nine beginning with a consonant and nine beginning with a vowel. 
The subject was asked to combine each new noun with one of the three adjectives he had previously learned in Part B. The 18 items were given once and the list was the same for all Ss. The purpose was to see whether S could generate correct sequences not in the original corpus.
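The scoring criterion used in Part B (a response counts as correct when the right noun is given and the adjective shows the syllable structure required by the corpus, whichever adjective it is) can be stated as a small predicate. This is a hedged sketch, not the authors' scoring procedure: the function name, the treatment of an empty string as a failure to respond, and the five-vowel test are assumptions made for illustration.

```python
# Illustrative sketch of the Part B scoring criterion.
# Assumption: responses are romanized two-word strings; an empty or malformed
# string stands for a failure to respond.
VOWELS = set("aeiou")


def is_correct(response, target_noun, natural):
    """True if the response has the right noun and the required alternation."""
    parts = response.split()
    if len(parts) != 2:
        return False                      # no response, or not a two-word phrase
    adjective, noun = parts
    if noun != target_noun:
        return False                      # the correct noun must be given
    ends_in_consonant = adjective[-1] not in VOWELS
    noun_starts_with_vowel = noun[0] in VOWELS
    if natural:
        # Natural corpus: consonant kept before a vowel, dropped before a consonant.
        return ends_in_consonant == noun_starts_with_vowel
    # Unnatural corpus: consonant dropped before a vowel, kept before a consonant.
    return ends_in_consonant != noun_starts_with_vowel


if __name__ == "__main__":
    # Target phrase gomet ibi in the natural corpus: the adjective itself may be
    # wrong as long as it ends in a consonant and the noun is right.
    for response in ["gomet ibi", "gomek ibi", "amuf ibi", "gome ibi"]:
        print(response, is_correct(response, "ibi", natural=True))
```

Scoring the syllable structure rather than the exact adjective keeps the measure focused on the alternation itself rather than on vocabulary recall.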
Results and discussion

Figure 1 shows how Ss progressed in learning the natural and unnatural corpuses when the phrases were presented in random order (R), and when they were presented once systematically and thereafter randomly (S). In both cases, Ss learning the unnatural language make many more errors initially than their natural counterparts. In the first third of the experiment, the Ss learning the natural corpus averaged 56 per cent correct responses, those learning the unnatural one, 35 per cent (F = 8.4; df = 1, 20; p < 0.01). Thereafter, the difference between the groups diminishes as both learned the lists, with their learning curves converging toward criterion (F = 3.5; df = 2, 40; p < 0.05).
Figure 1. Percent correct responses averaged over blocks of three trials for groups of six subjects learning 12 phrases in an artificial language. The phrases were either phonologically natural or unnatural and were presented either in random order throughout or systematically once and randomly thereafter. For clarity, the learning curves for the random groups are shifted one trial block to the right. [Plot not reproduced; horizontal axis: blocks of three trials, 1 to 7.]
Turning first to the randomly ordered series, the influence of the natural rule is seen most clearly on the early trials, when the Ss had not yet memorized the 12 items. Subjects learning the unnatural corpus more often made natural responses (hence errors) than Ss learning the natural corpus made unnatural ones. The percentages for the first third of the experiment were 50 and 37, respectively (U = 5; p < 0.02). Moreover, Ss learning the unnatural corpus failed to respond more often, 38 versus 23 per cent of the trials (U = 9; p < 0.07). Apparently, application of the natural rule, despite the previous trials’ conditioning, led to response competition. A similar pattern exists contrasting the two groups receiving systematic presentation on the first trial: again, there is a greater tendency for Ss learning the unnatural corpus to make natural responses (41 per cent error) than for Ss learning the natural corpus to make unnatural ones (26 per cent error) (U = 11; n.s.). Moreover, the former failed to respond on 38 per cent of the trials, the latter on 15 per cent (U = 6; p < 0.03). The answer to the question, is a natural corpus easier to learn than an unnatural one, is yes. The learning curves for individual Ss reveal the influence of the natural rule quite strikingly in another way. Half of the Ss learning the unnatural corpus actually made
increasing numbers of natural responses during the early trials; hence, their individual learning functions were non-monotonic. Not a single natural S showed this pattern of initially increasing errors. Figure 1 shows that when Ss receive systematic presentation of the corpus on the first trial, they perform better, an average of 14 per cent over the whole experiment (F = 5.27; df = 1, 20; p < 0.05), and the greater difficulty of the unnatural version is rapidly overcome. Asked whether they had followed some sort of rule in formulating their responses during the experiment, 25 per cent of the Ss who received random presentation replied that they had, and stated some version of a rule of consonant deletion, whereas 58 per cent of the Ss who had received systematic presentation on just one trial gave this report. Any inference from these percentages of rule discovery would be risky given the small numbers of Ss involved and the vagaries of verbal report. Our confidence in the reports is increased, however, by the finding that the Ss who avowedly discovered a rule did better at the task of generating new utterances in the artificial language (Part C): 86 per cent correct compared with 48 per cent for their more workaday companions (t = 5.7, p < 0.01). It is only several trials after the systematic presentation that the Ss learning the unnatural corpus perform at about the same level as their natural counterparts. It seems reasonable to speculate that systematically displaying the alternation, whether the corpus was natural or unnatural, called attention to the fact that an alternation was indeed involved in the lists to be remembered. The natural Ss could then proceed correctly to perform in accordance with the ‘natural’ rule, which the results from the random presentation show they are disposed to follow.
The Ss working with unnatural material find, however, that the ‘natural’ rule is not the correct one for their corpus, and require several trials to transform it into the complementary rule, the ‘unnatural’ one. The median number of trials to criterion for all 24 Ss in the experiment was 13.5. Of the seven additional Ss set apart from the main experiment because they had at least four years of high school French, five reached criterion in seven trials or fewer (four natural Ss, one unnatural), while the two remaining Ss (both unnatural) required 14 trials. This result is not unexpected. In learning French one is explicitly told about the rules for elision and liaison - that is, the deletion of consonants and vowels - and it is not surprising that many of these Ss transferred this knowledge to the learning task. Some of their remarks are revealing in this respect. Two who had been exposed to the natural corpus volunteered that their language worked exactly like French, while one unnatural S (the one who reached criterion rapidly) reported that his language worked like French but the other way around. These Ss, who reached criterion in 3, 3, and 4 trials, respectively, confirm that overt knowledge of a rule aids in solving linguistic problems. By the same token, we have already seen that there was a significant correlation, for Ss who did not know French, between explicit knowledge of the rule and performance on Part C, that is, the application of the rule to new linguistic data. We leave open the implications of this observation for second-language instruction.
The linguist’s intuition that a certain sound pattern is natural is often based on its occurrence in diverse languages. If natural rules are frequent in unrelated languages, it must be because they tap universal features of production and/or perception. All speakers embody these features, even though the process in question may play no significant overt role in their native language. With naive subjects, such phonological predispositions can be brought to light indirectly with an experiment; in the present case, a learning task. In 1972, Ohala wrote that “it has never been proven that ordinary language users can differentiate let alone prefer natural sound patterns over non-natural ones” (p. 42). The present experiment provides evidence that ordinary people do indeed have that implicit knowledge, at least in the case of one important natural rule.
REFERENCES

Chomsky, N., & M. Halle (1968) The sound pattern of English. New York: Harper & Row.
Jakobson, R. (1968) Child language, aphasia, and phonological universals. The Hague: Mouton.
Jakobson, R., C. G. M. Fant, & M. Halle (1956) Preliminaries to speech analysis. Cambridge, Mass.: M.I.T. Press.
Jakobson, R., & M. Halle (1956) Fundamentals of language. The Hague: Mouton.
Ohala, J. (1972) How to represent natural sound patterns. POLA Reports, Phonology Laboratory, University of California, Berkeley.
Schane, S. A. (1968) French phonology and morphology. Cambridge, Mass.: M.I.T. Press.
Schane, S. A. (1972) Natural rules in phonology. In: Stockwell, R. P. & R. K. S. Macaulay (Eds.), Linguistic change and generative theory. Bloomington: Indiana University Press, pp. 199-229.
Résumé

If natural phonological rules, such as the one that deletes a final consonant before an initial consonant, are frequent in unrelated languages, it must be because these rules involve universal features of the production and/or perception of language. The present experiment uses a learning task to see whether naive subjects are more inclined to use a natural rule than its converse (deletion of the consonant before a vowel). The subjects first have to learn four new words (nouns), two beginning with a consonant and two with a vowel. These words are given as paired associates with an English “translation”. Three new adjectives must then be combined with each of the four nouns, following the natural rule for one group of subjects and the unnatural rule for the other. The 12 phrases are cued by their English translations. The subject must respond to each with a phonologically correct sequence of adjective and noun. After each response, he receives confirmation. Subjects who learned the unnatural corpus tend to give natural responses, whereas the converse is not true. Consequently, these subjects make many more errors in the course of the task than the others, even when the operative rule is given to them on the first trial through a systematic presentation of each adjective followed by the four corresponding nouns. Even though such a process plays no significant role in English, it appears that our subjects have implicit knowledge of the natural rule.
Converging evidence for the functional significance of imagery in problem solving*
PHILLIP SHAVER,** LEE PIERSON, AND STEPHEN LANG
Columbia University, New York City
Abstract

Two kinds of explanations have been offered for the process by which three-term series problems are solved, one in terms of linguistic principles and the other in terms of visual-spatial imagery. Two experiments are reported in which three different classes of operations are brought to bear on the problem: (1) Manipulation of stimulus attributes (characteristics of problems), (2) manipulation of variables that selectively encourage or inhibit the use of imagery (facilitating instructions; the suppression of visualization by reading), and (3) measurement of relevant individual differences (spatial-reasoning ability). All of the results indicate that imagery plays a functional but not a necessary role in the solution of three-term series problems; it is suggested that imaginal representation is functional because it reduces the load on memory. An adequate explanation of problem solving will have to address certain general issues, such as the diversity of forms of cognitive representation and differences within and between individuals in the choice of problem-solving strategies.

Recent research on syllogistic reasoning has focused on the distinction between visual and verbal information processing (Clark, 1969a, 1971; Huttenlocher, 1968; Huttenlocher & Higgins, 1971), as has much related work in cognitive psychology (e.g., Paivio, 1971). Often the problems used in this research are of the following sort: “Tom is taller than Harry. Joe is shorter than Harry. Is Tom taller than Harry? (or Who is tallest?)”. Huttenlocher (1968), following DeSoto, London, and Handel (1965), has argued that people use visual-spatial imagery in solving these problems. Clark (1969a) has argued that the postulation of imagery is unnecessary, since strictly linguistic principles can account for the data presently available.

*Experiment I was part of a doctoral dissertation submitted to the University of Michigan by the first author during his tenure as a National Science Foundation Predoctoral Fellow. Experiment II and the preparation of this paper were supported by a grant from Columbia University. The authors are grateful to Susan Saegert and Stuart Katz for advice and assistance throughout the project.
**Requests for reprints should be addressed to Phillip Shaver, who is now in the Department of Psychology, New York University, New York, New York 10003.
Cognition 3(4), pp. 359-375
Clark (1971) ended a recent debate concerning his linguistic theory with the following claim: “... the only firm conclusion we can draw at this time is that it has not been demonstrated that the use of spatial imagery differentially affects the solution of three-term series problems” (p. 513). This reply to Huttenlocher and Higgins (1971), who had extensively criticized Clark’s theory, was quoted directly from an earlier paper (Clark, 1969a) - presumably indicating that despite much debate and thought in the intervening years, there was still no way to determine whether imagery plays a functional role in the solution of three-term series problems. The word functional is emphasized because no one who has done research in this area doubts that many subjects say they use imagery; the question is whether such reports reflect the mechanism actually responsible for performance. According to Huttenlocher’s analysis, subjects solve problems like the example above by placing the names Tom, Harry, and Joe in an imaginary spatial array, then answering the question “Who is tallest?” by reporting which person is highest (or left-most) in the array. The same procedure works in principle for all three-term series problems, regardless of the adjectives involved. Besides introspective reports (DeSoto et al., 1965; Huttenlocher, 1968), the main source of evidence for the spatial imagery explanation is that patterns of errors and reaction times obtained in studies of three-term series problems are parallel to results obtained in studies of structurally analogous spatial arrangement tasks using real objects - e.g., “The red block is above the green block. The yellow block is below the green block. Which is highest?” (These statements refer to blocks which are actually manipulated by subjects; see, for example, Huttenlocher, Eisenberg, & Strauss, 1968; Huttenlocher, Higgins, Milligan, & Kaufman, 1970; Huttenlocher & Strauss, 1968).
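The array procedure attributed to Huttenlocher can be made concrete with a short Python sketch. It is only an illustration of the logic of ordering three terms in a vertical array and reading the answer off it; it is not a model of how subjects actually process the premises, and the function names, the word lists, and the tuple format for premises are assumptions introduced here.

```python
# Illustrative sketch: solve a three-term series problem by building an
# ordered "array" (top to bottom) from the premises.
# Assumption: each premise is a (first, relation, second) tuple.
UPWARD = {"taller", "better", "above"}       # first term goes higher
DOWNWARD = {"shorter", "worse", "below"}     # first term goes lower


def build_array(premises):
    """Return the terms ordered from highest to lowest."""
    above = {}
    for first, relation, second in premises:
        if relation in UPWARD:
            high, low = first, second
        elif relation in DOWNWARD:
            high, low = second, first
        else:
            raise ValueError(f"unknown relation: {relation}")
        above.setdefault(high, set()).add(low)
        above.setdefault(low, set())
    # Extend each term's set with everything below the terms it stands above.
    changed = True
    while changed:
        changed = False
        for term in above:
            for lower in list(above[term]):
                extra = above[lower] - above[term]
                if extra:
                    above[term] |= extra
                    changed = True
    # The more terms something stands above, the higher it sits in the array.
    return sorted(above, key=lambda term: len(above[term]), reverse=True)


def is_higher(array, a, b):
    """Answer a question such as 'Is a taller than b?' from the array."""
    return array.index(a) < array.index(b)


if __name__ == "__main__":
    array = build_array([("Tom", "taller", "Harry"), ("Joe", "shorter", "Harry")])
    print(array)                        # highest first
    print("Tallest:", array[0])
    print(is_higher(array, "Tom", "Joe"))
```

Once the array exists, any question about the three terms is answered by position alone, which is the sense in which an imaginal representation could reduce the load on memory.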
Clark’s alternative linguistic theory is considerably more complex. It is based on three ‘linguistic principles’ - functional relations, lexical marking, and congruence - and on one ‘strategy’, compression. The purported advantages of the more complex theory are generality, continuity with other explanations in psycholinguistics, and avoidance of the still somewhat slippery imagery construct. Since his theory makes essentially the same predictions as Huttenlocher’s for problems of the sort discussed above, Clark (1969a, b) has attempted to compare predictions from the two theories using related problems stated in negative equative form, for example: “Harry is not as tall as Tom”. Unfortunately it is not clear what predictions should be made from Huttenlocher’s theory about negative equative problems (Clark, 1972; Huttenlocher et al., 1970; Johnson-Laird, 1972; Jones, 1970), and much of the recent debate has centered on this issue. For our purposes, it is sufficient to note that the debate has reached an impasse which could conceivably be overcome by introducing evidence generated by alternative methods. Paivio (1969, 1970, 1971) has described a similar impasse which for a number of years barred the concept of imagery from research on paired-associate learning. Many investigators claimed that while imagery might be a useful concept for describing one’s own experiences, it played no functional role in learning and memory. However, Paivio (1971)
and others (e.g., Bower, 1970) have since established the functional significance of imagery by approaching the problem through three classes of converging operations. The first is manipulation of stimulus attributes which are assumed to enhance or inhibit the arousal of imagery. In Paivio’s research, for example, stimulus words that had been rated for ‘concreteness’ were used to show that easy-to-image words are easier to remember in free-recall situations and are easier to learn as paired-associates. The second operation involves experimental procedures that selectively encourage or inhibit the use of imagery, such as direct instructions telling subjects to employ imagery. This procedure is based on the assumption that subjects can use alternative symbolic coding systems at will. The third operation, measurement of relevant individual difference variables, involves the selection of subjects according to their symbolic habits or skills, such as the customary vividness of their imagery or their spatial-reasoning or figural-transformation abilities. In the studies to be reported here, all three kinds of operations were used. In Experiment I, stimulus attributes were manipulated by using three different pairs of bipolar adjectives which earlier research had suggested might differ in their capacity to arouse spatial imagery. One pair, above-below, directly describes a spatial arrangement. A second pair, better-worse, has been found by DeSoto et al. (1965; Handel, DeSoto, & London, 1968) to be treated in an analogous manner by many subjects, perhaps because spatial metaphors are frequently used to make social comparisons.
In contrast, the third pair, lighter-darker (hair), did not yield a consistent spatial arrangement, perhaps because spatial metaphors are not commonly used to describe hair color.* The second class of operations, experimental procedures that selectively encourage or inhibit the use of imagery, was represented in the first experiment by a method developed by Brooks (1967, 1968), who has shown that intake of sensory information through either the visual or auditory modality interferes with the construction of imagery in that modality. For example, subjects find it harder to recall verbal statements about spatial relations if they read the statements rather than listen to them being read, but they find it easier to recall statements about nonspatial relations when they have read them. In our first experiment all problems were presented alternately in visual and auditory modes (see Atwood, 1971, for a variant of this procedure). The third class of operations, measurement of relevant individual differences, was represented in Experiment I by three short tests of spatial-reasoning or figural-transformation ability. Several recent studies (e.g., Neisser, 1970; Paivio, 1971) have indicated that this ability relates to performance on tasks involving imagery. In addition, subjects were asked at the end of Experiment I to provide introspective reports. Experiment II was a much simpler study designed to assess the effect of imagery instructions on the solution of three-term series problems. Several studies have shown that such instructions have a striking effect on memory (Bower, 1970; Paivio, 1971) and it was expected that a similar effect might be obtained for problem solving, which requires a person to operate on the remembered essentials of a problem. Taken together, these two experiments should provide important information concerning the role of imagery in problem solving.

*It should be noted that according to Clark’s (1969a) principle of lexical marking, better and above are unmarked and worse and below are marked, whereas neither member of the lighter-darker pair is marked.
Experiment I
Four kinds of problems, each consisting of a different combination of premises and thus varying in difficulty, were taken from a study by DeSoto et al. (1965). These are shown here in Table 1.

Table 1. Model problems based on four types of premise combinations and three relations

Type 1
  Above-below:     A is above B. B is above C. Is A above C? (65.8)
  Better-worse:    A is better than B. B is better than C. Is A better than C? (61.5)
  Lighter-darker:  A has lighter hair than B. B has lighter hair than C. Does A have lighter hair than C? (63.0)
Type 2
  Above-below:     C is below B. B is below A. Is C below A? (50.0)
  Better-worse:    C is worse than B. B is worse than A. Is C worse than A? (47.0)
  Lighter-darker:  C has darker hair than B. B has darker hair than A. Does C have darker hair than A? (64.4)
Type 3
  Above-below:     A is above B. C is below B. Is C above A? (65.1)
  Better-worse:    A is better than B. C is worse than B. Is C better than A? (66.7)
  Lighter-darker:  A has lighter hair than B. C has darker hair than B. Does C have lighter hair than A? (39.7)
Type 4
  Above-below:     B is below A. B is above C. Is C above A? (46.6)
  Better-worse:    B is worse than A. B is better than C. Is C better than A? (35.9)
  Lighter-darker:  B has darker hair than A. B has lighter hair than C. Does C have lighter hair than A? (24.1)

Note: After DeSoto et al. (1965), Table 3. The number in parentheses after each problem indicates the percentage correct obtained by DeSoto et al.

Results from the DeSoto et al. study led us to believe that the three relations, above-below, better-worse, and lighter-darker, differed in the degree to which they suggested an imaginal representation. Above-below can be directly translated into spatial-image form, so it should be easiest to work with; lighter-darker apparently does not suggest spatialization, and should therefore be more difficult. Better-worse should be intermediate in difficulty. If reading interferes with visual-spatial imagery, as Brooks (1967, 1968) has argued, the reading condition should be more difficult than the listening
condition and the interference effect should be greatest for problems involving spatial relations, intermediate for problems involving social-status relations, and least for the non-spatial hair-color problems. Previous findings regarding spatial-reasoning ability and its enhancement of memory in studies on the effectiveness of imagery mnemonics led us to predict better problem-solving performance for higher scorers on spatial-reasoning tests. Improved performance was expected to be most marked for the more difficult problems, where individual differences could most easily show themselves (Ernest & Paivio, 1969, 1971). In Experiment I, 24 males and 24 females solved three-term series problems based on the three different comparative relations: Above-below (henceforth called ‘spatial’), better-worse (called ‘social’, to emphasize that a social comparison between individuals is being made), and lighter-darker (hair color). Half of the problems in each category were read to subjects from an audiotape-recording (the auditory or listening condition); the other half were read by them from a TV screen at a rate exactly matching the auditory tape-recording (the visual or reading condition). The major dependent measure was number of errors. In addition, each subject (1) completed a post-experimental questionnaire designed to gather introspective reports relevant to the hypotheses and (2) took three short spatial-reasoning tests.

Method

Overview of the Design

Each of the 48 Ss (students in an introductory psychology course at the University of Michigan) was presented with 30 three-term series problems, one syllogism of each type (cf. the four rows in Table 1) in each of two modes of presentation, visual and auditory, for each of the three kinds of relations. Therefore, each S received 24 (4 X 2 X 3) problems to be scored for analysis. In addition, six filler problems were included in which the letter ‘X’ replaced the name of the person mentioned last in the question.
These special problems were designed to encourage Ss to wait on all problems until complete information had been presented before giving an answer; they were not scored for analysis. Orders of presentation of the three kinds of relations, and of the visual and auditory modes of presentation, were counterbalanced across subjects. Since presentation order of the three kinds of relations was counterbalanced, differences between relation types can be analyzed either within or between subjects. In other words, all Ss received all treatments, but one-third got one relation first, another third a different relation, and so on. For each problem, an answer (yes, no, or failed to answer in 5 seconds) was recorded. In addition to providing these data, each S completed a post-experimental questionnaire and three spatial-reasoning tests from the Chicago Tests of Primary Mental Abilities (Thurstone & Thurstone, 1943).
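The problem counts in this design follow from crossing the three factors and adding the fillers. The short Python sketch below simply enumerates the scored cells to make the arithmetic explicit; the variable names and the labels for the factor levels are assumptions chosen for the illustration.

```python
# Illustrative sketch: enumerate the scored cells of the Experiment I design.
from itertools import product

PROBLEM_TYPES = [1, 2, 3, 4]                       # premise combinations (Table 1)
MODES = ["visual", "auditory"]                     # modes of presentation
RELATIONS = ["above-below", "better-worse", "lighter-darker"]
N_FILLERS = 6                                      # 'X' problems, not scored

# One scored problem per (type, mode, relation) combination.
scored = list(product(PROBLEM_TYPES, MODES, RELATIONS))
total_problems = len(scored) + N_FILLERS

if __name__ == "__main__":
    print(len(scored), "scored problems")          # 4 x 2 x 3
    print(total_problems, "problems per subject")  # scored plus fillers
```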
Phillip Shaver, Lee Pierson, and Stephen Lang
Materials
In order to keep Ss from reading ahead at an uncontrolled pace (a difficulty encountered in a pilot study), the premises and questions were each broken into three segments, and these were filmed with a 16 mm movie camera set to shoot one frame at a time. Each segment was photographed a specified number of times, the number corresponding to the amount of time taken to read these aloud in the auditory condition. (Exact specifications for the film have been provided by Shaver, 1970.) An example of a segmented problem appears below, with slashes indicating breaks:

Line 1. Jack / is better / than Walt.
Line 2. Steve / is worse / than Walt.
Line 3. Is Jack / better / than Walt?

When the resulting film was shown, each line appeared one segment at a time, from left to right on the screen. This 16 mm animated film was transferred to video-tape for use in the experiment. Both visually- and auditorily-presented problems were replayed on a videotape recorder. Visual problems appeared on a 12 inch (diagonal) TV monitor; auditory problems were presented through the speaker system of the same monitor.
Procedure

Experiment
Each S was tested individually while seated at a table four feet from a TV screen. A set of videotaped instructions explained that S would be presented with problems made up of two factual statements and a question and would be allowed five seconds in which to answer the question out loud. It was explained that some problems would be presented auditorily, some visually, and that some would end in ‘X’, in which case S was always to answer ‘No’. There followed a number of practice problems involving the relations more-less, happier-unhappier, and younger-older. These were logically similar to the problems actually used in the experiment but did not involve the same relations. Following this, segments of the videotape were shown to S, each segment preceded by brief instructions stating whether the next several problems would be presented visually or auditorily, and stating which relation would be involved.

Post-experimental questionnaire
When the experiment was finished, S filled out a questionnaire covering the following issues: (1) Methods used to solve the problems; (2) whether these methods were different for the different relations (spatial, social, and hair color); and (3) for each kind of relation whether the visual or auditory form of presentation was more difficult to deal with. Upon completion of this questionnaire, S was introduced to a second E who escorted him to another room to take the spatial-reasoning tests.
Imagery and Problem Solving
Results

Errors
Error data from the experiment were analyzed by a six-way analysis of variance having three between-Ss factors and three within-Ss factors. The between-Ss factors were: (1) Sex, (2) type of relation received first, and (3) presentation modality received first (visual or auditory). The within-Ss factors were: (1) Relation type (spatial, social, hair color), (2) reading versus listening presentation, and (3) premise combination (types 1 through 4, as shown in Table 1). Table 2 displays all statistically significant F ratios resulting from this analysis.

Table 2. Results of the analysis of errors

Source of variance                                     df        F
Relation Types (spatial, social, hair-color)           2/48      4.50**
Reading vs. Listening                                  1/24     22.12***
Premise Combination Types                              3/72     22.25***
Relation Types X Reading vs. Listening                 2/48      2.30*
Relation Types X Premise Combination Types             6/144     2.81**
Reading vs. Listening X Premise Combination Types      3/72      2.29*
Reading or Listening First                             1/24      4.34**

* p < 0.10   ** p < 0.05   *** p < 0.01
The important main effects were all significant. The three kinds of relations differed in difficulty (p < 0.05), spatial being easiest to handle (19% error rate), social more difficult (26%), and hair-color most difficult (27%). Reading presentation proved more difficult than listening (30% compared with 18%; p < 0.01). The various premise combinations were different in difficulty (p < 0.01) and were ordered as follows: Type 1 (10%), Type 2 (23%), Type 3 (24%), and Type 4 (39%). The pattern of differences between reading and listening means was as predicted: highest for spatial problems (a difference of 18%), intermediate for social problems (9%), and smallest for hair-color problems (8%). The F ratio for this interaction approached significance (p < 0.10). The interaction between relations and premise combinations was significant (p < 0.025). Inspection of the means indicated that this was due to a difference in the patterns of premise-combination difficulty between spatial and social problems, on the one hand, and hair-color problems on the other. The pattern closely paralleled that obtained by DeSoto et al. (1965; see the percentage figures in Table 1 above). There was also an interesting, marginally significant interaction between reading versus listening and premise combination (p < 0.10). The gap between reading and listening got progressively wider as
Table 3. Error percentages reflecting the interaction between relation type and mode of presentation

                  Mode of presentation
Relation        Reading    Listening    Difference between reading and listening modes
Spatial           31           9          22
Social            38          21           6
Hair color        36          33           3
the premise combinations became more difficult to handle. (The remaining significant effect was due to a procedural factor: whether reading or listening presentation came first. ‘Listening first’ was associated with a slightly higher mean percentage of errors. This is not important for present purposes since this factor did not interact with any of the substantive variables.)

It seemed quite possible that the predicted interaction between reading versus listening and relation type was weak partly because of practice effects; therefore, a subset of the data was analyzed further according to a between-Ss design. Data were included only for the relation S received first in the experiment. This analysis yielded significant F ratios for reading versus listening (F(1/36) = 8.107, p < 0.01), with reading proving more difficult, and for premise combinations (F(3/108) = 11.063, p < 0.01), with Type 1 easiest and Type 4 hardest. The hypothesized interaction between relations and reading versus listening was again only marginally significant (F(2/36) = 2.513, p < 0.10), but the pattern of means was exactly as predicted (see Table 3).*

Questionnaire data
At the end of the experiment subjects were asked to describe the method they had adopted for solving the problems, and to judge for each kind of relation whether visual,

*In addition to the results presented here, Shaver (1970) reported analyses of response latencies. In general, these followed the same pattern as errors. Two internal analyses of the latency data provided additional evidence for the significance of imagery, however. (1) Subjects took longer to respond to hair-color problems presented in the auditory rather than visual mode only if they were presented before the spatial and social problems; this pattern was reversed when spatial or social problems were presented first. This suggests that subjects began to solve all of the problems spatially once they discovered the imagery technique, and that the discovery was less likely to be made when hair-color problems were presented first. (2) When trial blocks were treated as a factor in the analysis, in order to assess the effects of practice, a significant interaction was obtained between reading versus listening and trial blocks; improvement occurred only in the listening condition and performance actually got somewhat worse in the reading condition. This was interpreted as evidence that some subjects discovered the imagery strategy only after gaining experience with the problems; following the discovery, they improved in the listening condition but got worse in the reading condition, because reading interfered with their newly discovered strategy.
Table 4. Frequency of judgments regarding which mode of presentation was more difficult given various strategies (within the hair-color condition)

                            ‘More difficult’ mode of presentation
Self-reported strategy      Reading    Neither    Listening
Visual-Spatial                21          1           1
Unclassifiable mixture         2          2           2
Verbal                         0          4           3
verbal, or neither mode of presentation seemed more difficult. Their answers were then classed as visual-spatial, verbal, or neither. (For examples of each category, see Shaver, 1970.) Despite the experimenter’s mild disapproval (subjects were asked to leave their hands resting in their laps), ten people used subtle combinations of finger counting, patterned finger interlocking, and so on. If finger manipulation is counted as a spatial strategy, the breakdown of results is as follows: Visual-spatial methods, 34 cases; verbal methods, 5 cases; methods not classified as either spatial or verbal, 9 cases. Scattered throughout the questionnaire responses were numerous comments that provide informal support for the reading interference hypothesis; for example: I tried to keep their names in order of lightness of hair, etc., in a column with lightness or badness or downness on the bottom. It was pretty easy with the verbal (mode of presentation), but with the visual I had to concentrate on reading the words, so I pretty much forgot where everyone was. Subjects were specifically asked which mode of presentation, reading or listening, seemed more difficult within each relation condition. A test of the perceived effect of the reading versus listening manipulation was performed in the hair-color condition, the only condition containing enough ‘verbalizers’ to allow meaningful analysis (see Table 4). Columns are defined by presentation modes judged most difficult; rows are defined by the methods subjects claimed to have used: Spatial, verbal, and strategies that were impossible to characterize in the dichotomous category system. Cell entries are the number of subjects in their column (e.g., all who say reading was more difficult) who claim to have used a given strategy (e.g., visual-spatial imagery). 
It is apparent that a relationship exists between method used and mode of presentation judged most difficult (gamma = 0.84, p < 0.001), visual presentation being more difficult for ‘visualizers’.

Spatial-reasoning scores
A single spatial-reasoning score was obtained for each subject by summing across the three short tests given. (This was justified since the correlations between the tests were all highly significant for both sexes and the variances were nearly identical.) Composite scores for males and females were compared and, in line with previous research (Maccoby, 1966; Sherman, 1967), it was found that males achieved significantly higher scores than
Table 5. Correlations between spatial-reasoning scores and errors within each experimental condition for males and females

                              Males                 Females
Conditions                  r        t(22)        r        t(22)
Spatial reading           -0.12      0.58         0.06     0.30
Spatial listening         -0.08      0.36         0.23     1.09
Total spatial             -0.12      0.58         0.13     0.63
Social reading            -0.15      0.71        -0.05     0.21
Social listening          -0.31      1.53*        0.34     1.67
Total social              -0.30      1.47*        0.11     0.53
Hair-color reading        -0.33      1.64*        0.15     0.72
Hair-color listening      -0.66      4.12***     -0.09     0.41
Total hair-color          -0.60      3.51***      0.06     0.27
Total errors              -0.49      2.64**       0.16     0.77

* p < 0.10   ** p < 0.01   *** p < 0.001

females (t(46) = 2.96, p < 0.01, two-tailed). The mean and standard deviation for males were 111.67 and 31.27; for females these were 85.13 and 29.52. Thus, although females made no more errors than males in the experiment, their performance was considerably worse on the paper and pencil tests. Moreover, when correlations were computed to test for a relationship between number of errors and composite test scores, there was a highly significant negative correlation for males, r = -0.49 (t(22) = 2.66, p < 0.01), indicating that men who scored better on the standard tests tended to do better in the experiment as well. For females, however, there was no such relationship, r = 0.16 (t(22) = 0.77, ns). The reasons for this sex difference are not completely clear, although there were indications that the females’ scores were adversely influenced by motivational factors.* Whatever the dynamics underlying the women’s test performances, it is clear that their scores do not mean the same thing as the men’s. Table 5 shows the correlations
*There is direct and indirect evidence for this interpretation. Horner (1970) has shown that American women perform poorly in many competitive situations because of conflicts created by noncompetitive norms for females. Such conflicts would be especially salient in the face of a conventionally masculine task such as solving spatial-reasoning problems. And, in fact, while males often commented, “Oh yes, I’ve taken tests like this before”, readied themselves confidently and worked as efficiently as possible, females typically said: “Oh no! I can’t do these; do I really have to do these?” Several continued to express conflict, aversion, and self-criticism throughout the test. Since females did as well as males during the experiment, which was not billed as a test, and all internal analyses and comments on the post-experiment questionnaires indicated that they used spatial imagery as frequently as men, it is quite plausible that their scores on the standard spatial-reasoning tests were invalid. This matter deserves further study in its own right.
between composite spatial-reasoning scores and number of errors within each condition of the experiment for both males and females. The males’ correlations show a highly systematic and statistically reliable pattern, while the women’s correlations are all nonsignificant. Further analyses will therefore be confined to the men’s results.

As reported earlier, the three relations differed significantly in difficulty; spatial being easiest, social intermediate, and hair color most difficult. It was expected that spatial-reasoning ability would differentiate between Ss’ performances most in the difficult conditions (as occurred in studies by Ernest & Paivio, 1969, 1971), and this was indeed the case. For spatial problems the correlation coefficient is -0.12; for social problems, -0.30; for hair-color problems, -0.60. Within the two more difficult conditions the correlation is lower for reading than for listening. This may indicate that spatial-reasoning ability is more effective when the problems are presented in a form compatible with visualization, i.e., through the auditory system, and is not as useful when reading interferes with the visualization (or spatialization) process.

It may be asked whether a high correlation in the hair-color condition is consistent with our previous claim that these problems are not as likely as the other problems to be spatialized. Recall, however, that the claim was based on a comparison between people who received the hair-color problems first in the experiment and those who received them later. When the correlations between errors and spatial-reasoning scores are computed separately for these two groups, hair-color condition first (8 males) and hair-color condition later (16 males), the results are as shown in Table 6. Notice that the correlations are lower for the ‘first’ group, as expected. The samples are small, but the difference between -0.18 and -0.69 approaches significance (z = 1.27, p = 0.10).
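The t values that accompany the correlations in Table 5 are consistent with the standard conversion t = r * sqrt(n - 2) / sqrt(1 - r^2). The following sketch assumes that this conversion was the one used (the text does not say so) and recomputes t from the rounded r values for the 24 males; the function name is ours.

```python
import math


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing a Pearson correlation r against zero."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


if __name__ == "__main__":
    # Male correlations taken from Table 5 (N = 24, so df = 22).
    for label, r in [("total errors", -0.49),
                     ("hair-color listening", -0.66),
                     ("social listening", -0.31)]:
        print(label, round(abs(t_from_r(r, 24)), 2))
```

Because the published correlations are rounded to two decimals, recomputed values can differ from the tabled ones in the second decimal place.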
Table 6. Correlations for males between spatial-reasoning scores and errors for two orders of presentation within the hair-color condition

Presentation orders                          Conditions                r         t
Received hair-color problems first (N=8)     Hair-color reading       0.32      0.85
                                             Hair-color listening    -0.48      1.34
                                             Total hair color        -0.18a     0.45
Received hair-color problems later (N=16)    Hair-color reading      -0.53      2.34*
                                             Hair-color listening    -0.72      3.88***
                                             Total hair color        -0.69a     3.57**

a The difference between these two correlations, interpreted as reflecting the difference between spatialization on the first trial block and later trial blocks, yielded z = 1.27, p = 0.10.
* p < 0.025   ** p < 0.005   *** p < 0.001
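The comparison of the two ‘total hair color’ correlations can be reproduced with Fisher's r-to-z transformation for independent samples. The text does not name the test that was used, so the procedure and the one-tailed p value below are assumptions, and the function names are ours; under those assumptions the sketch recovers the reported z and p to rounding.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)          # Fisher r-to-z transform
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))  # standard error of z1 - z2
    return (z1 - z2) / se


def upper_tail_p(z):
    """One-tailed p value: area of the standard normal curve above z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Hair-color problems first (r = -0.18, N = 8) vs. later (r = -0.69, N = 16).
    z = fisher_z_difference(-0.18, 8, -0.69, 16)
    print(round(z, 2), round(upper_tail_p(z), 2))
```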
Discussion of Experiment I
Taken together the results support a modified imagery theory. (1) The three relations were not equally difficult to work with: Spatial (above-below) was easiest, social (better-worse) was intermediate in difficulty, and hair color (lighter-darker) was most difficult. Although other factors may have contributed to this effect, it is certainly compatible with a hypothesis based on the relative ease with which these relations can be spatialized. (2) The reading condition was more difficult than the listening condition. This was probably due, in some degree, to the novel form in which the visual problems were presented, i.e., as part of an animated movie. This factor alone, however, cannot explain the interaction between reading versus listening and relation type, the pattern of correlations between errors and spatial-reasoning scores, and many of the detailed comments made in response to the post-experimental questionnaire. (3) The significant differences between premise-combination types could perhaps have been predicted by either Huttenlocher or Clark. As mentioned earlier, their theories are not distinguishable on these grounds alone. Clark, however, would have difficulty explaining the pattern of results for lighter-darker (hair-color) problems. The pattern of errors for the various premise combinations in the hair-color condition was distinctive both in the studies by DeSoto et al. (1965) and in the present study. DeSoto et al. called lighter-darker a relation “with no tie to a spatial axis” (most people have had little or no experience in thinking about lighter-darker comparisons with the aid of spatial imagery). Clark offered an alternative explanation in terms of the principle of lexical marking: Since lighter and darker are both marked, neither should be more difficult to work with verbally. Yet some subjects in the present study said explicitly in the post-experimental questionnaire that they experienced the difficulty implied by DeSoto’s explanation. For example:

It was easy for me to set up the spatial problems because I could immediately see who was on top and who was on the bottom. But ... it was hard to remember if it went from dark to light, or vice versa.

For darkness-lightness of hair I had a hard time trying to set them up in this way basically because I had a hard time trying to decide which finger would be lighter and which darker (i.e., what order to set names in).

(A few subjects mentioned another difficulty unique to the hair-color problems, i.e., that hair-color images may form a fixed, discrete ordering with three levels, whereas above-below and better-worse form infinitely continuous scales.)

There are also some problems for the imagery theory, however, if it is interpreted as claiming that all people use imagery on all three-term series problems. There were indications in our results (1) that some relations are less likely than others to suggest an imagery strategy and (2) that practice with three-term series problems increases the likelihood that an imagery strategy will be adopted. Both of these findings imply that imagery is not a necessary component of the problem-solving process, although it is a useful
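Figure 1. Diagram shown to subjects as part of instructions in the use of the visual-spatial-imagery method. [Diagram: the names John, Bob, and Tom arranged in that order, with “John is stronger than Bob” placed between John and Bob and “Tom is weaker than Bob” placed between Bob and Tom.]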
technique which can improve performance, perhaps by expanding short-term memory through the addition of an alternate coding system (Paivio & Csapo, 1973). If this interpretation is correct (i.e., if a strategy based on spatial imagery is not necessary, but enhances the ability to solve three-term series problems once discovered), then number of errors should decrease when subjects are explicitly told about the imagery strategy before presentation of the problems. Experiment II was a test of this prediction.
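The spatial strategy described by subjects (placing the three names in a single ordered array and then reading the answer off the array) can be made concrete with a small sketch. It is offered only as an illustration of the strategy, not as a model proposed in this paper: the function names, the polarity table, and the direction assigned to lighter-darker are all our assumptions. Questions that name a person absent from the premises are answered ‘No’, following the rule given to Ss for the ‘X’ filler problems.

```python
# +1: the first-named person goes higher in the array; -1: lower.
# The direction chosen for lighter/darker is arbitrary (an assumption).
POLARITY = {
    "stronger": +1, "weaker": -1,
    "better": +1, "worse": -1,
    "above": +1, "below": -1,
    "lighter": +1, "darker": -1,
}


def build_array(premises):
    """Return names ordered top to bottom, or None if no single order is determined."""
    higher_than = set()
    names = []
    for a, rel, b in premises:
        top, bottom = (a, b) if POLARITY[rel] > 0 else (b, a)
        higher_than.add((top, bottom))
        for name in (a, b):
            if name not in names:
                names.append(name)
    # Transitive closure: if x is above y and y is above z, x is above z.
    changed = True
    while changed:
        changed = False
        for x, y in list(higher_than):
            for y2, z in list(higher_than):
                if y == y2 and (x, z) not in higher_than:
                    higher_than.add((x, z))
                    changed = True
    # Each name's rank is the number of names it stands above.
    rank = {n: sum((n, m) in higher_than for m in names) for n in names}
    if sorted(rank.values()) != list(range(len(names))):
        return None  # indeterminate or contradictory premises
    return sorted(names, key=lambda n: -rank[n])


def answer(premises, question):
    """Answer a yes/no question (True/False); None if the array is indeterminate."""
    a, rel, b = question
    array = build_array(premises)
    if array is None:
        return None
    if a not in array or b not in array:
        return False  # 'X' filler problems: always answer 'No'
    a_is_higher = array.index(a) < array.index(b)
    return a_is_higher if POLARITY[rel] > 0 else not a_is_higher


if __name__ == "__main__":
    premises = [("John", "stronger", "Bob"), ("Tom", "weaker", "Bob")]
    print(build_array(premises))
    print(answer(premises, ("John", "stronger", "Tom")))
```

Once the array is built, every question about the three people is answered by comparing positions, which is one way to read the claim that a unified image reduces the memory load.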
Experiment II
The ideal design for an experimental assessment of the efficacy of two problem-solving strategies would include three conditions: A no instructions condition and an instruction condition for each of the problem-solving strategies. The spatial imagery strategy is easy to describe, especially in reference to a diagram such as Figure 1, and this was in fact done in the present experiment. Subjects were asked to use the imagery technique “throughout the experiment whether or not it seems natural; the experiment depends completely on your adherence to this method”. Unfortunately, several readings of Clark’s papers did not suggest a way to instruct subjects in the use of his method. This is not necessarily a criticism of his theory, of course, since one need not specify mechanisms that are under a person’s conscious control. And, in fact, the psychological mechanisms related to lexical marking, congruence, etc., do not appear to be conscious. One part of the process specified by Clark’s theory, the strategy of ‘compression’, might be under conscious control; however, it does not provide an adequate solution for all three-term series problems, as Huttenlocher and Higgins (1971) have pointed out. Therefore, in the present experiment only two conditions were included: Instructions in the visual-spatial imagery technique and no instructions. In both conditions subjects were given practice
problems and were encouraged to “listen carefully and answer as quickly and accurately as you possibly can”. It was hoped that this instruction plus equalization of practice would reduce any motivational or attentional advantages that might have been created by having subjects in one group use a special technique.
Method

Design and procedure
Thirty-two Ss (half males) enrolled in introductory psychology courses at Columbia University were randomly assigned to two conditions, Imagery Instructions and No Instructions. Following the instruction period described above, each S attempted to solve 32 three-term series problems involving the relations better-worse and lighter-darker. (Above-below problems were excluded in order to reduce the possibility that noninstructed Ss would quickly discover the imagery strategy.) The problems were given in four different random orders to reduce sequence effects; they were presented via audiotape-recording, each problem followed by a 10 second answer period. The major dependent variable was number of errors. Following the experiment, each S completed a short questionnaire concerning problem-solving strategy.
Results and Discussion of Experiment II
The instructed subjects made an average of 8.13 errors compared with 12.13 errors for the non-instructed subjects (t(14) = 1.80, p < 0.05). Examination of the post-experimental questionnaires indicated that the difference between conditions would have been even larger had we eliminated self-instructed subjects. For example, the best subject in the non-instructed group (who made only four errors) described his method as follows:

In my mind I saw three levels. As the relationships were stated, I placed the names in the layers as they stood in relation to each other and then remembered who was at each extreme.

In contrast, the worst non-instructed subject (23 errors) said:

I answered with whatever came to my mind. I couldn’t always remember the names. I noticed that the problems were simpler when read (during the instruction period), harder when heard.

In general, the results and comments supported the conclusions drawn from Experiment I. Subjects’ ability to solve three-term series problems was enhanced by the imagery strategy, but it is doubtful that the strategy was necessary. Several subjects in the non-instructed group, whose post-experimental comments indicated no use of imagery, did as well as instructed subjects. And a few instructed subjects indicated after the experiment
that the prescribed method seemed unnatural and unnecessary. Finally, as in Experiment I, there were no significant sex differences.
General Discussion
The results of two studies based on Paivio’s (1971) three classes of converging operations indicate that imagery plays a functional but not a necessary role in the solution of three-term series problems. Some of our specific findings warrant further exploration, especially the marginally significant interaction reported in Experiment I, but it seems unlikely that the general conclusion will have to be altered. In solving three-term series problems, many people find it useful to represent acquired information in the form of a unified visual-spatial image. Presumably the main function of the imaginal representation is to organize and store in an efficient form information regarding the relationships among objects or people (Huttenlocher, 1968; Huttenlocher & Higgins, 1971; Johnson-Laird, 1972). Imagery is especially suited for this task because it allows parallel or simultaneous processing of information that can be processed only sequentially in linguistic form (Paivio, 1971); it may also operate as an alternate coding system independent of language (Paivio & Csapo, 1973). Some people seem to discover the imagery technique only after practice with several problems. Johnson-Laird was thus correct to argue in a recent review paper that we “can no longer ask how an individual solves a three-term series problem without asking when in his intellectual development within the experiment it was given to him” (p. 81). However, he appears to have been mistaken in drawing other inferences from the previously available literature. For example, he hypothesized that imagery is abandoned in favor of a linguistic strategy after practice with three-term series problems. The opposite temporal sequence is indicated by our results, suggesting that, in this case at least, imagery provided the “more economical and specialized” strategy. At the moment no single theory can adequately explain our results.
Clark’s linguistic theory, which has focused more on comprehension than solution of problems, cannot explain most of our findings. Of course, this does not imply that his theory of comprehension is incorrect, only that it is insufficient to explain the way many people solve three-term series problems. Huttenlocher’s imagery theory, especially as expanded by Huttenlocher and Higgins (1971), may be correct for some people at some stages in their development, but it is almost surely incorrect if interpreted as requiring that all people use imagery on all successfully solved three-term series problems. Johnson-Laird’s hybrid explanation, based on what he considered to be the viable elements in Clark’s and Huttenlocher’s theories, is also inadequate, especially if taken to include hypotheses like the one discussed above which are disconfirmed by our results. These theoretical failures may indicate not that the correct theory has yet to be constructed, but rather that there are several kinds of people, strategies, and task environments, and that we should not expect
to make accurate predictions of error rates or reaction times for groups of subjects without taking into account a number of variables besides type of problem. In the future it may be fruitful to focus on more general issues, such as the problem of cognitive representation and the reasons for the situational effectiveness of certain forms of representation (Newell & Simon, 1972; Paivio, 1971; Pylyshyn, 1972) rather than continue to concentrate on particular tasks. Meanwhile, Clark’s “only firm conclusion,” that “it has not been demonstrated that the use of spatial imagery differentially affects the solution of three-term series problems”, appears to be incorrect.
REFERENCES
Atwood, G. (1971) An experimental study of visual imagination and memory. Cog. Psychol., 2, 290-299.
Bower, G. H. (1970) Analysis of a mnemonic device. Amer. Sci., 58, 496-510.
Brooks, L. R. (1967) The suppression of visualization by reading. Q. J. exp. Psychol., 19, 289-299.
Brooks, L. R. (1968) Spatial and verbal components of recall. Can. J. Psychol., 22, 349-368.
Clark, H. H. (1969) Linguistic processes in deductive reasoning. Psychol. Rev., 76, 387-404. (a)
Clark, H. H. (1969) The influence of language on solving three-term series problems. J. exp. Psychol., 82, 205-215. (b)
Clark, H. H. (1971) More about “Adjectives, comparatives, and syllogisms”: A reply to Huttenlocher and Higgins. Psychol. Rev., 78, 505-514.
Clark, H. H. (1972) Difficulties people have in answering the question “Where is it?” J. Verb. Learn. Verb. Beh., 11, 265-277.
DeSoto, C., London, M., & Handel, S. (1965) Social reasoning and spatial paralogic. J. Pers. Soc. Psychol., 2, 513-521.
Ernest, C., & Paivio, A. (1969) Imagery ability in paired associate and incidental learning. Psychon. Sci., 15, 181-182.
Ernest, C., & Paivio, A. (1971) Imagery and verbal associative latencies as a function of imagery ability. Can. J. Psychol., 25, 83-90.
Handel, S., DeSoto, C., & London, M. (1968) Reasoning and spatial representation. J. Verb. Learn. Verb. Beh., 7, 351-351.
Horner, M. S. (1970) Femininity and successful achievement: A basic inconsistency. In J. Bardwick, E. Douvan, D. Gutman, & M. S. Horner, Feminine personality and conflict. Belmont, California, Brooks/Cole.
Huttenlocher, J. (1968) Constructing spatial images: A strategy in reasoning. Psychol. Rev., 75, 550-560.
Huttenlocher, J., Eisenberg, K., & Strauss, S. (1968) Comprehension: Relation between perceived actor and logical subject. J. Verb. Learn. Verb. Beh., 7, 300-304.
Huttenlocher, J., & Higgins, E. T. (1971) Adjectives, comparatives and syllogisms. Psychol. Rev., 78, 487-504.
Huttenlocher, J., Higgins, E. T., Milligan, C., & Kauffman, B. (1970) The mystery of the “negative equative” construction. J. Verb. Learn. Verb. Beh., 9, 334-341.
Huttenlocher, J., & Strauss, S. (1968) Comprehension and a statement’s relation to the situation it describes. J. Verb. Learn. Verb. Beh., 7, 527-530.
Johnson-Laird, P. N. (1972) The three-term series problem. Cog., 1, 57-82.
Jones, S. (1970) Visual and verbal processes in problem-solving. Cog. Psychol., 1, 201-214.
Maccoby, E. E. (1966) Sex differences in intellectual functioning. In E. E. Maccoby (Ed.), The development of sex differences. Stanford, Stanford University Press.
Neisser, U. (1970) Visual imagery as process and as experience. In J. S. Antrobus (Ed.), Cognition and affect. Boston, Little Brown.
Newell, A., & Simon, H. A. (1972) Human problem-solving. Englewood Cliffs, New Jersey, Prentice-Hall.
Paivio, A. (1969) Mental imagery in associative learning and memory. Psychol. Rev., 76, 241-263.
Paivio, A. (1970) On the functional significance of imagery. Psychol. Bull., 73, 385-392.
Paivio, A. (1971) Imagery and verbal processes. New York, Holt, Rinehart and Winston.
Paivio, A., & Csapo, K. (1973) Picture superiority in free recall: Imagery or dual coding? Cog. Psychol., 5, 176-206.
Pylyshyn, Z. W. (1972) The problem of cognitive representation. Research Bulletin No. 227, University of Western Ontario, Department of Psychology.
Shaver, P. R. (1970) Interference with spatial imagery during problem-solving. (Doctoral dissertation, University of Michigan) Ann Arbor, Michigan: University Microfilms, No. 71-23,872.
Sherman, J. (1967) Problems of sex differences in space perception and aspects of intellectual functioning. Psychol. Rev., 74, 290-299.
Thurstone, L. L., & Thurstone, T. G. (1943) The Chicago tests of primary mental abilities: Manual of instructions. Chicago, Science Research Associates.
Résumé
Two kinds of explanation have been proposed to account for the solving of three-term series problems: one in linguistic terms, the other in terms of visuo-spatial images. Two experiments are presented in which three different classes of operations bear on these problems: (1) the manipulation of stimulus attributes (characteristics of the problems); (2) the manipulation of variables that favor or, on the contrary, inhibit the use of imagery (facilitating instructions; suppression by visual reading); and (3) the measurement of relevant individual differences (spatial ability). All the results show that imagery plays a functional but not a necessary role in the solving of three-term series problems. It is suggested that an imaged presentation is functional insofar as it reduces the memory load. An adequate explanation of problem solving must account for certain general questions, such as the diversity of forms of cognitive representation and the differences between individuals in the choice of problem-solving strategies.
Scientific perspectives and philosophical dead ends in modern linguistics
A. R. LURIA
University of Moscow
This paper is a discussion of Chomsky’s hypothesis on the ‘inherited nature’ of linguistic structures. The present writer strongly disapproves of this hypothesis, which has been the object of very lively interest over the last decade. He has the feeling that it leads to a philosophical dead end and that the further development of modern linguistics will depend upon our putting it aside in order to turn to a careful study of epistemological issues and the psychological roots of speech and language.
The contribution of Noam Chomsky, who is one of the leading scholars of our time, evokes a double feeling. In the first place, that of the highest appreciation for his revolutionary findings, which have opened up new horizons in linguistics and psychology. In the second, doubt over the assumption of the ‘innate’ nature of deep linguistic structures and the feeling that both philosophically and psychologically this hypothesis leads nowhere. Although we must praise the body of his work, we feel impelled to point out some of its scientific shortcomings. As will be shown, these shortcomings are the result of defects in the philosophical base which underlies them.
I. For many years, linguistics remained the descriptive and comparative study of languages: their general laws as well as their differences. However, a turning point was reached when certain scholars suggested that the precision of the linguistic field was close to that of mathematics, and in recent years linguists have been faced with certain basic paradoxes stemming from this finding. It is a well known fact that there are probably several hundred languages in existence today. Furthermore, it is also known that there are an indefinite number of different forms and individual combinations used in separate languages. Now the question arises: How can a child master such a vast variety of linguistic combinations in such a short time? How can a man use such an incredible variety of verbal expressions without making mistakes? Neither classical concepts which attempted to trace the roots of language to certain forms of animal communication or to the fundamental rules of the mind or of
Cognition 3(4), pp. 377-385
‘symbolic forms’, nor the efforts of the associationists of the 19th century and the behaviorists of the 20th century have found a plausible answer to this basic problem. Thus the need to find certain fundamental laws which will help us to solve this paradox and to understand the general rules of language acquisition and use has been urgent. The first steps towards a solution to this problem were made a generation ago by a group of scholars including F. de Saussure (1916), N. Troubetzkoi (1939), R. Jakobson (1941 and 1950) and later, N. Chomsky. In 1939, Troubetzkoi published his well known ‘Foundations of Phonology’, which was followed by the publications of R. Jakobson and others of his collaborators. These studies opened new vistas in the analysis of the basic structures of linguistic sounds. They claimed that the entire wealth of sounds existing in all languages could be reduced to a system of polar opposites related to a limited number of phonemic or distinctive features. This reduction can be looked on as one of the more important discoveries in modern science. It demonstrates that only a limited number of phonological rules exist and that if these are mastered an individual can master the entire range of language sounds. A similar approach was adopted by Chomsky, who demonstrated that the same principle could be applied to syntactic structures. An infinite and incredibly rich variety of concrete syntactic structures can be reduced to a limited number of ‘basic’ or ‘deep’ structures, and by mastering these a child comes to learn a wealth of possible ‘superficial’ structures. This idea, which is one of the cornerstones of ‘generative’ or ‘transformational’ grammar, has served to transform modern linguistics into a science ruled by laws as precise as those of mathematics: “... 
Notice, incidentally, that the existence of definite principles of universal grammar makes possible the rise of the new field of mathematical linguistics, a field that submits to abstract study the class of generative systems meeting the conditions set forth in universal grammar. This enquiry aims to elaborate the formal properties of any possible human language. The field is in its infancy; it is only in the last decade that the possibility of such an enterprise has been envisioned.” (Language and Mind, p. 71) Furthermore, Chomsky believes that this new type of linguistics will point to new and important areas of future research: “Thus, mathematical linguistics seems for the moment to be in a uniquely favorable position, among mathematical approaches in the social and psychological sciences, to develop not simply as a theory of data, but as the study of highly abstract principles and structures that determine the character of human mental processes. In this case, the mental processes in question are those involved in the organization of one specific domain of human knowledge, namely knowledge of language.” (Language and Mind, pp. 71-72)
II. Chomsky’s research rapidly led him to question the origin of this finite group of basic rules and ‘deep structures’ which a child masters with such ease. The usual scholar would probably have turned to the social history of language in attempting to find a step by step solution to the problem of how these structures are organized and in what way they are related to pre-verbal behavior. He would have tried to transcend the limits of the mind and the human organism in order to find antecedents in the history of our society and in the active forms of man’s relations with reality. But Chomsky’s fundamental philosophical approach has not led in this direction. He has not attempted to find the origin of linguistic structures in the active relationship between the objective world and the non-verbal actions of the subject. Chomsky has kept the traditional concepts of ‘mind’ and ‘matter’ separate. This limitation, inherent in all traditional philosophy, is stifling to all future research and will prevent any attempt at solving the basic problem. In trying to find heuristic sources in modern psychology, Chomsky has looked to contemporary American psychology, and more particularly behaviorism, for an answer. This is a fatal mistake. It is well known that American behaviorism does not deal with the complex forms of man’s active processing of the world nor with the possible historical origins of the complex psychological processes which underlie man’s conscious actions. Strict behaviorism as represented by B. F. Skinner (1957) is based on weak concepts borrowed from 19th century associationism and the physiology of conditioned reflexes (without entering into a discussion of the neurodynamic rules of the latter). Obviously, Chomsky cannot agree with such poor and purely mechanistic schemes which are based on the concept of an ‘empty organism’ and which attempt to relate all behavior to the primitive concepts of contiguity and reinforcement. 
His critique (1959) of Skinner’s Verbal Behavior, moreover, is probably one of the most convincing evaluations of this totally untenable mechanistic approach to the complexities of human behavior. Chomsky’s mistake is that he looks on Skinner’s oversimplified schemes as psychology’s one and only attempt at approaching the problem of man’s creative processes, and he concludes his discussion in Language and Mind (1968) by saying “Honesty forces us to admit that we are as far today as Descartes was three centuries ago from understanding just what enables a human to speak in a way that is innovative, free from stimulus control and also appropriate and coherent. This is a serious problem that the psychologist and biologist must ultimately face and that cannot be talked out of existence by invoking ‘habit’ or ‘conditioning’ or ‘natural selection’.... The properties of human thought and human language emphasized by the Cartesians are real enough; they were then, as they are now, beyond the bounds of any well-understood kind of physical explanation. Neither physics nor biology nor psychology gives us any clue as to how to deal with these matters.” (pp. 12-13). Having arrived at the right conclusion concerning the fact that the mastering of the infinite variations of linguistic forms cannot be deduced from the simple rules of
‘contiguity’ and ‘reinforcement’, Chomsky decides that the solution to the problem can be found in the history of dualist philosophy, which attempted, as far back as the 17th century with Descartes and the Port Royal grammar, to single out special creative forms of logical thinking. The atmosphere of our time, he believes, is closer to that of the 17th century philosophers than to the 19th century empiricists. In the 19th and 20th centuries, disappointment in the mechanistic approach, which attempts to reduce complex events to elementary units and their associations, has been translated, insofar as psychology is concerned, into a growing disenchantment with behaviorism. In philosophy, this tendency has been manifest in the growing interest in synthetic approaches towards reality and in creative heuristic forms of thought. Cartesian thinking, to which Chomsky subscribes, falls, of course, within this trend. As Chomsky himself says: “the general structure of the argument is not unreasonable.... The Cartesians ... tried to show that when the theory of corporeal body is sharpened and clarified and extended to its limits, it is still incapable of accounting for facts that are obvious to introspection and that are also confirmed by our observation of the actions of other humans. In particular, it cannot account for the normal use of human language, just as it cannot explain the basic properties of thought. Consequently, it becomes necessary to invoke an entirely new principle - in Cartesian terms, to postulate a second substance whose essence is thought, alongside of body, with its essential properties of extension and motion. 
This new principle has a ‘creative aspect’, which is evidenced most clearly in what we may refer to as ‘the creative aspect of language use,’ the distinctively human ability to express new thoughts and to understand entirely new expressions of thought, within the framework of an ‘instituted language’, a language that is a cultural product subject to laws and principles partially unique to it and partially reflections of general properties of the mind.” (Language and Mind, p. 6)
Chomsky’s discovery of deep syntactic structures and his disappointment with mechanistic approaches towards the solution of their origins have led him to turn to classical rationalism and Cartesian thought. 17th century philosophy, with its dualistic approach, is more appealing to him than objective, materialistic epistemology with its socio-historical searchings for data. Only rationalist philosophy appears capable of disclosing the essence of the universal principles of the mind as they are observed in language and thought. With the publication of Cartesian Linguistics (1966) and Language and Mind (1968) Chomsky became a confirmed Cartesian. The basic condition for the understanding of language, he maintains, is ‘competence’, and only ‘competence’ can provide accurate linguistic ‘performance’. All attempts to explain the origins of linguistic structures are abandoned and replaced by the intuitive concept of ‘competence’ which, rather than being a result of the development of language, is a basic a priori condition.
There is nothing really novel, however, in Chomsky’s turning from modern science to Cartesian philosophy. Disappointment with associationist theories led psychologists to adhere to the Würzburg mentalist movement at the turn of this century and the failure of the Markov chains in explaining grammatical structures resulted in Cartesian principles and ideas about the innateness of deep linguistic schemes.
III.
It would be a mistake to assume that Chomsky’s concept of specific wholistic schemes of language remains tied to the framework of 17th century rationalism alone. As the progressive scholar and leader that he is, Chomsky has attempted to establish significant analogies between his own theories and certain basic trends in modern science which share his approach towards mechanistic science. It is for this reason that he has paid some attention to the biological discipline which has emerged in recent years under the name of ethology and which is best represented by Lorenz, Tinbergen and Thorpe. The work of these scientists has demonstrated that a great many behavior patterns displayed by humans are also manifested by animals and require no specific level of civilization. In one of his publications, moreover, Lorenz (1941) has attempted to establish a direct relationship between Kant’s a priori categories and his own findings and he claims that Kantian philosophy bears out his own ideas. Chomsky has also looked closely at advances made in neurophysiology and in particular at the analysis of the functions of single neurons. The works of Hubel and Wiesel (1963), which are now classics in the field, and a subsequent series of publications have shown that single neurons can have very special, highly differentiated inborn functions - some of them respond only to horizontally oriented movements running from center to periphery, while others respond only to vertically oriented movements, etc. It is even thought that more complex patterns can be associated with the activity of isolated neurons or neuronal groups. It is only natural that Chomsky should have attempted to find support for his ideas on the existence of a priori, innate linguistic structures in ethology and neurophysiology, since the existence of findings in these areas made it possible for him to assume that deep linguistic structures are inborn and that some innate forms of mind exist.
“We must postulate an innate structure that is rich enough to account for the disparity between experience and knowledge, one that can account for the constructions of the empirically justified generative grammars within the given limitations of time and access to data.... I think that the study of problems of the mind has been definitely hampered by a kind of a priorism with which these problems are generally approached. In part the empiricist assumptions that have
dominated the study of acquisition of knowledge for many years seem to me to have been adopted quite without warrant and to have no special status among the many possibilities that one might imagine as to how the mind functions.... The idea of a triangle is innate. Surely the notion is comprehensible; there would be no difficulty, for example, in programming a computer to react to stimuli along these lines. Similarly, there is no difficulty in principle in programming a computer with a schematism that sharply restricts the form of a generative grammar, with an evaluation procedure for grammars of the given form, with a technique for determining whether given data are compatible with a grammar of the given form, with fixed substructures of entities (such as distinctive features), rules, and principles, and so on - in short, with a universal grammar of the sort that has been proposed in recent years. For reasons that I have already mentioned, I believe that these proposals can be properly regarded as a further development of classical rationalist doctrine.... To my knowledge, the only substantive proposal to deal with the problem of acquisition of knowledge of language is the rationalist conception.... As I have now emphasized several times, there seems to be little useful analogy between the theory of grammar that a person has internalized and that provides the basis for his normal creative use of language, and any other cognitive system that has so far been isolated and described; similarly there is little useful analogy between the schema of universal grammar that we must, I believe assign to the mind as an innate character and any other known system of mental organization.... 
Turning to comparative ethology, it is interesting to note that one of its earliest motivations was the hope that through the ‘investigation of the a priori, of the innate working hypotheses present in subhuman organisms,’ it would be possible to shed light on the a priori forms of human thought. This formulation of intent is quoted from an early and little known paper by Konrad Lorenz.” (Language and Mind, pp. 79, 80, 88, 90, 95)
Statements like the above are repeated in all of Chomsky’s philosophical writings, and this has rendered the idea of the ‘innate’ nature of the most complicated of linguistic structures (such as those found in ‘universal grammar’) one of the most contested in recent years. The shortcomings of this philosophy, however, which is in conflict with Chomsky’s own findings and the general tendencies of contemporary science, mean that the broad scientific vistas which he has opened can only ultimately lead to a dead end. The ideas of ‘deep structures’ and ‘linguistic competence’ constitute the basic problem in his thinking, since the origin and historical development of these phenomena would require careful scrutiny. Furthermore, to assume that deep structures are ‘innate’ makes a postulate out of a problem, and this in itself means that all further study in the area can lead us nowhere.
IV. It may take many generations before we can explain how a child masters linguistic forms with such ease. Nonetheless, the basic principles to be employed in such a study can be established now and may constitute an alternative to the idea of the a priori existence of deep linguistic structures. A great many well known authors have already maintained repeatedly that there is no direct relationship between general linguistic structures and the general functioning of the mind. Piaget (1967) believes that language cannot explain thought. T. Bever (1970a, 1970b) believes that thought structures may be related in some complicated way to grammatical structures and maintains that certain basic perceptual and psychological strategies employed by the child in acquiring verbal information should be studied. Slobin (1971), I. M. Schlesinger (1971), Roger Brown (1973, 1974), D. McNeill (1966, 1970) and others all hold similar views. Although they approach the study of language acquisition by the young child from many different viewpoints, their basic assumption is that the acquisition of linguistic structures is rooted in the child’s actions, which serve as the background as well as an inseparable component of the first forms of a child’s ‘sympractic’ speech. We are furthermore happy to note that Jerome Bruner (1974) in a recent publication upholds this basic position, which is also very close to our own. He attempts to find the roots of linguistic structure in the basic organization of the young child’s action and attention, which single out the object which will provide the background for such basic pre-verbal relations as Subject-Object, Object-Action, etc., rather than in ‘innate forms of the mind’. 
All this means, of course, that we are a far cry from the development of basic linguistic structures, that these are not in themselves the product of innate mental categories but rather the result of various forms of active reflection on the objective world and the active relationship between the subject and the world, and that we must look for the roots of basic linguistic structures in the relations between the active subject and reality and not in the mind itself. Consequently, we must consider that linguistic ‘competence’, which Chomsky believes is intuitive, is in actual fact the result of a long and dramatic evolution and is a problem rather than a postulate. Furthermore, we should assume that ‘competence’ is the result of long and dramatic ‘performances’ which were endowed with prelinguistic characteristics from the start, but which acquired their linguistic traits during the young child’s early contact with the speaking environment (children raised in orphanages, where contact with the verbal environment is often poor, take longer to acquire language than those exposed to rich verbal environments). Language is thus a system of codes used to express the relations of the subject with the outside world. As Bruner maintains, moreover, and we must agree, these basic forms of active interrelations between the subject and the external world, including such relationships as that between the subject and action (S-P) and subject-action-upon-object (S+P+O), are first developed during the course of phylogenetic (socio-historical) and ontogenetic preverbal activity.
In addition, the movements of a very young child involving manipulation of objects or shifting of attention from one object to another are of basic importance in the subsequent development of language and speech. The basic forms of future linguistic activity can already be observed through the careful study of an infant’s relationship with its mother and its activities and eye-movements. But in order to be in a position to make the above assumptions, we are obliged to transcend the limits of the organism itself in order to concentrate on the basic relations of the subject with the outside world while bearing in mind that all patterns present in the human mind are simply a reflection of the interaction between the subject and the outside world. It is only in adopting an approach like this that we can arrive at a genuinely scientific study of what were formerly referred to as the inner ‘properties’ of the mind. It seems safe to assume that the basic schemes of deep syntactic and deep semantic structures (including the ‘lexical functions’ which have been carefully studied by the Soviet linguists, Zholkowski and Melchuck) like the basic forms expressing the beginning (-incip), the end (-fin), the causation (-caus) or the functioning (-func, -oper) are the most important reflections of the basic forms of existence and human action. It is for this reason that ‘deep syntactic structures’ as well as ‘deep semantic structures’ should be looked on as the reflection of objective external relations having an objective significance as well as a long pre-verbal history which should be carefully studied in order to leave no room for the assumption that they stem from ‘innate rational schemes’. The child’s acquisition of basic linguistic structures has been the object of many studies and is certainly one of the most productive lines of research in modern science. 
Were research in this area to be approached in a truly scientifically philosophical manner which would bring our attention to bear upon the real history of basic (pre-verbal and verbal) forms of reflection on the outside world as well as the history of real forms of human action, all hypotheses concerning the innateness of basic (deep) structures would be rendered useless. Furthermore, it can be supposed that if such an approach, which is far from either the simplified patterns of mechanistic behaviorism or the assumption of ‘innate’ mental structures, were to be applied, it would open new and broader paths towards a better understanding of the complex process involved in the development of human speech.
REFERENCES
Bever, T. (1970a) The influence of speech performance on linguistic structure. In G. B. Flores d’Arcais & W. J. M. Levelt (eds.) Advances in Psycholinguistics, Amsterdam, North Holland Publishing Co. Pp. 4-30.
Bever, T. (1970b) The cognitive basis for linguistic structures. In J. R. Hayes (ed.) Cognition and the Development of Language, New York, Wiley. Pp. 279-362.
Brown, R. and U. Bellugi (1964) Three processes in the child’s acquisition of syntax. In E. H. Lenneberg (ed.), New Directions in the Study of Language. Cambridge, M.I.T. Press. Pp. 131-162.
Brown, R. (1973) The First Language. Cambridge, Harvard University Press.
Bruner, J. S. (1974) The Ontogenesis of Speech Acts (in press).
Chomsky, N. (1957) Syntactic Structures, The Hague, Mouton.
Chomsky, N. (1959) Review of Skinner’s Verbal Behavior. Language, 35, 26-58.
Chomsky, N. (1959) On certain properties of grammar. Inform. Control, 2, 137-167.
Chomsky, N. (1961) Some methodological remarks on generative grammar. Word, 17, 219-239.
Chomsky, N. (1962) A transformational approach to syntax. In A. A. Hill (ed.) Proceedings of the 1958 Conference on Problems of Linguistic Analyses. Austin, Texas. Pp. 124-148.
Chomsky, N. (1963) Formal properties of grammar. In R. D. Luce, R. Bush, E. Galanter (eds.) Handbook of Mathematical Psychology, Vol. III, New York, Wiley. Pp. 323-418.
Chomsky, N. (1964) Current Issues in Linguistic Theory. The Hague, Mouton.
Chomsky, N. (1965) Aspects of the Theory of Syntax. Cambridge, Mass., M.I.T. Press.
Chomsky, N. (1966) Topics in the theory of the generative grammar. In Th. Sebeok (ed.) Current Trends in Linguistics, Vol. III, The Hague, Mouton.
Chomsky, N. (1966) Cartesian Linguistics, New York, Harper and Row.
Chomsky, N. (1972) Language and Mind. New York, Harcourt, Brace and Jovanovich.
Chomsky, N. (1972) Studies on Semantics in Generative Grammar, The Hague, Mouton.
Chomsky, N. (1972) Psychology and ideology, Cogn., 1, 11-46.
Chomsky, N. (1970) Deep structures, surface structures and semantic interpretations. In R. Jakobson and Sh. Kawamoto (eds.) Studies in General and Oriental Linguistics, Tokyo.
Hubel, D. H. and T. N. Wiesel (1963) Receptive fields of cells in striate cortex of very young inexperienced kittens. J. Neurophysiol., Wash., 26.
Jakobson, R. (1941) Kindersprache, Aphasie und allgemeine Lautgesetze. Uppsala, Almqvist and Wiksell.
Jakobson, R. and M. Halle (1950) Fundamentals of Language, The Hague, Mouton.
Lorenz, K. (1941) Kant’s Lehre von Apriorischem im Lichte der gegenwärtigen Biologie. Blätt. deut. Phil., 15, 99-125.
McNeill, D. (1966) Developmental psycholinguistics. In F. Smith and G. A. Miller (eds.) The Genesis of Language. Cambridge, M.I.T. Press.
McNeill, D. (1970) The Acquisition of Language. New York, Harper and Row.
Melchuck, I. A. (1974) An Essay of the Theory of a Linguistic Model “meaning-text”, Moscow, “Nauka” Publishing House (in Russian).
Miller, G. (1963) Introduction to the formal analysis of natural language. In R. D. Luce, R. Bush and E. Galanter (eds.) Handbook of Mathematical Psychology. New York, Wiley, Vol. II, pp. 269-322.
Piaget, J. (1967) Six Psychological Studies, New York, Vintage Books.
Saussure, F. de (1916) Cours de Linguistique générale, Paris, Lausanne.
Schlesinger, I. M. (1971) Production of utterances and language acquisition. In D. I. Slobin (ed.) The Ontogenesis of Grammar, New York, Academic Press. Pp. 63-101.
Skinner, B. F. (1957) Verbal Behavior, New York, Appleton-Century-Crofts.
Slobin, D. I. (1971) The Ontogenesis of Grammar, New York, Academic Press.
Slobin, D. I. (1971) Cognitive prerequisites for the development of grammar. In C. A. Ferguson and D. I. Slobin (eds.) Studies of Child Language Development, New York, Holt, Rinehart and Winston. Pp. 175-216.
Troubetzkoi, N. S. (1939) Grundzüge der Phonologie, Prague.
Zholkowski, A. K. and Melchuck, I. A. (1967) On the semantic synthesis, Problems of Cybernetics, No. 9, Moscow (in Russian).
Zholkowski, A. K. and Melchuck, I. A. (1969) On the construction of the working model of language. Machine Translation and Applied Linguistics, No. 12, Moscow (in Russian).
Discussion
On interpreting reasoning data - A reply to Van Duyne
J. St. B. T. EVANS
Plymouth Polytechnic
In a previous article (Evans, 1972a) it was argued that theorists had misinterpreted data in reasoning experiments by assuming that subjects were responding to the logical structure of the problems. Instead it was proposed that results should be interpreted in relation to two kinds of ‘non-logical’ variables: those which affect the interpretation of the sentences forming the propositions, and those which affect the reasoning operations involved. One such operational variable discussed was ‘matching bias’, or the tendency for subjects to select values named in logical rules for testing hypotheses, irrespective of the logical consequence of such selections. This has been demonstrated by manipulating the presence of negative components in conditional rules (Evans, 1972b; Evans & Lynch, 1973). Van Duyne (1973, 1974), in criticising my approach to these problems, has concentrated on the problem of ‘matching bias’ as an explanatory concept. His criticisms may be summarized as follows:
(i) The matching bias effect lacks empirical generality.
(ii) My psychological explanations of the effect are vacuous.
(iii) The postulation of a ‘non-logical’ matching bias is inconsistent with introspective protocols which show that “in these experiments subjects are trying to reason correctly” (Van Duyne, 1973, p. 240).
In support of the explanatory power of the matching bias hypothesis it should be recalled that it is the only explanation of Wason’s well known ‘selection task’ (for detailed discussion see Wason & Johnson-Laird, 1972) to have been deduced from an entirely independent reasoning situation (Evans, 1972b) and subsequently predicted and observed on the selection task (Evans & Lynch, 1973). It is true that matching bias does not generalize to problems with realistic materials, at least so far as the selection task is concerned, since in these circumstances subjects tend to produce the logically correct responses (e.g., Wason & Shapiro, 1971; Johnson-Laird, P. Legrenzi, & M. Legrenzi, 1972), presumably because there is a strong semantic basis for interpretation. Van Duyne correctly states that the effect is also not observed on abstract tasks when disjunctive rules are used. This is rather puzzling, but the introduction of negative components produces sentences which are so unnatural as to be almost impossible for subjects to interpret (see Evans, 1972a). I do, however, have some data to show that the
Cognition 3(4), pp. 387-390
Table 1. The psychological truth tables obtained for the IT and OI groups compared with those obtained by Evans (1972b)
[Table body not recoverable. Surviving labels: Rule; Task; Logical values; correct verifying; correct falsifying; matching.]
bias effect is not restricted [...] (1972b) and Evans & Lynch (1973) experiments. [...] the City of London Polytechnic [...] (1972b) [...] students [...] Subjects were randomly divided [...] problems in the linguistic [...] at [...] experiment, [...] example, a [...] might be given the rule ‘If the letter is not B then the number is 5’ and asked to decide [...] each of the following [...] logical case of the truth table appearing once). It was thus possible [...] ‘false’ (F). The [...] of this experiment [...] (1972b) are included [...] striking in these [...] is that for ‘If ... then ...’ rules, the [...]tion task data are almost identical to those observed [...] (1972b). A set of logically equivalent [...] phrased in the form ‘... only if ...’ produced
On interpreting reasoning data - A reply to Van Duyne
similar results when the antecedent was affirmative, but a slightly different distribution when the antecedent was negative. Evans (1972b) found that matching bias affected the probability of selecting a case but not the direction (true or false) of those selected. Consequently, the frequency of ‘irrelevant’ classifications, corresponding to ‘failures to select’ in Evans (1972b), was examined for the evaluation task data. As can be seen from Table 1, there is a tendency for ‘irrelevant’ classifications to increase with mismatching values (moving from left to right across the table), interacting with strong tendencies to recognise the correct verifying and falsifying contingencies. Holding the logical case constant in comparisons between rules, there was found to be a significant tendency for the frequency of ‘irrelevant’ classifications to change in the direction predicted by matching bias for both IT and OI rules.

Van Duyne’s second criticism is the lack of psychological explanation of the matching effect. The results given above eliminate any explanation which is specific to construction tasks (i.e., a tendency to select named values) but allow hypotheses such as ‘the subject’s attention is directed to the named values’ or ‘the subject considers that the rule is only about the named values’. Such formulations do seem rather vacuous, although they do not fall into the danger of tautology suggested by Van Duyne, since they allow prediction over a range of different experimental tasks.

The third criticism, that a ‘non-logical matching bias’ does not correspond to subjects’ introspections, is crucial since protocols are being used increasingly to support alternative theories. Thus, for example, Goodwin & Wason (1972) produced protocols which appear to confirm the stages of insight which Johnson-Laird & Wason (1970) postulate to occur in the selection task, and which certainly contain reference to the logical structure of the problem.
A theoretical proposal which reconciles the matching bias hypothesis with these protocol data is that the introspections do not reveal the processes underlying selections, but reflect a separate, independent thought process in which the subject essentially rationalizes his own behavior (Evans, 1974; Wason & Evans, 1975).

In making these defenses against Van Duyne’s criticisms, I do not wish to convey the impression that all is well with the account presented by Evans (1972a). Indeed, it is seriously deficient in one respect, as Van Duyne (1973) observes: “Evans does not indicate anywhere the nature of the interaction between ‘non-logical operational variables’ and the interpretation of the sentences....” In essence, what is required is a model of how operational and interpretative processes combine in an individual to produce the responses he actually makes.
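As an illustration only, the structure of the truth table evaluation task discussed above can be sketched in code. The sketch enumerates the four logical cases for a conditional rule with optional negative components and counts how many of an instance's values mismatch the values named in the rule; on the matching bias hypothesis, ‘irrelevant’ classifications should become more frequent as this count rises. The function name `logical_cases`, its parameters, and the labels are hypothetical and are not taken from Evans (1972b); the code describes the stimuli only, not subjects' responses.

```python
from itertools import product


def logical_cases(antecedent_negated, consequent_negated):
    """Enumerate the four logical cases of a conditional rule.

    Hypothetical helper for illustration. For a rule such as
    'If the letter is not B then the number is 5', the named values are
    B and 5, the antecedent is negated and the consequent is affirmative.
    """
    rows = []
    for a_true, c_true in product([True, False], repeat=2):
        # An instance shows the named value when a component is
        # affirmative and true, or negated and false.
        a_matches = a_true != antecedent_negated
        c_matches = c_true != consequent_negated
        if a_true and c_true:
            role = "verifying"
        elif a_true and not c_true:
            role = "falsifying"
        else:
            role = "antecedent false"
        rows.append({
            "case": ("T" if a_true else "F") + ("T" if c_true else "F"),
            "role": role,
            # Number of components (0, 1 or 2) that mismatch the named values.
            "mismatches": int(not a_matches) + int(not c_matches),
        })
    return rows


if __name__ == "__main__":
    # 'If the letter is not B then the number is 5'
    for row in logical_cases(antecedent_negated=True, consequent_negated=False):
        print(row["case"], row["role"], row["mismatches"])
```

For the rule ‘If the letter is not B then the number is 5’, the falsifying case (antecedent true, consequent false) mismatches both named values, whereas for the affirmative rule ‘If the letter is B then the number is 5’ the same logical case mismatches only one. Holding the logical case constant while varying the negatives in this way is the kind of comparison between rules that the text describes.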
REFERENCES
Evans, J. St. B. T. (1972a) On the problems of interpreting reasoning data: logical and psychological approaches. Cog., 1, 373-384.
Evans, J. St. B. T. (1972b) Interpretation and ‘matching bias’ in a reasoning task. Q. J. exp. Psychol., 24, 193-199.
Evans, J. St. B. T. (1974) On the origin of selections in the selection task. Paper read to the Trento conference on the selection task.
Evans, J. St. B. T. and Lynch, J. S. (1973) Matching bias in the selection task. Brit. J. Psychol., 64, 391-397.
Goodwin, R. Q. and Wason, P. C. (1972) Degrees of insight. Brit. J. Psychol., 63, 205-212.
Johnson-Laird, P. N., Legrenzi, P. and Legrenzi, M. S. (1972) Reasoning and a sense of reality. Brit. J. Psychol., 63, 395-400.
Johnson-Laird, P. N. and Wason, P. C. (1970) A theoretical analysis of insight into a reasoning task. Cog. Psychol., 1, 134-138.
Van Duyne, P. C. (1973) A short note on Evans’ criticism of reasoning experiments and his matching bias hypothesis. Cog., 2, 239-242.
Van Duyne, P. C. (1974) Realism and linguistic complexity. Brit. J. Psychol., 65, 59-67.
Wason, P. C. and Johnson-Laird, P. N. (1972) Psychology of Reasoning: Structure and Content. London, Batsford.
Wason, P. C. and Evans, J. St. B. T. (1975) Dual processes in reasoning? Cog., 3, 141-154.
Wason, P. C. and Shapiro, D. (1971) Natural and contrived experience in a reasoning problem. Q. J. exp. Psychol., 23, 63-71.