Cognition, Vol. 5, No. 4

Cognition, 5 (1977) 287 - 332 ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands


Planning meals: Problem-solving on a real data-base*

RICHARD BYRNE
University of St. Andrews

Abstract

Planning the menu for a dinner party, although a familiar and practiced task, involves problem-solving with a large and complex body of knowledge. It is here used to study the everyday operation of human memory. Verbal protocol analysis, a technique devised to investigate formal problem-solving, is examined theoretically and adapted for analysis of this task. The study shows the need for a number of mental structures and their associated control processes, only some of which have previously been proposed in psychology.

*The experimental work was carried out while the author was at the MRC Applied Psychology Unit in Cambridge, and the financial support of the Medical Research Council is gratefully acknowledged. I would like to thank John Morton for valuable discussion and comments during the preparation of this paper.

Introduction

The form in which knowledge is represented in the mind has wide implications for the use of that knowledge in daily life. Ideally, we would like to discover what information is stored, in what format, and what mechanisms can search, interpret and use this information. The task of deciding a representation for a person’s world knowledge has been called “the most fundamental problem confronting cognitive psychology today” (Anderson and Bower, 1973), and can be tackled in two broadly different ways: the neat and the messy.

The neat approach relies on study of a situation which is limited in some respects, in the hope that explanations can eventually be extended to the full complexity of everyday life. The extensive “semantic memory” literature (see Griggs, 1976) is an example of such an approach, in which a natural area of knowledge is studied by unnatural, laboratory tasks. Another is the “blocks world” of Winograd (1972) and others, in which a highly simplified, artificial domain is treated as the substrate for relatively natural conversation. Representations proposed as a result of such work were initially rather free and flexible, for example networks of propositions whose structure was largely governed by the learning process (Collins and Quillian, 1969, 1972). Recently, more rigid structures, related in some ways to linguistic “deep structures” (Chomsky, 1965) and “case frames” (Fillmore, 1968), have been explored in detail (Anderson and Bower, 1973; Norman and Rumelhart, 1975). The current trend is towards representations with a rigid and highly stereotyped structure, usually known as “frames” (Minsky, 1974; Neisser, 1976). Typically, ideas have been originally inspired by developments in artificial intelligence or linguistics and the work has been largely “conceptually driven” (in the sense of Bobrow and Norman, 1975), which may be unwise when research is still at an exploratory stage. Often, the original concern has been with natural language understanding and so there has been a strong linguistic bias. Adequate experimental testing of models has been a major problem, and tests between alternatives have therefore been “late” in the process of model construction (e.g., Thorndyke and Bower, 1974).

In the light of these drawbacks, the present study approaches the same problem in a quite different and apparently messier way. Both the area of knowledge studied (cuisine) and the experimental task (planning a meal for a dinner party) are relatively natural and everyday matters. The aim is to provide empirical information on the representation and use of knowledge, which can then be used for early guidance in construction of memory models, and so enable theory to be partly “data driven”.
Evidently a quite different research method to those normally employed in this area will be necessary, and the one which has been used is verbal protocol analysis. This method is perhaps unfamiliar, as previously it has chiefly been used for analysis of problem solving in artificial, logically structured domains. Therefore, before detailing the experiment proper, some discussion will be devoted to what is meant by verbal protocol analysis and how it can be used to study the problem of interest here.

1. Protocol Analysis

Extensive studies of thinking were carried out with the method called “systematic introspection” in German psychological laboratories around the turn of the century. The German psychologists believed it crucial to separate the subjects’ two activities of thinking and reporting, so they instructed subjects only to report after their task was complete. Since the reliability and accuracy of what subjects say they are thinking is likely to decrease with time after the event, they were forced to restrict themselves to short and simple tasks. The protocols so generated were often supposed to reflect the ‘real’ mental events directly and completely, and the emphasis was on the content of the utterances, which was held to ‘explain’ the processes of thinking. This approach has been in disrepute for so long now that there is no point in listing its inadequacies. Note that the only way to extend this method to study complex tasks which require some time to complete would be by repeated interruptions for reporting. This would so obviously disrupt the process under study that no-one has even tried it out.

Duncker (1945) developed a new technique, “protocol analysis”. The subject is given a task which takes some time to complete, and asked to “think aloud” while he works. In Duncker’s words, “while the introspecter makes himself-as-thinking the object of his attention, the subject who is thinking aloud remains immediately directed to the problem, so to speak allowing his activity to become verbal”. This optimistically assumes that to translate between thoughts and utterances is straightforward for the subject, and requires no effort. Neither of these hopes is generally true, and the problems will be considered below. The method was used and developed extensively by de Groot (1965) in his study of chess ability, and employed by A. Newell, H. A. Simon and their co-workers (summarized in Newell and Simon, 1972), and by Quillian (1966) and others. Protocol analysis has a number of advantages over systematic introspection, although problems remain.
Instead of regarding subjects’ utterances as an ‘explanation’ of overt behavior, the record of these utterances is treated as another form of data which an adequate theory of the process must account for. Protocol analysis is therefore closely related to the observational techniques used in ethology, where a blow-by-blow record of the behavior of an animal or group of animals is used as the basic form of data. Such a detailed record of behavior is liable to be a stringent test of any theory. Thinking aloud minimizes the danger of confusion of past and present states of knowledge, which means that it need not be restricted to tasks of short duration, and highly complex tasks can be analyzed. Similarly, rationalisation and editing of behavior is likely to be far less common in thinking aloud than in introspection. These advantages are merely relative since the different rates of thought and speech mean that the same problems are present in reduced form in protocol analysis. Perhaps more importantly, access to the exact temporal sequence and pause structure is possible. For construction of or comparison with an information-processing model of a process, it is important to have a close approximation to the sequence and timing of stages in the process. While not every pause can ever be interpreted sensibly, they should at least be preserved and made available to the theorist (although the present study seems to be the first in which systematic use of timing information is made).

However, interpretation of protocols is by no means straightforward. Assuming that mental processes do not occur ‘in words’ they must be translated into words during thinking aloud. Some mental events are notoriously difficult to describe in words, and the words are even then liable to be misunderstood. It is plausible that if a subject has difficulty putting his thought into words, the experimenter will have difficulty grasping the intended meaning of the words and confusion will result. Protocol analysis should therefore be restricted to cases where the subject finds it easy to verbalize. Presumably, the very best guarantee of this is when subjects would rather think aloud than not, as in the present studies.

Very much the same recommendation applies to another problem of the method, that of the effects of reporting on the primary task. Evidently the main process will be affected: at the least the available capacity must be reduced. No effects of the subsidiary task have so far been demonstrated on the quality (Newell and Simon, 1972, p. 478) or rate of thinking (Dansereau and Gregg, 1966), but it has been claimed that thought is made more organized and less intuitive (de Groot, 1965, p. 83). In any case, provided a task is used to which thinking aloud is a natural accompaniment, the behavior seems worthy of investigation in its own right, even if it differs somewhat from silent thought.

Although a protocol can sometimes be used to show that mental processes are performed in series, the parallel nature of processing cannot be proved from a protocol.
As this asymmetry seems odd at first sight, the point seems worthy of elaboration, although Duncker (1945, p. 4) was probably aware of it when he stated “a protocol is relatively reliable only for what it positively contains but not for that which it omits”. Consider a very simple program which uses a memory store to find three-course meals. Each dish has a number of ‘features’ attached to it in the store: LIGHT, TASTY, HOT, etc. When the program is called, two features (say, A and B) can be specified, and the program will assemble a meal each course of which is known to possess the features A and B. The program operates according to the flow chart of Figure 1. The program is designed to produce ‘protocols’ of what it is doing, by printing out its current state of knowledge at various points. What these points will be is governed by its ‘comment level’ (C-level)*. The higher the comment level, the more explicit and detailed is the program’s description of what it is doing. Each transition between operations in the program has a number attached, which may be read as an instruction to “print out the current state of knowledge if the C-level is N or more”. For example, if the test “X is A ?” is successful and the C-level is 3, then the program would print “well, X is certainly A ...” or something similar. If the program were called to find a light, hot meal and the C-level set to 3, the complete protocol would show the true structure of the process:

“We need a meal which is light and hot ...
    how about tomato soup for starter ...
        well, tomato soup is light ...
        and it is hot ...
    tomato soup is light and hot, have that ...
    then perhaps cheese salad for the main course ...
        cheese salad is light ...
        but it isn’t hot ...
    not cheese salad, it isn’t light and hot ...
    try fish fingers instead ...
        fish fingers are light ...
        and they’re hot ...
    yes, fish fingers are fine ...
    maybe stewed apple for pudding ...
        apple is light ...
        and stewed apple is hot ...
    stewed apple is fine ...
let’s have tomato soup, fish fingers and stewed apple”

*This concept was first termed “introspection level” (Morton and Byrne, 1975).

Figure 1. An informal representation of a program to find a three course meal, each course of which has features A and B.

However, with C-level 2 some of the intermediate steps (those inset twice, above) would be missing and a naive observer might conclude that features of dishes were checked in parallel. With C-level 1 only two utterances would remain, as if all three courses were chosen in parallel. In general, apparent parallel processing in a protocol may always be due to a low C-level.

Assuming that subjects are in some ways similar to our program raises some interesting questions. Are there individual differences in C-level? If so, some subjects’ protocols will be much more helpful to an experimenter than others, even if their mental functioning is identical. De Groot (1965, p. 380) recommends “selecting subjects who verbalize easily” for collection of protocols, which implies that this is so. Do subjects’ C-levels vary with different tasks? Do they change with time, or with practice on a task? It is often asserted that those most excellent at some task are least able to explain the skills involved and understand the problems of the novice. A fall of C-level with competence could account for this. Particularly of interest to an experimenter is the issue of whether subjects have voluntary, conscious control over their C-level (beyond the ability not to think aloud at all). This is still an open question.
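The comment-level argument can be made concrete with a minimal sketch of such a program. It is an illustration only, not a reconstruction of the flow chart of Figure 1: the store contents, the function name `plan_meal`, the helper `say` and the assignment of levels 1 to 3 to particular utterances are all assumptions. It also assumes that an utterance numbered n is reported when the C-level is at least n, which is the reading under which C-level 3 gives the fullest protocol.

```python
# Hypothetical memory store: course -> candidate dishes with their features.
STORE = {
    "starter": [("tomato soup", {"light", "hot"})],
    "main course": [("cheese salad", {"light"}),
                    ("fish fingers", {"light", "hot"})],
    "pudding": [("stewed apple", {"light", "hot"})],
}


def plan_meal(a, b, c_level, store=STORE):
    """Return (meal, protocol) for a meal whose every course has features a and b."""
    protocol = []

    def say(n, text):
        # Assumption: an utterance numbered n is reported only at C-level n or above.
        if c_level >= n:
            protocol.append(text)

    say(1, f"we need a meal which is {a} and {b}")
    meal = []
    for course, candidates in store.items():
        chosen = None
        for dish, features in candidates:
            say(2, f"how about {dish} for {course}")
            ok = True
            for feature in (a, b):  # the two features are always tested in series
                if feature in features:
                    say(3, f"{dish} is {feature}")
                else:
                    say(3, f"{dish} isn't {feature}")
                    ok = False
                    break
            if ok:
                say(2, f"{dish} is {a} and {b}, have that")
                chosen = dish
                break
            say(2, f"not {dish}, it isn't {a} and {b}")
        if chosen is None:
            say(1, f"no {course} is {a} and {b}")
            return None, protocol
        meal.append(chosen)
    say(1, "let's have " + ", ".join(meal))
    return meal, protocol


if __name__ == "__main__":
    for level in (3, 2, 1):
        _, lines = plan_meal("light", "hot", level)
        print(f"C-level {level}: {len(lines)} utterances")
```

The underlying process is identical at every C-level, with features tested one after another; only the reporting changes. At the lowest level the individual tests leave no trace in the protocol, which is why a sparse protocol cannot be taken as evidence of parallel processing.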

2. Details of method

Subjects

Subjects were members of the MRC Applied Psychology Unit subject panel, taking part on a paid volunteer basis, or members of staff at the Unit. Six women and four men took part, aged between 23 and 40. The only criterion of selection was that they should enjoy, and be fairly competent at, cooking and entertaining.

Procedure

Subjects were seated in a quiet room and all verbalizations were recorded on a Grundig Stenorette dictaphone, placed in an inconspicuous position. The experimenter was present throughout each session, seated as for normal conversation with the subject*. Instructions were written, and divided into two parts. Subjects had unlimited time to read the background information, which gave the general orientation. Since in pilot trials no substantive differences in protocol were found between instructions to “think aloud throughout the task, and try to give reasons for your decisions” and “you may mutter or talk aloud if you like, and need not give reasons for any decisions you take”, the latter version was used in the experiments. The protocols generated are therefore a close approximation to subjects’ normal behavior under the particular task. The background instructions also asked subjects not to ask questions once the experimental session began, but in fact specific questions were usually answered to save subjects setting unknown ‘default’ values instead. When subjects were satisfied with the background, the written details of the first task were given and the session began, with the next task set as soon as each one was completed.

*Although the presence of the experimenter introduced some slight risk of artifacts, it was desirable for two reasons. Firstly, in pilot trials, subjects found it unnatural and difficult to “talk to an empty room”. Secondly, de Groot (1965, p. 380) warns against the danger that “if a subject is reporting “to a tape” and not directly to an experimenter who is attentively trying to grasp his meaning, he is likely to spend less effort in making himself understood” and therefore “tape protocols are often less clear and informative than hand written protocols”.

Transcription

All protocols were transcribed by the experimenter alone and subjects did not see the transcriptions. Although de Groot used and recommends collaboration with the subject during transcription (or the subject transcribing his own protocol) and even changing the text to improve the clarity, this was not done. The danger of unwanted importations and in general changing the status of the data from protocol to retrospective introspection seemed too great. Great care was taken to decipher each utterance exactly and this often necessitated multiple runs through the tape (transcription took over six times as long as the original recording). Pauses were timed by ear using a stopwatch, and were transcribed with the following notation:

,               pause of less than 1 sec, or a segmental pause
...             pause of 1 - 2 sec
... (Xs) ...    pause of X sec (X greater than 2) to the nearest sec

In fact only the pauses of one second or more are ever used in analyses, and their relative lengths are not compared in detail, so the lack of precision is unimportant. Each task is timed from the moment the subject is given the instructions to the end (marked with the “@” symbol), and times given in seconds. For convenience of display, each protocol is broken into chunks separated by one or more seconds of pausing.

Task

To use protocol analysis in order to study the normal operation of an interesting segment of the human memory system, a task is needed which meets at least the following requirements:

(i) it must take sufficiently long to complete that thinking aloud during its execution is a natural behavior;
(ii) a large and complex body of information should be already available to the subjects, and search of this body should be a part of the task;
(iii) it should be an ordinary and everyday one to the subjects, to tap the normal use of the system and avoid ‘laboratory specific’ strategies.

Tasks of the form “find a meal for a particular occasion” satisfy all these demands. The instructions given were designed to match the hardest problem in this domain normally met by the subjects:

Task 1    A three-course meal suitable for a dinner party
Task 2    A three-course meal suitable for a dinner party with the same guests
Task 3    The same again
Task 4    The same again
Task 5    The same again
Task 6    The same again

Each task of the sequence of six was presented in writing on completion of the previous one.

3. Introduction to Procedure of Analysis

Analysis of the ten subjects’ protocols will begin by isolating recurrent patterns of behavior. As well as defining any pattern objectively, it is important to discover its function in the overall process and to discover whether it is general across subjects, and what mental mechanisms its use necessitates. All the protocols were analyzed to the same level of detail; protocols were not selected in any way. Some patterns are extremely common in the data, and where this is so the data is summarized. Otherwise, all utterances are reported; this seems essential if the account is not to be biased. However, for ease of exposition it is convenient to present the analyses around the framework of a single protocol where this is possible. The protocol of S3 was chosen for detailed explanation of its analysis, as it is quite long but reasonably clear. Starting from a fragment of this protocol, we will introduce the method of analysis, and build up a graphical formalism for gaining an easy and rapid over-view. The formalism is intended as an aid to analysis, to facilitate the search for important information by summarizing a good deal of data in a visual form. It is not intended as a model of the process in any sense. In section 4, the patterns of behavior will be analyzed in detail, reporting data of all the protocols, and their implications worked out where possible. The aim will be to discover what type of information is stored, the formats of various types of information, and what devices are used to interpret and use this information.

A graphical display of the commonest behavior patterns

Consider a fragment of protocol in S3’s first task, concerned with choice of a main course (abbreviated S3, 1:2 and segmented by long pauses):

12    then for the second course you’d need a main dish of ...
13    a roast meat, say roast beef if we’re going to be extravagant ...
14    and then trimmings to go with that, which would be ...
15    potatoes, roast potatoes, and a nice veg, depending on the time of year ...
16    but at this time of year might be ...
17    cauliflower, for example ... (3s) ...
18    perhaps another kind of veg as well, two different kinds ...
19    say carrots, that’s something of a different color, and nice taste, and some gravy ...

A succession of noun phrases occurs, each of which refers to a (possible) part of the final answer to the question. If the task had been simply to “write down your answer when you are satisfied with it”, S3’s answer would have presumably included: roast beef, roast potatoes, cauliflower, carrots, gravy. But in the protocol, other terms are found which could have been, but in fact weren’t, in the final answer: the second course, a main dish, a roast meat, trimmings to go with that, potatoes, a nice veg, etc. Call both these classes of names “OBJECTS”. Then we can view the protocol as a series of “TRANSITIONS” between objects, caused by mental operations. These operations may well be complex, and concern more than a pair of objects. Furthermore, it is likely that the objects explicit in the protocol correspond only to a subset of the complete chain of ‘mental objects’ which we presume S3 has considered.

Can we see any simple patterns in the objects and transitions of this fragment? In a number of cases, a pair of adjacent objects are semantically related as set and subset:

a main dish            →  a roast meat
a roast meat           →  roast beef
potatoes               →  roast potatoes
a nice veg             →  cauliflower
another kind of veg    →  carrots

That is, in each case we can say that object (x) is a kind of object (x - 1). (“Another kind of X” is of course a more complex entity than the others, and will be considered again later). In each case, two objects are related to each other semantically. Call this pattern the REPLACE transition.

Another pattern visible in this fragment is a (1 : many) transition. Here, more than one object which are not necessarily adjacent to each other make up the components of a previously mentioned object:

the second course            →  a main dish & trimmings to go with that
trimmings to go with that    →  potatoes & a nice veg & another kind of veg & gravy

That is, in each case we can say that the objects (x), (x + i), (x + j), ..., etc. together make up all the parts of object (x - 1) (1 < i < j ... etc). Call this pattern the EXPAND transition.

A third simple pattern which recurs in this short fragment is where an object is both unrelated to any subsequent object in a simple way, and makes up a part of the final answer. Call this pattern the STOP transition. Note that the transition names are intended only as a shorthand, and the significance of the transitions is yet to be glimpsed.

These three patterns occur throughout the ten protocols, and make up the great bulk of the simpler ones. They form the basis of the graphical representation which was devised during the analyses to show visually the overall structure of the solution path, and guide more detailed analyses. In this “object transition graph” (OTG), each node corresponds to the currently held object, and transitions are represented by vertical lines, branched in the case of EXPAND. The way in which this fragment is coded can be seen by consulting the corresponding part of Figure 2. The full key to OTG notation will now be given to allow reference to OTGs during analyses, but note that some of the patterns coded will be obscure until the analyses are complete.
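As a rough illustration of how the three transitions combine, the fragment S3, 1:2 can be recorded as a small tree. This is a sketch under stated assumptions: the names `Node`, `replace`, `expand`, `final_objects` and `count_transitions` are hypothetical and not part of the original notation, and, like the OTG itself, the structure is a bookkeeping aid rather than a model of the process.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One object of the protocol, with the transition applied to it."""
    label: str
    transition: str = "STOP"  # "REPLACE", "EXPAND" or "STOP"
    children: list = field(default_factory=list)


def replace(label, child):
    """REPLACE: a single object that is a kind of `label` is substituted for it."""
    return Node(label, "REPLACE", [child])


def expand(label, *parts):
    """EXPAND: several objects together make up the whole of `label`."""
    return Node(label, "EXPAND", list(parts))


def final_objects(node):
    """Collect, in order, the objects at which a STOP transition occurs."""
    if node.transition == "STOP":
        return [node.label]
    found = []
    for child in node.children:
        found.extend(final_objects(child))
    return found


def count_transitions(node):
    """Count how often each kind of transition occurs in the tree."""
    counts = Counter({node.transition: 1})
    for child in node.children:
        counts += count_transitions(child)
    return counts


# The fragment S3, 1:2, coded from the REPLACE and EXPAND pairs listed in the text.
second_course = expand(
    "the second course",
    replace("a main dish", replace("a roast meat", Node("roast beef"))),
    expand(
        "trimmings to go with that",
        replace("potatoes", Node("roast potatoes")),
        replace("a nice veg", Node("cauliflower")),
        replace("another kind of veg", Node("carrots")),
        Node("gravy"),
    ),
)

if __name__ == "__main__":
    print(final_objects(second_course))
    print(count_transitions(second_course))
```

Reading off the STOP nodes gives the answer S3 would presumably have written down, while the intermediate nodes hold the objects that were mentioned but did not reach the final answer.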


Key to Object Transition Graphs

Nodes for objects which are inferred to be held mentally by subjects are normally labelled with the protocol utterance which corresponds to them (e.g. “roast beef”), sometimes shortened by omission of irrelevant words (signified by #). Where no such utterance is present (i.e., the object has been inferred), a system of abbreviations is used to label nodes:

Mi    main part of course i
Ti    trimmings to course i
C1    starter or first course
C2    main or second course
C3    pudding or third course
P     main or protein-containing part of C2
C     filling or carbohydrate-based part of C2*
V     vegetable or salad part of C2
L     liquid or sauce part of C2

Where a single object (X) is substituted for object (X - 1) and (X) is a member of the set named by (X - 1), we have a REPLACE transition, coded:

[Diagram: node X - 1 joined by a vertical line to node X.]

Where more than one object (X), (X + i), (X + j), ... (X + N) are substituted for an object (X - 1) and, taken together, make up the whole of it, we have an EXPAND transition. This is coded, for a 1 : 3 transition:

[Diagram: node X - 1 branching to nodes X, X + i and X + j, with a marker symbol at the branch point; the symbol is not legible.]

The marker symbol records the fact that no final “review for adequacy” is made. If such a review does occur, it is coded:

[Diagram: nodes X, X + i and X + j, annotated with the “verbatim text” of the review.]

The STOP transition is more complex than we have so far discussed. Objects may be accepted and form part of the final answer, or be rejected and entirely discarded, or be held temporarily and judgement deferred. These are represented as:

[Three node symbols, for accepted, rejected and temporarily held objects; the symbols are not legible.]

If an object is rejected or judgement deferred, the original current object may be reinstated, and another REPLACE transition applied. We represent this by the convention:

[Diagram: “object” linked to “subset1” and then to “subset2”.]

In the case of a temporarily held object, a final acceptance or rejection may be made in the light of subsequent evaluation of alternatives:

[Diagram: “object” with “subset1” and “subset2”.]

Sometimes (a) queries are addressed to E by S, or (b) E prompts or corrects S when he deviates from the intended task instructions. These interactions are represented as follows:

[Diagrams (a) and (b), each carrying the “verbatim text” of the interaction; the symbols are not legible.]

*Note that in French domestic cuisine no distinction is made between C and V. So, “What shall we have for a vegetable: leeks, potatoes or Jerusalem artichokes”, though bizarre in London, would not be strange in Paris (J. Morton, personal communication). The normal practice of serving a green salad after the main course also changes the status of any V segment of the main course.

[Figure 2. S3/six meals/OTG: object transition graph for subject S3 on the six tasks. The graph itself is not recoverable from the extracted text.]


Before moving to the analysis proper, the ‘analysis’ of an artificial protocol may help to show how these notations are applied. The OTG of Figure 3 gives the coding of the following ‘protocol’:

a nice meal ... well let’s have a Chinese meal for the main course, how about chop suey ... no, don’t like that, have chow mein instead ...
E: more detail
chicken chow mein ... then for pudding, we’ll have mango ... would that be enough? No, have a starter ... have soup ... perhaps a thick soup ... or a consommé ... yes, not thick soup ... and with it, have bread ...

Use of pause data in final object segmentation So far we have treated the identification of “final objects” as straightforward; they are simply “objects” which would form part of a minimal final answer to the task. But consider the following (artificial) protocol fragments: (a)

“Let’s have what I had at my last dinner party . . . which was . .. roast beef, roast potatoes, and brussels sprouts”.

(b)

“We need some kind of meat . . . say, roast beef . . . and potatoes to go with that . . . roast potatoes and some vegetable . . . say, brussels sprouts”.

We would argue that these fragments reflect quite different processes, and wish to code them to show this difference clearly. Fragment (a) represents the retrieval of a single ‘chunk’ of information, and its description verbally, while (b) the construction in three separate retrieval steps, of a complex entity. Using the preliminary definition of a final object, their OTGs would be almost identical, so another rule is added to the definition. To be coded as “a final object”, a noun phrase must be separated from other final objects by either: any pause of one second or more g;, any interspersed comment, apart from filler sounds (“urn”, “er”, etc.) and conjunctions “and”, “with”, etc.). This arbitrary cut-off will not of course guarantee error free coding of protocols. It simply assumes that mental processing, including memory

Planning meals

303

OTC coding of an artificial protocol.

Figure 3. “0 nice meal ” I

OTG codings of two artificial protocol fragments, illustrating use of pause data in segmenting final objects.

Figure 4.

(4

MEAL

@I

MEAL

I ‘roost roast

beef, potatoes

‘some kind of meat’

I ‘some vegetable”

I

I “roast

I “potatoes”

beef”

‘roast potatoes”

I ” brussels

access, takes time, and so two items are more likely to have resulted from two retrievals if they are more separated in time from each other. Sometimes we will inevitably interpret protocols wrongly, but the cut-off is at least objective and so can serve as a basis for statistical treatment. Using this rule for identification of final objects, the artificial protocols (a) and (b) are coded quite differently (see Figure 4). In (a), a single object (corresponding to “roast beef, roast potatoes and brussels sprouts”) results from a REPLACE transition applied to C2 (the main course). In (b), EXPAND is applied to C2, resulting in three objects, which are separately subject to REPLACE, so three distinct final objects result. Taking the protocol of S3 (given in full in the Appendix), we can identify “final objects” by round brackets:

1:1  (pate), #, # (toast and lemon)
1:2  (roast beef) # ... # ... # (roast potatoes), # ... # ... # ... (cauliflower) ... (3s) ... # ... (carrots), # (gravy) ... # ... (3s) ... # (a salad)
1:3  (apple pie) ... (and cream)
2:1  (mixed vegetable soup)
2:2  (chicken casserole RECIPE) ... (and, or, boiled potatoes)
2:3  (cheesecake)
3:1  (a light salad)
3:2  (a mutton curry) ... (3s) ... # (rice and cucumber salad with yogurt) # (and chutney and pickle)
3:3  (ice cream)
4:1  (a selection of cold meat sausages, and pate and things like that) ... (with bread or toast)
4:2  (sole au Champagne) # ... # (with plain potatoes, and or) ... (4s) ... (with a salad)
4:3  (a fruit salad)
5:1  (a light soup)
5:2  (roast duck RECIPE) ... (3s) ... (and roast potatoes)
5:3  (a chocolate mousse)
6:1  (oxtail)
6:2  (pork chops RECIPE) ... # ... (and potatoes and) ... # ... (brussels sprouts)
6:3  (a trifle)

[Note: # represents the omission of interspersed comment, and RECIPE the omission of a lengthy description of a single object.] This segmentation was used in constructing the OTG of subject 3 on the task, shown in Figure 2.
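The pause rule used for this segmentation can be sketched in a few lines. This is only an illustration under stated assumptions: the `Item` record, the function name and the 2-second cut-off are hypothetical (the cut-off actually used is defined earlier in the paper, not here), and the timings attached to the two artificial fragments are invented for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Item:
    text: str            # a food item as it appears in the protocol
    gap_before_s: float  # seconds separating it from the previous item


def segment_final_objects(items: List[Item], cutoff_s: float) -> List[List[str]]:
    """Group consecutive items into final objects.

    Items separated by less than the cut-off are treated as one retrieval;
    a longer gap starts a new final object.
    """
    objects: List[List[str]] = []
    for item in items:
        if objects and item.gap_before_s < cutoff_s:
            objects[-1].append(item.text)
        else:
            objects.append([item.text])
    return objects


CUTOFF_S = 2.0  # hypothetical cut-off, for illustration only

# Artificial fragment (a): items spoken in quick succession.
fragment_a = [Item("roast beef", 0.0), Item("roast potatoes", 0.5),
              Item("brussels sprouts", 0.4)]
# Artificial fragment (b): the same items separated by long pauses.
fragment_b = [Item("roast beef", 0.0), Item("roast potatoes", 4.0),
              Item("brussels sprouts", 6.0)]

objects_a = segment_final_objects(fragment_a, CUTOFF_S)
objects_b = segment_final_objects(fragment_b, CUTOFF_S)
print(len(objects_a), len(objects_b))  # prints: 1 3
```

With these invented timings, fragment (a) yields one final object and fragment (b) three, mirroring the two codings contrasted in the text.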

4. Analysis

The REPLACE and STOP transitions

It is logically possible to answer the first of the series of six sub-tasks simply by retrieving an instance of “a three-course meal suitable for a dinner party” from memory. In terms of an OTG, this occurs as a REPLACE transition applied at the level of meal. That this is an option for subjects is shown by S4, who begins:

(S4, task-1) oh well, I’m going to give you the menu I gave at my last dinner party (laughter) ... it was (etc.)

She then describes the three courses in order. Using the segmentation by pauses procedure, the entire description emerges as a single final object, and in a follow-up it was confirmed by guests at the occasion that the menu was correct. The only other answer, among the set of sixty produced, which is not clearly segmented into three courses by pauses is the second given by S4:

(S4, task-2) um, oh well obviously we want everything quite different ... (9s) ... oh let’s have a Christmas dinner! ... we’ll have a clear soup, um, consommé, and then roast turkey with stuffing, roast potatoes, sprouts and chestnuts, bread sauce and cranberry sauce and Christmas pudding, with brandy butter

Evidently REPLACE is applied to “a Christmas dinner”, a strategy which ensures a sufficiently contrasted meal (see “Avoidance of Repetition”, below). More typically, REPLACE is applied at the level of courses and parts of courses, and the transition is common. The longest chain found in the protocols is from subject S2, task-4, where we see three successive REPLACE transitions:

(S2, 4:1) think I might go back to the soup, this time, and we could have thick soup ... a cream soup ... of celery ...

Explicit references to courses and parts of courses are often omitted (and abbreviated labels used on OTG coding) where the objects can be easily inferred. This results in an underestimation of the lengths of chains of REPLACE transitions. Table 1 only gives the frequencies of clear REPLACE transitions, where both objects are explicit and neither is the name of a course. It is thus an underestimation of the frequency and prominence of this behavior.

Table 1. Explicit occurrences of N-step chains of REPLACE transitions

Number of steps

2

1 Sl s2 s3 s4

_

_

6 11 2

_

s5 S6 s7 S8

2 1 14 2

s9 SlO

9 _

3 _ 1 _ _

1 _ 2

_ _ _

5 7

6 2

1 _

50

20

2

_

The behavior we have called the REPLACE transition has a number of implications. Most obviously, direct access must be present in memory to any item’s “subsets”: given any item’s name, we must be able to treat it as a category and have access to some members of the category.


More interestingly, the chained application of REPLACE implies the existence of a “stop rule” other than one based on “value” (when a termination would result in rejection). Each time the transition is applied, the result is clearly more specific and less vague. The most straightforward stop rule would therefore be based on the level of specificity of the current object. For example, “celery cream soup” is at a much lower level than “soup”. This hypothesis is confirmed by a number of occurrences of setting and changing the stopping level, for example:

(S4, 1:1) it was, um, started with a, prawn and lobster soup, there were, tins of Sainsbury’s lobster soup, to which I added some ...
E: That’s fine. Headings are fine.
That’s all? Oh, I see ... then the next course was (etc.)

(S6, 1:2) boned stuffed duck ... do you want any more about that?

(S9, 1:1) something like a soup ... no particular sort of soup ... (etc.)
E: say?
a vegetable soup ...
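A stop rule of this kind, applied to a chain of REPLACE transitions, might be sketched as follows. This is a toy illustration, not a claim about the mechanism: the `SUBSETS` and `LEVEL` tables, the numeric level values and the function name are all assumptions made for the example. The one feature taken from the argument is that each item's level is looked up from its own table rather than being counted from the number of transitions applied.

```python
# Hypothetical fragment of a set/subset network: each item names a category
# and gives direct access to some members of that category.
SUBSETS = {
    "soup": ["thick soup"],
    "thick soup": ["cream soup"],
    "cream soup": ["celery cream soup"],
}

# Hypothetical specificity levels, stored separately from the network
# (larger number = vaguer item). The values are invented for illustration.
LEVEL = {
    "soup": 3,
    "thick soup": 2,
    "cream soup": 1,
    "celery cream soup": 0,
}


def replace_until_specific(item, stop_level):
    """Apply REPLACE iteratively until the current object is specific enough."""
    chain = [item]
    # The stop rule consults the stored level of the *current* object.
    while LEVEL[item] > stop_level and SUBSETS.get(item):
        item = SUBSETS[item][0]  # REPLACE: move to a member of the category
        chain.append(item)
    return chain


print(replace_until_specific("soup", 0))
print(replace_until_specific("cream soup", 0))
print(replace_until_specific("soup", 3))
```

Starting from “soup” the loop makes three REPLACE transitions and starting from “cream soup” only one, yet both stop at the same level; raising the stopping level to that of “soup” yields no transition at all.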

In each case, the subsequent protocols show that the level prescribed or agreed to by the experimenter is adhered to by the subject. In any set/superset network, the level of a node can be computed relative to its neighboring sub- and supersets by using the subset relations themselves, but level computed in this way could not serve as a stop rule here. The reason is that although all the answers offered by any one subject are at about the same level of specificity, they are derived by a varying number of REPLACE transitions. Therefore it must be concluded that the levels of objects are represented in memory independently of their structural position in the network of information. Although a representation of an item’s level must be available, it need only be relative to semantically close items. This seems likely on intuitive grounds: while it is easy to decide the relative levels of semantically close items (e.g., Tate and Lyle sugar, roast meat, gazpacho) this is not true if they are unrelated (e.g., meat, America, chair). Independent representation of semantic level does not seem to have been suggested before as necessary for memory.

A third implication also follows from the chaining of REPLACE transitions, to do with subjects’ control processes in searching memory. The facility of either iteration or recursion is evidently necessary, and since the stop rule requires that the level of any current object be looked up, recursion would be impossible*. While the iteration facility is not often explicitly mentioned for human memory, this is presumably because it is self-evident.

The vast majority of STOP transitions in the protocols are acceptances. Rejection of an object, and consequently back-up, is rare, which contrasts sharply with protocols of subjects solving mathematical puzzles, logic and chess problems, etc. (see Newell and Simon, 1972). Little information is therefore available on the evaluation of adequacy (as opposed to level) of objects. Of twenty-one rejections in the series of protocols, five are clearly due to the items having been used before (these are discussed below with other aspects of repetition avoidance) and two more are unexplained (including “a stewed type of dish” in S3, 2:2). S7 rejects bouillabaisse for 3:1 since “not everybody fancies the amount of - that amount of olive oil”, curry for 4:2 since “you’d have to ask them if they were going to have curry, because not everybody likes curry” and Chile con carne for 5:2 for the same reason. How is it that these facts are made available at just the right moment to influence processing and so avoid choices which in practice might be disastrous? Do we assume that subjects (or at least S7) check each item before final acceptance for any property which might prove problematic? Or that certain specially ‘marked’ items interrupt processing or locally capture control to ensure a disastrous error (perhaps made and regretted in the past) does not occur? There is no way of telling at this point. It must at least be possible to represent an arbitrary range of properties of items, but this of course is basic to any model of memory.

The remaining nine cases of rejection occur as part of another behavior pattern which is again restricted to S7. We can call this the “compare-and-choose” pattern, since in it several possible answers to a part of the problem are retrieved (by repeated application of REPLACE, in our terms) and then either compared and the best selected, or else accepted en bloc and no choice made. For example:

(S7, 5:2) oh, I know, you could have boiled ... er boiled bacon or boiled ham, or, or as you can go back to the boiled beef and ... boiled beef and, your traditional boiled dinner (etc.)

(S7, 1:3) therefore, I like making pastry so I’d probably make, um, apple tart, flan, or a lemon meringue pie of some sort ...

*Recursion of REPLACE would produce expressions of the form SUBSET (SUBSET (SUBSET X)) which would only be evaluated when the stop rule was activated. No rule based on what the expression will evaluate to is therefore possible.

The items held temporarily for comparison or joint evaluation are shown in Table 2.

Table 2. Items involved in the compare-and-choose pattern of S7 (italics denotes the single item, if any, accepted)

Course | Items
1:3 | apple tart; flan; lemon meringue pie
2:1 | a chowder; a minestrone
2:3 | pears bourguignon; peach melba
3:2 | spaghetti bolognese; canelloni
4:1 | a chowder; a creamed soup; scotch broth
4:2 | a green veg; a salad
5:1 | mussels moulinaire (sic); smoked salmon; pâté maison
5:2 | boiled bacon; boiled beef
6:1 | all kinds of salad; scotch woodcock
6:2 | steak and kidney pie; chicken casserole; chicken pie
6:3 | mincemeat; lemon meringue pie

Some form of ‘examination buffer’ of capacity at least three items is required to allow this behavior to occur, since for comparison to take place they must have been held in the same memory location. This store may well be responsible for some of the characteristics ascribed to “short-term store” in the literature, and if so would correspond to the executive component of “working memory” (Baddeley and Hitch, 1974) rather than to the “response buffer” (Morton, 1970). The pattern also shows that S7’s evaluation rule is not absolute but relative to the difficulty of the situation. Unless an item is strikingly suitable, S7 continues searching so that the standard of judgement depends on what else can be retrieved. This closely parallels the evaluation rule de Groot (1965, pp. 257-260) finds in chess masters, where in his terms the “expectancy” is lowered if the situation is difficult. Unlike a structural requirement like a memory store, which if present in one is presumably present in all subjects, this evaluation rule may be specific to S7 for this task.

The EXPAND transition

Apart from the two exceptions noted above, subjects regularly use the EXPAND transition to segment each sub-task into sections corresponding to courses of the meals. It is also used, at times, to segment each course into yet smaller sections. Identification of the transition does not depend on the presence of explicit utterances corresponding to every part of the expansion. Thus, in the protocol of S3, while in C2 we find explicitly “a main dish” (M2) and “trimmings to go with that” (T2), in C1 no utterances corresponding to M1 and T1 are found. But, by consulting the final objects isolated in Section 3, we


find that C1 equally has two sections, and can infer the mentally held objects M1 and T1 (see OTG, Fig. 2). Using both the presence of an utterance corresponding to each part, and presence of a final object evidently resulting from a part of an expansion, we can construct OTG representations of each of S3’s six sub-tasks. Using brackets to represent hierarchical levels and the abbreviations introduced in Section 3, we find five different control structures (ignoring for the moment variation in order of parts):

((P (C V L)) (M1 T1) (M3 T3))*    meal 1
((P (C V)) C1 C3)                 meal 3
((P C V) (M1 T1) C3)              meal 4
((P C V) C1 C3)                   meal 6
((P C) C1 C3)                     meals 2 and 5

*The symbols P and M2 are equivalent. Alternatively, rewrite rule notation could be used, when, for instance, the control structure of task-1 would become:

MEAL → C2 C1 C3
C2 → M2 T2
C1 → M1 T1
C3 → M3 T3
T2 → C V L

It cannot be that S3’s mental representation of the task changed five times during the experimental session, and that his view of an adequate dinner party meal changed completely. In any case, such variation is not reflected in the final answers, all of which appear about equally satisfactory. Far more parsimonious is to assume that S3 used the same representation throughout. For reasons discussed below, it is most economical to assume that this underlying representation is at least as elaborate as the most complex structure visible in the protocols, and this is what has been done in the OTG codings of Fig. 2.

The economy of inferring a more elaborate structure is well illustrated by S9. In 1:2 he explicitly mentions “two or three vegetables” before going on to choose, in fact, two kinds of vegetable. No equivalent utterance occurs again, but in 2:2 he chooses two and in 3:2 three kinds of vegetable. Again it is argued that inferring an intermediate mental object “two or three vegetables” in these two protocols is justified by parsimony. It seems that S9 has simply omitted the superordinate from his verbal output, and the same applies to “trimmings to go with (M2)” in S3’s meals 2, 4, 5 and 6 and “main course” in his meal 6. Why this kind of change, equivalent to a rise in C-level, should occur with practice is not clear.

Assuming that we are justified, and that S3 uses a representation at least as complex as ((P (C V L)) (M1 T1) (M3 T3)) throughout, another effect must be occurring. Some of these parts, in some sub-tasks, do not correspond to


any segment of the final answer. Where are the missing components? To appreciate the reason for this, consider the content of some of these answers. In 2:2, parts V and L are coded as null on the OTG; but P is “a chicken casserole with lots of veg in the casserole”, so extra vegetables or sauce would be inappropriate. Similarly with mutton curry, sole cooked with white wine, duck served with orange sauce and pork chops with onion sauce; in each of these cases, no L segment occurs in the final answer. In 5:2, no V segment occurs: but with duck “stuffed with apple and prunes and served with an orange sauce and an orange salad”, it would be unnecessary. This behavior is general, and is often caused by the intrinsic nature of the “main part”: “paella, chips and peas” would simply be wrong. On other occasions, items which would fill the omitted segment are present, but apparently retrieved as a single unit with another part, and coded as such on the OTG:

(S2, 2:2) shoulder of pork, and roast potatoes (no C)
(S4, 5:2) kebbabs with orange sauce (no C)
(S5, 3:2) rice and um chips and peas (no V)

It is clear that the term “parts” is a misnomer. It is not that a main course must contain four separate components P, C, V and L. Rather the entire course must meet several restrictions, and components are added until this is true. If the first item retrieved happens to be adequate on all counts, a subject will terminate search. The term “goals” will henceforth be used for P, C, V and L, as it seems to capture more of the situation than does “parts”, and has no implication of “separable parts”. The symbol ∅ which terminates goals which are not separately present should be read to mean that the goal is satisfied by items already chosen, not that that goal is not met. With an initially full list of goals, a central executive might search for an item to satisfy the first goal, then test for any other goals which that item happens to satisfy and delete them. Or control might be local to particular items, and an item might include an operator which (if the item were selected) would cancel from the list any goals which it satisfied. Although at some stage it may be important to contrast these options, it is not obvious how they would differ in resulting protocols and for the present they will be treated as equivalent. For consistency, all three courses have been treated alike in the way they have been coded on the OTG: a full set of goals is assumed initially present in each case, but some of these goals seldom become “overt” in separate items. The economy of inferring a more elaborate covert structure is justified for C2, but there is insufficient evidence to decide in the cases of C1 and C3, and ambiguity remains. For instance, another possibility is that the goal list would initially contain only minimal goals (e.g., for C2, perhaps P only) but that certain items could add goals to the list to remedy their deficiencies (so “roast beef” might add C, V and L). The goal list would


thus be amplified and reduced during processing, and no separate representation of the entire goal structure need exist. Yet subjects do appear to possess just such a representation when they sometimes review the adequacy of the answers so far assembled, for example:

(S3, 1:1) that would be satisfactory for the first course
(S6, 2:2) that would be all I think
(S9, 1:2) that’s all for that

Each utterance occurs after the course to which it refers is apparently complete. Assuming that a full goal list is initially retrieved, and goals are deleted during processing, allows these utterances to be sensibly interpreted as reflecting a final checking process, and this is what has been done (above). The evidence is not conclusive, but the simplest hypothesis is adopted for the moment.

To mediate this behavior, subjects must have access to mental representations of some items in terms of the goals which an adequate exemplar must satisfy. We will call this an ABSTRACT DESCRIPTION of the item, and attempt to deduce its properties and implications of its use. What items in the series of protocols do subjects represent as Abstract Descriptions? A number of different structures are shown by subjects:

(C1 C2 C3)                     S8
(C1 (M2 T2) C3)                S1
(C1 (P C V) C3)                S4, S5, S10
(C1 (P C V) (M3 T3))           S6
((M1 T1) (P C V) C3)           S2
(C1 (P (C V)) C3)              S7
((M1 T1) (P (C V)) (M3 T3))    S3, S9

Using rewrite rule notation, we can summarize the expansions* used:

MEAL → C1 C2 C3    (all subjects)
C1 → M1 T1         (S2, S3, S9)
C2 → M2 T2         (S1, S3, S7, S9)
C2 → P C V         (S2, S4, S5, S6, S10)
T2 → C V           (S3, S7, S9)
C3 → M3 T3         (S3, S6, S9)
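The correspondence between the rewrite rules and the bracketed structures can be checked mechanically. The sketch below is only illustrative: the function names and the per-subject rule dictionaries are hypothetical, assembled from the subject lists given with each rule, and M2 is left as written by the rules (it appears as P in the bracketed structures).

```python
# Rule sets assembled from the subject lists attached to each rewrite rule.
RULES_S3 = {
    "MEAL": ["C1", "C2", "C3"],
    "C1": ["M1", "T1"],
    "C2": ["M2", "T2"],
    "T2": ["C", "V"],
    "C3": ["M3", "T3"],
}
RULES_S7 = {
    "MEAL": ["C1", "C2", "C3"],
    "C2": ["M2", "T2"],
    "T2": ["C", "V"],
}
RULES_S8 = {
    "MEAL": ["C1", "C2", "C3"],
}


def expand(symbol, rules):
    """Recursively rewrite a symbol until only unexpandable goals remain."""
    if symbol not in rules:
        return symbol
    return [expand(part, rules) for part in rules[symbol]]


def to_brackets(tree):
    """Render a nested goal structure in bracket notation."""
    if isinstance(tree, str):
        return tree
    return "(" + " ".join(to_brackets(part) for part in tree) + ")"


for name, rules in [("S3", RULES_S3), ("S7", RULES_S7), ("S8", RULES_S8)]:
    print(name, to_brackets(expand("MEAL", rules)))
```

Reading M2 as P, the three printed structures agree with those listed for S3, S7 and S8.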

*In addition, S6 twice expands C2 into FOOD and DRINK, then FOOD into P, C and V; V is interpreted as “one or two vegetables” by S3, S6 and S10 and as “two or three vegetables” by S9; S3, as already noted, has an additional goal L in his representation of C2.

Note that, as we have already seen for S3, the different overt representations of C2 can both arise from a single underlying structure since M2 and P are equivalent. In fact, all the varied manifestations can be viewed as resulting from a single complex representation, with each subject showing different amounts of the structure:

((M1 T1) (P (C V)) (M3 T3))

When subjects have no need to use the Abstract Descriptions (as for instance S4 did not need even to EXPAND into courses in her meals 1 and 2), or interpret the task instructions weakly (set a relatively high stopping level), less structure is shown in the protocol. The fact that we can map all subjects’ structures onto a single representation presumably shows the extent to which the goals of “a dinner party meal” are agreed across subjects.

Are the goals specified by Abstract Descriptions ordered? Superficially they appear to be in the protocol of S3, and this was found true in all subjects. The great frequency of expansion of C2 into its three components allowed statistical examination of the orderings. Consider first those cases where clear segments of protocol are found corresponding to P, C and V. Of the 27 such cases, 23 are order PCV, three are order PVC and one CPV. The probability of such regularity occurring by chance is of the order of 10^-14, so we can state that the goals are ordered, and that this order is the same across subjects. This is strengthened by examination of the cases where no separate C or V segment is found. Of the 58 cases of a single segment, in all but one case that segment corresponds to P (C occurred once). Finally, in the 15 cases where two segments are present, the first always corresponds to P. Note that here the first item retrieved from memory has happened to satisfy two of the three goals. There is no reason to suppose that it is more likely to satisfy C than V, and hence no reason to expect any regularity of the second segment. And in fact there is none: in 8 cases it is C, and in 7 it is V. So not only are goals ordered for any one subject, which might be an intrinsic property of an Abstract Description representation, but this order is the same for all subjects, which obviously could not be.
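The text does not say how the chance probability was computed. One reading that gives a figure of the stated order is a binomial tail: if the six possible orders of P, C and V were equally likely on each occasion, how probable is it that at least 23 of 27 cases would show one particular order? The sketch below works this through; the equal-likelihood assumption and the function name are ours, not the source's.

```python
from math import comb


def binomial_tail(n, k, p):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Assumption: each of the 3! = 6 orders of P, C and V is equally likely.
p_order = 1 / 6

# 23 of the 27 fully segmented cases showed the order PCV.
p_chance = binomial_tail(27, 23, p_order)
print(f"{p_chance:.2e}")
```

Under this assumption the result is a little over 1 x 10^-14, in line with the order of magnitude quoted.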
The apparent cause of this embarrassment of regularity only becomes clear when we examine cases of deliberate changes of goal order. The clearest instance of a subject changing the order in which goals are attempted occurs in the first sub-task of S3:

(S3, task-1, lines 5-11) well something to start with first ... logical ... (9s) ... and that would depend on what the rest of the meal is going to be, so we’ll think about what sort of meal it will be and then go back to the beginning ... (19s) ... well if we assume that the middle course will be something like roast meat ... which is probably the most suitable thing, if we can afford it ... then you can start with anything you like ... so if we started with pate, as the first course (etc.)

Although here S3 attempts the courses in the order (C1 C2 C3), in the remaining five sub-tasks he uses the order (C2 C1 C3). He has deliberately reordered the goals in his Abstract Description of “a dinner party meal”. The same occurs with S2, who uses the order (C1 C2 C3) up to meal-5, when we find:

(S2, task-5) curry, ... definitely curry ... (5s) ... chicken curry with rice ... and nothing to start with ...

then in meal-6 she continues:

(S2, task-6) could be lamb this time ... (13s) ... salad again to start with ... (4s) ... and roast shoulder of lamb ... (3s) ...

S7 also changes goal order, beginning with (C2 C3 C1), trying (C1 C2 C3) in meals 2 and 3 but finally reverting to (C2 C3 C1) for the remaining three sub-tasks. What is quite clear from these ‘reordering goals’ patterns is that the order in which goals are attempted can affect the task difficulty. At least for these three subjects, it is easier to choose C2 before C1 than to choose courses in the order of serving. Also, from line 7 of S3’s protocol, we can see that S3 encounters a representation of this fact as soon as he attempts C1. We will call this representation a NEED SLOT, and can say that (NEED: value of C2) is attached to the memory node for C1. When a Need Slot is encountered, the “need” must be remedied in some way before processing can continue. In this way, later disruption of processing is avoided. We will later encounter the use of Need Slots for different information, but in all cases the important point is that appropriate location of information in memory economizes on effort.

We can now see a possible explanation for the great regularity of goal orders across subjects: some orders are easier than others, and since all the subjects are quite experienced (in daily life) with the task they may all be using the easiest order already. Cultural transmission of information as well as trial-and-error learning presumably contributes to this effect.

In general, separate goals of an Abstract Description are not independent: consider the error of serving chicken soup, chicken casserole and rice pudding! This is true within a course as well as between courses (for example, consider curry, chips and Yorkshire pudding). Do our subjects perform this matching by independent choice and then subsequent checking (a), or by guiding each choice with the results of previous choices (b)?


(a) CHOOSE A → a
    CHOOSE B → b
    CHOOSE C → c
    CHECK a & b & c

(b) CHOOSE A → a
    CHOOSE B TO GO WITH a → b
    CHOOSE C TO GO WITH a & b → c

For matching between courses, the answer is clear. Numerous utterances show that (b) is the case, and that the outcomes of previous decisions are used in the current choice process. Perhaps most strikingly, when S4 forgets an item chosen earlier, her search process is disrupted:

(S4, 5:3) now what did I say earlier? I’ve forgotten what I started with, bother ... (8s) ... blow!
E: soup.
I started with soup. What kind of soup?
E: I think it was mushroom.
Um, mushroom, yes, mushroom soup. OK, um ... and then I had ... orange, so I don’t really want any more fruity ... Oh! I’ll have a chocolate mousse.

Here it is evident that C1 and C2 (mushroom soup and kebbabs with orange sauce) are used in choice of C3. Table 3 gives other examples of this. All these cases could be handled by a simple system of rules, such as “if one course is HEAVY, its neighbour(s) must be LIGHT”, “if one course is WET, its neighbour(s) must be DRY”, and “the SAME items should not occur in two different courses”. The rules used appear to be few, and the consequences of violating them are not grave due to the separation in time of the courses. There are no data to show whether information of the form light/heavy, dry/wet is stored directly with certain dishes or computed at the time of choice.

A different picture is shown in the case of within-course matching. It is obvious that the constraints within a single course are much tighter (consider how easy it is to think of ‘errors’ where the components would be absurd together). In the protocols there is only one instance of an error being made during the process with consequent back-up, and all the final answers seem excellent. Yet in the entire series of protocols, only five utterances refer to matching within a course and none of these gives detail of the mechanism:

(S3, 2:2) boiled potatoes would go with that (chicken casserole)
(S3, 4:2) with a salad rather than a hot vegetable to go with that (sole)
(S3, 6:2) one or two kinds of veg to go with that (pork chops) perhaps a green veg
(S7, 4:2) trying to think of an interesting vegetable that goes with fish
(S10, 1:2) either roast - not roast because it’s (lamb chop is) not roast, no, baked potatoes
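The between-course rules quoted above (HEAVY/LIGHT, WET/DRY, no SAME items) lend themselves to a small sketch of choice guided by earlier choices. Everything concrete here is an assumption made for illustration: the `Dish` record, the property values assigned to each dish, the reading of “SAME items” as shared feature tags, and the simplification that courses are chosen in serving order so that the neighbour is the course just before.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Dish:
    name: str
    heavy: bool                      # HEAVY vs LIGHT
    wet: bool                        # WET vs DRY
    features: FrozenSet[str] = frozenset()  # tags used for the SAME rule


def goes_with(candidate: Dish, earlier: List[Dish]) -> bool:
    """Check a candidate against the courses already chosen."""
    # SAME rule: nothing shared with any earlier course.
    if any(candidate.features & dish.features for dish in earlier):
        return False
    if earlier:
        neighbour = earlier[-1]
        # HEAVY rule: two neighbouring courses may not both be heavy.
        if candidate.heavy and neighbour.heavy:
            return False
        # WET rule: two neighbouring courses may not both be wet.
        if candidate.wet and neighbour.wet:
            return False
    return True


def choose_to_go_with(candidates: List[Dish], earlier: List[Dish]) -> Optional[Dish]:
    """Return the first retrieved candidate that fits the earlier courses."""
    for candidate in candidates:
        if goes_with(candidate, earlier):
            return candidate
    return None


# Property values below are invented for the example.
soup = Dish("mushroom soup", heavy=False, wet=True, features=frozenset({"mushroom"}))
kebabs = Dish("kebbabs with orange sauce", heavy=True, wet=False,
              features=frozenset({"meat", "fruit"}))
desserts = [
    Dish("fruit salad", heavy=False, wet=True, features=frozenset({"fruit"})),
    Dish("chocolate mousse", heavy=False, wet=False, features=frozenset({"chocolate"})),
]

choice = choose_to_go_with(desserts, [soup, kebabs])
print(choice.name if choice else None)
```

With these invented properties the fruit dessert is excluded by the SAME rule and the chocolate mousse is accepted, which is the outcome S4 reaches in the exchange quoted earlier.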

Table 3. Utterances showing use of previously chosen courses in current choices

Context | Course used in choice | Utterance | Current course
S3, 1:1 | C2: roast beef | and that would depend on what the rest of the meal is going to be [chose roast meat] then you can start with anything you like | C1: pate
S3, 1:3 | C2: roast beef | dessert, again any sort of dessert would be suitable | C3: apple pie
S3, 2:1 | C2: chicken casserole | and you have to specify a kind of soup to go before that | C1: mixed vegetable soup
S3, 2:3 | C1: mixed vegetable soup | not a fruit salad, if we had soup to start with, something a bit more solid than that I think | C3: cheese cake
S3, 3:1 | C2: curry | and to start with you’d only want something very light | C1: salad
S3, 4:1 | C2: sole | and before fish one wants something perhaps a little more solid to start with, to make the meal up | C1: cold meats
S3, 6:1 | C2: pork chops | and the first course before that would, probably want to be say | C1: oxtail soup
S4, 5:3 | C1: mushroom soup | now what did I say earlier? I’ve forgotten what I started with, bother | C3: chocolate mousse
S4, 5:3 | C2: kebbabs with orange sauce | and then I had ... orange, so I don’t really want any more fruity | C3: chocolate mousse
S5, 2:3 | C2: moussaka | and something fairly light to follow | C3: mousse
S5, 5:3 | C2: paella | and for a sweet, think after a paella again probably fairly dry | C3: strawberries and cream
S7, 1:1 | C2: boeuf bourguignon | ... um (4s) well, neither dry nor - I wouldn’t take soup, with a casserole to follow | C1: eggs florentine
S7, 1:3 | C2: boeuf bourguignon | and then you’d need a dry type um, dry sweet to contrast with the casserole | C3: apple tart
S7, 2:3 | C2: roast lamb | that would be quite heavy, so probably the last course would be something like | C3: pears bourguignon
S7, 4:1 | C2: Herring Calais | then you would need a fairly substantial first course ... um because the middle course is quite light. | C1: scotch broth

Evidently the strategy is again to guide each choice with the results of previous choices, but exactly how is uncertain. The process seems to be so reliable and automatic that it generates almost no protocol utterances.

It is also convenient to mention here a series of patterns in which processing is interrupted because of the need for some piece of information. For example:

(S3, 1:2) a nice veg. depending on the time of year ... but at this time of year might be ... cauliflower
(S8, 5:1) some fruit juice ... (3s) ... whatever was appropriate to the time of year

As in the case of reordering of goals (above), they show that subjects are using “Need Slot” representations. The fact that absence of a certain piece of information at the current stage could result in later disruption of processing is explicitly represented in memory. Nine of these patterns correspond to (NEED: season of the year) attached to ‘vegetable’ or ‘fruit’. This anticipates the problems (in real life) of choosing an item which is seasonally unavailable. In the experimental task, subjects circumvent the problem in several ways. They may assume a default value such as “this time of year” and then choose appropriately (in fact, three subjects independently choose the same vegetable: subjects know what is available!). Or their answer may be left in the form of a program, deferring the choice: “get whatever is in season at the time”. This is of course a common strategy in daily life, when choice is deferred until during shopping. Finally they may work out all contingencies and express more than one possibility. The remaining two cases, which do not in practice appear to affect processing materially, appear to correspond to (NEED: budget) associated with choice of “steak” and (NEED: size of guests’ appetites) associated with S10’s rules for combining courses:

(S5, 3:2) if I was feeling particularly affluent, I might to steak
(S10, 1:3) it would depend, on my estimation of the size of my guests’ appetites ... (3s) ... I think I’d go for something light and fruity

What implications can we draw from all the behaviors discussed under the ‘umbrella’ of the EXPAND transition? It must be possible to represent what we have called Abstract Descriptions of items. These set out the goals which an adequate exemplar must satisfy. In a sense, they are ‘models’ or ‘definitions’ of items. Since it is possible to use these separate goals to construct an answer out of several parts, Abstract Descriptions must also represent the interdependencies between goals. Most simply, they must give rules or procedures for combining items which satisfy separate goals. These rules operate in terms of the properties of items, therefore it must be possible to represent such information about properties in general.

Both to hold the several goals set by an EXPAND transition during subsequent processing, and for “unpacking” hierarchical structures of goals, temporary storage in the form of a goal stack must be available. It must have a capacity of at least four items, as can be seen by examining how its contents would change while handling the most complex structure used (Figure 5). Here the simplest possibility, a push-down stack, is assumed. Notice that this store cannot be the same as that needed to mediate the compare-and-choose pattern (above), since that behavior occurs during expansion into a hierarchy of goals. For efficient processing this goal stack must be protected from interference, and is therefore not closely related to any conception of ‘short-term store’ current in psychology.

Figure 5. Hypothetical stages in handling hierarchical expansions with a push-down stack of capacity four.

Slot | Contents at successive stages
1 | MEAL C1 M1 T1 C2 P  T2 C  V  C3 M3 T3
2 | -    C2 T1 C2 C3 T2 C3 V  C3 -  T3 -
3 | -    C3 C2 C3 -  C3 -  C3 -  -  -  -
4 | -    -  C3 -  -  -  -  -  -  -  -  -

We have also seen that subjects can represent ‘demands for information’ in advance of the point at which the information is crucial. The concept of a Need Slot was introduced for this, where (NEED: X) represents the fact that unless processing takes account of X at this point, there is likely to be serious disruption later. This issue of the appropriate location for information brings us to control processes in general. Need Slots are dealt with in several ways (including the use of default values and program-like answers) but in the case where they code interdependencies between courses subjects are able to reorder goals to overcome the problem. Thus the structure of the semantic area can force a particular ‘optimum’ order of attacking goals upon subjects, and this appears to be why all subjects are mutually consistent in the order in which they attempt goals. The goal order within C2 is highly regular both within and across subjects, so the restrictions must here be very tight (i.e., to use the order VCP must be much harder than using PCV).
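The capacity claim for the goal stack can be checked with a short simulation of a push-down stack unpacking the most complex structure, ((M1 T1) (P (C V)) (M3 T3)). This is a sketch under assumptions: the rule table and function name are hypothetical, goals are attempted in the order C1, C2, C3, and a goal with no rewrite rule is treated as satisfied (by retrieval of an item) as soon as it reaches the top of the stack.

```python
# Rewrite rules for the most complex structure, written with P for M2.
RULES = {
    "MEAL": ["C1", "C2", "C3"],
    "C1": ["M1", "T1"],
    "C2": ["P", "T2"],
    "T2": ["C", "V"],
    "C3": ["M3", "T3"],
}


def run_goal_stack(rules, start="MEAL"):
    """Return the stack contents (top first) at each successive stage."""
    stack = [start]
    stages = [list(stack)]
    while stack:
        top = stack.pop(0)
        if top in rules:
            # EXPAND: replace the goal on top by its sub-goals.
            stack = rules[top] + stack
        # Otherwise the goal is terminal: it is satisfied and simply popped.
        if stack:
            stages.append(list(stack))
    return stages


stages = run_goal_stack(RULES)
for slots in stages:
    print(" ".join(slots))
print("largest stack:", max(len(slots) for slots in stages))
```

The deepest stage is the one holding M1, T1, C2 and C3 together, so under these assumptions four slots are needed and no more.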
Subjects are able to use Abstract Descriptions to segment a hard task into simpler components, which can be tackled one at a time. The mode of processing is one of carrying forward constraints to guide the next choice. Thus, given a problem which can be split into three goals, A, B and C, subjects proceed in the following fashion:

CHOOSE A → a
CHOOSE B TO GO WITH a → b
CHOOSE C TO GO WITH a & b → c
CHECK THAT a & b & c SATISFIES (A B C)
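The carry-forward scheme can be made concrete with a minimal sketch. The goals, candidate items and the single clash below are illustrative assumptions, not data from the protocols, and `choose_forward`, `compatible` and `satisfies` are hypothetical names introduced only for this example.

```python
from typing import Callable, Dict, List, Optional

Choices = Dict[str, str]


def choose_forward(
    goals: List[str],
    candidates: Dict[str, List[str]],
    compatible: Callable[[Choices, str], bool],
    satisfies: Callable[[Choices], bool],
) -> Optional[Choices]:
    """Pick one item per goal, letting earlier picks constrain later ones."""
    chosen: Choices = {}
    for goal in goals:
        # CHOOSE <goal> TO GO WITH everything chosen so far.
        pick = next(
            (item for item in candidates[goal] if compatible(chosen, item)),
            None,
        )
        if pick is None:
            return None  # no compatible item: this is where back-up would start
        chosen[goal] = pick
    # Final CHECK that the whole combination satisfies the description.
    return chosen if satisfies(chosen) else None


# Hypothetical data: a single illustrative clash between two items.
CLASHES = {("moussaka", "mashed potato")}
GOALS = ["main", "carbohydrate", "vegetable"]
CANDIDATES = {
    "main": ["moussaka", "roast beef"],
    "carbohydrate": ["mashed potato", "rice"],
    "vegetable": ["salad", "carrots"],
}


def compatible(chosen: Choices, item: str) -> bool:
    """An item is acceptable if it clashes with nothing chosen earlier."""
    return all((earlier, item) not in CLASHES for earlier in chosen.values())


def satisfies(chosen: Choices) -> bool:
    """The combination is complete when every goal has an item."""
    return set(chosen) == set(GOALS)


meal = choose_forward(GOALS, CANDIDATES, compatible, satisfies)
print(meal)
```

Because each choice is filtered by the earlier ones, the final check rarely fails in this sketch, which is one way of reading the observation that back-up is rare.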

As a result, back-up is rare and few choices later have to be rejected.

Avoidance of Repetition

Subjects interpreted the task wording of “another three-course meal for the same guests” to mean that repetition of items in the series of answers was undesirable, and this made the task substantially harder. For example:

(S3, task-2) so now we have to change things
(S4, task-2) obviously we want everything quite different
(S5, task-2) I’d want something different for the same people
(S6, task-2) ah, it gets complicated now, doesn’t it?
(S10, task-3) I see, this is getting increasingly difficult

This prohibition on repeating items was not always absolute. A number of cases of ‘limited’ repetition (Table 4) show two criteria for allowing repeats. If an item has not been given for some time it may be repeated (a mean of 2.2 meals separates the repeats in Table 4 and none are in adjacent meals). In addition (sometimes alternatively), the repeated item should be varied in some way: a different type of soup, or melon trimmed differently, for instance. S3 formalizes this distinction between specific instances and general classes into two ‘standards’ of strictness:

(S3, 2:2) well if we’re going to be original we won’t use the same type of meal with changing the meat, we’ll have something different.

His standard is later relaxed when such strictness becomes burdensome:

(S3, 5:2) if this goes on much longer we shall have to start repeating things - or at least repeating the style of things.

Even allowing for a little laxness in keeping to the implicit task instruction of “no repeats”, the overall lack of duplication is striking. How do subjects avoid repeating answers they have used already? A number of utterances show that previous answers are retrieved during the process:

(S3, 2:2) a made-up dish rather than just a plain roast.
(S3, 2:3) if we gave them a pastry thing last time, we’re going to want something completely different.
(S3, 6:2) what have we given these people so far?

Table 4. Explicit cases of limited repetition of items

Context | Previous use | Current choice | Utterance
S2, 4:1 | 1:1 French onion soup | cream of celery soup | think I might go back to soup this time
S2, 5:3 | 2:3 fruit salad | fruit salad | fruit salad again
S2, 6:1 | 3:1 salad | salad | salad, again
S3, 5:2 | 1:2 roast beef | roast duck | if this goes on much longer we’ll have to start repeating things - or at least the style of things
S4, 6:2 | 1:2 broccoli | broccoli | we’ll have broccoli again, they haven’t been to dinner - haven’t had that for a long time
S5, 4:1 | 2:1 home-made vegetable soup | home-made chicken soup | another home-made soup, because they liked that first one, only not the same sort
S?, 5:2 | ? | ? | they haven’t been for a while so could have something like a
S7, 4:1 | 2:1 chowder | scotch broth | potage soup, um, I said chowder before but you can have other ones
S8, 4:1 | 1:1 Flemish cream soup | mixed vegetable soup | a different soup
S9, 4:1 | 1:1 cream of onion soup | dumpling soup | I’d start with soup again ... (5s) ... a more unusual soup
S9, 5:1 | 3:1 melon with ginger | melon with mandarin oranges | I’d start with melon again ... (3s) ... decorated with mandarin oranges
S9, 6:1 | 1:1 and 4:1 | borsht | I would start ... (3s) ... again with soup
S10, 5:1 | 1:1 vegetable consomme | minestrone | a meaty soup

(S7, 3:2) I’ve given them ... something with mornay sauce ... bourguignon ... roast ...
(S7, 4:2) now they’ve had ... (3s) ... boeuf bourguignon ... um, roast, pasta ...
(S7, 5:2) wait a minute ... boeuf bourguignon, roast, pasta and fish.
(S9, 2:1) I’d start, not with a soup, but with ...
(S10, 2:2) then for a complete contrast by way of a main course instead of something dark and ... dark with a sort of ... curranty sauce. perhaps something ... er, like chicken.
(S10, 3:2) what did we have? Yes, yes ... (11s) ... yes, well ... can’t have poultry again, we don’t want chops again ... (9s) ... well, I suppose, something, er ... more with mince than joints of meat would be nice, such as ... a moussaka
(S10, 4:3) on this occasion we won’t have a sweet ... er, we’ll have a cheeseboard.

In many cases, the previous answer is not mentioned exactly, but a superordinate class of it is derived, in the case of (S10, 3:2) with some effort. It is parsimonious and quite reasonable to assume that all subjects use the same strategy, which we will call the “CONTRAST COMPUTING STRATEGY”. Hypothetical steps in this process are shown for two particularly clear cases:

Step | (S3, 2:2) | (S10, 3:2)
1. RETRIEVE PAST ANSWER(S) | roast beef | chicken, lamb chops
2. FIND PROPERTY DESCRIBING THEM | plain roast | joints of meat
3. FIND PROPERTY EXCLUDING THIS | made-up dish | mince
4. CHOOSE ITEM WITH THIS PROPERTY | beef casserole | moussaka

Notice that this strategy is only heuristic. In (S3, 2:2) it actually fails, producing beef casserole: “not beef, perhaps chicken casserole”. That this slip is immediately corrected implies that a retrospective check is used in conjunction with the guiding strategy, and other utterances support this deduction:

(S3, 2:1) soup, to start with - be nice change
(S3, 4:2) a fish for example ... just for a change
(S3, 6:2) that would be something they hadn’t had yet
(S4, 4:1) toma - green pea soup (after gazpacho for 3:1)
(S4, 6:2) roast - er, no, chicken with almonds (after roast duck for 3:2)
(S10, 3:2) a moussaka ... which is quite different from the other two things

These utterances include the only three cases of errors and consequent back-up caused by duplication of items. Evidently, if retrospective checking of possible solutions were used alone, errors and back-up would be common.

Conversely, the Contrast Computing strategy is only a heuristic and so error-prone. The combination seems to be an ideal way of achieving economical and competent processing.
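The pairing of a heuristic generator with a retrospective check can be sketched as follows. This is only an illustration under stated assumptions: the property sets in `PROPERTIES` and the exclusion table `EXCLUDES` are hypothetical encodings of the (S3, 2:2) case, and the function names are invented for the example.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical property sets for a few main-course items.
PROPERTIES: Dict[str, Set[str]] = {
    "roast beef": {"plain roast", "beef"},
    "beef casserole": {"made-up dish", "beef"},
    "chicken casserole": {"made-up dish", "chicken"},
}
# Hypothetical 'negative facts': a property and one that excludes it.
EXCLUDES: Dict[str, str] = {"plain roast": "made-up dish"}


def contrast_candidates(past: List[str]) -> List[str]:
    """Steps 1-4 of the strategy; `past` (step 1) must be non-empty."""
    shared = set.intersection(*(PROPERTIES[p] for p in past))        # step 2
    describing = next((p for p in sorted(shared) if p in EXCLUDES), None)
    if describing is None:
        return []
    wanted = EXCLUDES[describing]                                    # step 3
    return [i for i, props in PROPERTIES.items() if wanted in props]  # step 4


def passes_check(item: str, past: List[str]) -> bool:
    """Retrospective check: the item shares no property with past answers."""
    return all(PROPERTIES[item].isdisjoint(PROPERTIES[p]) for p in past)


def choose_contrasting(past: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return the first candidate surviving the check, plus the rejected slips."""
    rejected: List[str] = []
    for item in contrast_candidates(past):
        if passes_check(item, past):
            return item, rejected
        rejected.append(item)
    return None, rejected


choice, slips = choose_contrasting(["roast beef"])
print(choice, slips)
```

In this sketch the heuristic first proposes beef casserole, the check rejects it because it shares "beef" with the earlier answer, and chicken casserole is accepted, mirroring the corrected slip in (S3, 2:2).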


Essentially the same strategy is applied by two subjects in a more complex way. S4 finds contrasts at the level of whole meals by selecting different occasions and their typical meals:

(S4, task-1) my last dinner party
(S4, task-2) a Christmas dinner
(S4, task-3) a summer one
(S4, task-4) I’m getting short of money now

S7 also works at the level of entire meals, but in terms of the patterns of individual courses, coding meal-1 as (DRY WET DRY) and so using (WET DRY WET) for meal-2:

(S7, task-2) well I’d reverse the procedure really, and have a fairly dry middle course ... with say soup

These behaviors have several implications for mental representations. Step 1 of the Contrast Computing strategy shows that subjects remember their previous answers, and step 2 that they can retrieve superordinate classes of those items (including classes based on physical properties); neither of these implications is unexpected. Similarly, step 4 is a straightforward REPLACE transition, and only implies access to subsets of items. However, step 3 presupposes that a different kind of information is represented: negative facts, or information on the non-overlap of classes. This is the kind of information implied when any proposition is denied: knowing that “plain roasts” exclude “made-up dishes” is the same as denying the proposition “a plain roast is a made-up dish”. Provided that only non-overlapping subsets are represented (which is not usually specified in models of semantic memory), a network model based around sub/superset links can mediate this behavior. Then, given an item X, if a superordinate of it, X’, is retrieved and any other subset branch taken from X’, the resulting node will “contrast with” X. If so, then the different ‘standards of strictness’ in avoiding repetition, which some subjects show, correspond to travelling for different ‘distances’ up the network before descending a ‘contrasting’ branch.
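The network account can be illustrated with a small sketch in which strictness is simply the number of superset links climbed before a different branch is taken. The hierarchy in `PARENT` is a hypothetical fragment chosen for the example, and it assumes, as the text requires, that sibling subsets do not overlap.

```python
from typing import Dict, List

# Hypothetical subset -> superset links; siblings are assumed disjoint.
PARENT: Dict[str, str] = {
    "roast beef": "plain roast",
    "roast duck": "plain roast",
    "beef casserole": "made-up dish",
    "moussaka": "made-up dish",
    "sole in wine": "fish dish",
    "plain roast": "meat dish",
    "made-up dish": "meat dish",
    "meat dish": "main course",
    "fish dish": "main course",
}


def children(node: str) -> List[str]:
    """Direct subsets of a node, in the order they were declared."""
    return [child for child, parent in PARENT.items() if parent == node]


def leaves_under(node: str) -> List[str]:
    """All most-specific items reachable by descending from a node."""
    kids = children(node)
    if not kids:
        return [node]
    return [leaf for kid in kids for leaf in leaves_under(kid)]


def contrasting_items(item: str, climb: int) -> List[str]:
    """Climb `climb` links above `item`, then descend every other branch.

    `climb` must be at least 1 and must not pass the top of the network.
    """
    branch, top = item, PARENT[item]
    for _ in range(climb - 1):
        branch, top = top, PARENT[top]
    return [
        leaf
        for kid in children(top)
        if kid != branch
        for leaf in leaves_under(kid)
    ]


for distance in (1, 2, 3):
    print(distance, contrasting_items("roast beef", distance))
```

With this toy hierarchy, a climb of one link yields another plain roast (a lax standard that repeats the style), while two links yield made-up dishes and three yield a fish dish (progressively stricter standards).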

5. Summary

The aim of this study has been to deduce, by observation of the behavior of the human memory system in use, a set of features which are minimally necessary to account for the behavior exhibited. It is not implied that these features are sufficient, and that any model which possesses them will be an adequate model of the system. Rather, the need for these mechanisms constrains the set of possible models.

Some of these mechanisms are sufficiently similar to ones previously postulated in psychology to be identified with them; others are radically different, though some of these are related to mechanisms found useful in artificial systems. These correspondences will be pointed out in the summary which follows. In a number of cases effort was made to find correspondences between those representations used by subjects in the task and those urged in memory theory, but this was often impossible. The summary is divided into ‘structural’ and ‘control process’ aspects, though the distinction blurs when we consider ‘appropriate location’ of information.

Memory structures

Representation of set/subset and disjoint set information

That the memory system has direct access to some of the subsets of a given item was implied by the patterns we labelled the REPLACE transition and the Contrast Computing strategy. The latter also showed that the converse, direct access to some of the superordinates of an item, is available, and that whether or not sets are mutually exclusive is represented. For instance, given an item ‘bird’, the system can directly retrieve the facts that an example of bird is ‘canary’ and a superordinate is ‘animal’, and can deduce from the representation that if an item is a bird it cannot be a reptile and may be a pet. Choosing this example from the domain most popular in work on semantic memory makes it clear that these requirements on a possible memory model have already been put forward for other reasons.

Representation of LEVEL of items

The termination rule for chained series of REPLACE transitions suggested the need for the system to have access to the ‘level’ (on the dimension vague-specific) of objects. This was confirmed by cases in which subjects altered the level of termination. Although this level roughly corresponds to ‘distance’ along set-subset links, we saw that the representation of level must be independent of set-subset information. Representation of this information is intuitively plausible, but seems to be novel in psychological modelling.

Representation of ABSTRACT DESCRIPTIONS of items

From patterns of behavior associated with the EXPAND transition, we saw that for some items the separate goals which an adequate exemplar must satisfy are represented. We called the structure holding this information an “Abstract Description”. The goals are not independent, so that a solution for one goal restricts possible solutions for others. The restrictions need not be symmetrical, so that an ‘optimal’ order of satisfying goals can be generated. Examples of interdependencies from the current task are the restrictions on a meal from past meals (“no repeats if eaters same”), the restrictions on one course from other courses, and the restrictions within a course from other parts of it. As well as representing such restrictions, Abstract Descriptions must include criteria for limited violation of them, to avoid total failure. The concept of an Abstract Description of an item is not closely related to any other ideas in the psychology of memory; the nearest is perhaps Miller, Galanter and Pribram’s (1960) suggestion that some memories are stored as plans. An Abstract Description of a particular class of meal could be identified with a ‘plan’ for mentally constructing or mentally evaluating examples of this class of meal. Winograd (1972) has argued that all knowledge is best stored as programs, but this is a much stronger claim than we are making here.

Appropriate location of NEED SLOTS and properties of items

The ability to represent the need for a specific piece of information, whose lack would later make processing liable to disruption, was termed a “Need Slot”. A crucial issue was the location of the information: the Need Slot can be inserted in the mental ‘program’ at such a point that it will be encountered early enough to prevent disruption. We saw that Need Slots were used to represent a known asymmetry in the restrictions imposed on one course or another, and to represent a seasonal factor in the availability of vegetables and fruit, among other things. In a similar vein, properties of certain items (e.g., curry) which might cause problems later were represented in such a way that they would be encountered during processing and the problems anticipated. In both cases, the type of information represented is unsurprising and in keeping with many models of memory. What is novel is the need for certain types of information to “become available at the right time”: their location, not their content, is of interest. Thus the issue is as much one of ‘control’ as it is of ‘structure’.
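One way to picture a Need Slot that names the choice for another goal is as a dependency that forces that goal to be attempted first. The sketch below is an assumption-laden illustration: the `needs` mapping and the `reorder` function are hypothetical, and the example dependency (the first course needing to know the second) is only suggested by protocols in which the starter is deferred until the main dish is fixed.

```python
from typing import Dict, List, Set


def reorder(goals: List[str], needs: Dict[str, Set[str]]) -> List[str]:
    """Reorder goals so that every goal follows the goals it needs.

    Goals keep their original relative order wherever the needs allow it.
    """
    ordered: List[str] = []
    placed: Set[str] = set()
    pending = list(goals)
    while pending:
        for goal in pending:
            # A goal is ready once all of its Need Slots can be filled.
            if needs.get(goal, set()) <= placed:
                ordered.append(goal)
                placed.add(goal)
                pending.remove(goal)
                break
        else:
            raise ValueError("needs are circular; no workable order exists")
    return ordered


courses = ["first course", "second course", "third course"]
course_needs = {"first course": {"second course"}}  # hypothetical (NEED: second course)
print(reorder(courses, course_needs))
```

If the same dependencies hold for every subject, the same order falls out for every subject, which is one reading of why goal orders are so consistent across protocols.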

A GOAL STACK

Since Abstract Descriptions can be used to create hierarchical goal structures, some form of short-term memory is necessary to hold and ‘unpack’ such structures. We will call this a “Goal Stack” to differentiate it from other local storage systems. Assuming the simplest possibility, that it operates as a ‘push-down stack’, leads to the need for a capacity of at least four items. As has been discussed, this store is not similar to current concepts of STM, although computer systems have used push-down stores for holding and unpacking goal structures for some time.
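A push-down stack that unpacks a hierarchy of goals can be sketched in a few lines. The goal structure in `GOAL_TREE` is a hypothetical example, not the structure analysed in the paper, and `unpack` is an invented name; the sketch only shows how the required capacity can be read off by tracking the deepest the stack becomes.

```python
from typing import Dict, List, Tuple

# Hypothetical hierarchy: each composite goal EXPANDs into its parts.
GOAL_TREE: Dict[str, List[str]] = {
    "meal": ["first course", "second course", "third course"],
    "second course": ["main dish", "trimmings"],
    "trimmings": ["potato", "vegetable", "gravy"],
}


def unpack(tree: Dict[str, List[str]], root: str) -> Tuple[List[str], int]:
    """Return the linear order of simple goals and the peak stack depth."""
    stack = [root]
    order: List[str] = []
    peak = len(stack)
    while stack:
        goal = stack.pop()  # take the goal on top of the stack
        parts = tree.get(goal)
        if parts:
            # EXPAND: push the parts so that the first part ends up on top.
            stack.extend(reversed(parts))
            peak = max(peak, len(stack))
        else:
            order.append(goal)  # a simple goal is dealt with directly
    return order, peak


sequence, capacity = unpack(GOAL_TREE, "meal")
print(sequence)
print(capacity)
```

For this illustrative structure the stack peaks at four items, when the last course and the three trimmings are all pending at once.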

An EXAMINATION BUFFER

A pattern of behavior we called “compare-and-choose” showed the need for another form of local storage, since up to three items could be mutually compared. There was no good reason to differentiate this store from similar ideas put forward in the literature, and it seemed parsimonious to identify it with “working memory” (Baddeley and Hitch, 1974).

Having set out the ‘building blocks’ of mental structures and representations, we can now see how they are used in the task to make the process coherent and efficient. One of the most basic patterns of behavior, the REPLACE transition, corresponds to accessing a subset of an item, and this can be done iteratively, accessing a subset of that subset, and so on. This iterative chaining may be terminated in two ways. The current object may be evaluated as unsatisfactory, rejected, and the process ‘backed-up’ to a previous object. This is rare, but from the few instances where it occurs we can see that the evaluation is not absolute, but depends on the relative difficulty of the task (or rather, how the subject currently views this). Alternatively, a satisfactory current object may terminate the chain, if it is sufficiently low in level. What is ‘sufficient’ depends on how the subject interprets the task, but this setting of stop-level is flexible and can be changed (for example, if the experimenter prompts the subject to “be more specific”).

Although in rare cases these processes alone may be sufficient for performing the task, normally others are used. Subjects use Abstract Descriptions to segment the whole task into a series of parts, each concerned with a particular goal. Whether this is done only when a problem is intractable, or whether it is felt implicit in the instructions to ‘construct’ an answer in this way, is uncertain. In any case, this splitting can be performed iteratively, producing the hierarchical structures of goals which typify most protocols.

Although goals are dealt with “one by one” (as noted in Section 4, the simplest idea is that they are “popped” from the top of a Goal Stack), they are not independent. Firstly, an item satisfying one goal may just happen to satisfy another as well. If this happens, we have seen that the goal is automatically deleted*. Secondly, the choice of an item to satisfy one goal may restrict what items can be used for other goals. Again we have seen that these restrictions are carried forward to the next choice, rather than choices being made independently and later checked to see if the combination is possible. Thus, after each choice of an item, the system checks to see if there are any outstanding goals which this item satisfies, and checks what restrictions this choice will impose on later choices. For example, choosing “moussaka” for the main part of a second course might delete the goals normally realised by “vegetables” and “gravy” (it contains aubergines and is rather damp already) and restrict the “carbohydrate” goal (prohibiting mashed potato). After dealing with the last goal, subjects often ‘review’, checking that the combination of items satisfies the whole Abstract Description.

*As was noted in Section 4, the evidence for this aspect of the control process is not conclusive, but is the most parsimonious way of accounting for the data of this study. If other versions were preferred, the remainder of this paragraph would need to be altered accordingly.

Problems may arise if certain facts are not available at the right times. This is avoided by the appropriate location of a Need Slot, specifying what information is needed and halting processing until it is provided. There are several ways of providing the information, including the use of a default value or all possible contingencies (where these are limited). In particular, if the Need Slot specifies that the choice of item for another goal is needed, then the goals can be reordered. Subjects do reorder goals of their Abstract Descriptions towards an optimal order, and the existence of optimal orders for a given problem may explain why the goal orders are often the same even in different subjects.

Finally we have seen that the task is interpreted to mean that all solutions should be different. This is dealt with by a strategy of retrieving previous choices (to the particular goal in hand), finding a property they all share, and choosing an item having a property which excludes this. The kinds of stored information necessary to allow this have been discussed. This considerably complicates later sub-tasks, and subjects sometimes violate the “no repeats” rule in certain limited ways. This ability, of knowing how to break a rule without disastrous consequences, may well be an important one in everyday problems.

Possible generality of results

Some speculations are perhaps appropriate on the generality of the mechanisms proposed. The approach adopted here has been to study in some detail a small part of everyday competence in a single semantic domain. It is doubtful if the mechanisms identified are a complete set even for this restricted task (that is, these mechanisms, although necessary, are probably not sufficient for a device to perform the task in a manner like our subjects). If we extended the same task (perhaps requiring meals for different occasions, or with restrictions of budget or apparatus) or examined other everyday abilities in the same area of knowledge (perhaps studying the processes of shopping or cooking), we could expect to extend the set of necessary mechanisms. To a limited extent this has been done (Byrne, 1975).

Those aspects of the results which are general across other areas of knowledge are of most interest. Obviously the particular information searched would be exclusive to this task, but logical relationships between pieces of information may well be general: for example, the representation which we have called a Need Slot, where economy of processing is achieved by advance warning of crucial information. Similarly, many memory tasks can be performed with various degrees of specificity, so representation of the level of items is likely to be widespread. The format in which information is stored will depend on the uses to which it will be put, but it seems unlikely that Abstract Descriptions will be the only representation more complex than set/subset and simple property information. Abstract Descriptions themselves seem particularly suited to encoding functional outlines of fuzzily-bounded categories, so are probably quite commonplace.

Some of the control processes identified are rather dependent on task phrasing, for example the strategies we called Compare-and-choose and Contrast-Computing. Others appear to be generally necessary whenever memory is a problem-solving activity, for example iteration, back-up at failure and segmenting a task into smaller parts which are matched by carrying forward constraints.

To assess the generality of these mechanisms it is necessary to construct a logically similar task in a quite disjoint area of knowledge, and analyse behavior in a similar way to this study. It is hoped to present the results of such an investigation at a later date. Whether the mode of analysis is a good one can be judged by how rapidly the set of explanatory mechanisms increases as more and more different tasks are studied in the same way; ultimately it would be hoped to converge on a limited but complete set of mechanisms. However, protocol analysis requires that concurrent verbalisation should not disrupt behavior, and until better techniques can be found to study cognitive abilities which are unavailable to conscious report, claims of “completeness” must be very tentative.

Appendix

Protocol of S3, transcribed verbatim from tape recording made on 6.3.74.

Task 1

E: Right, Here’s your instructions. @ (gave written instructions to S)

... three course meal suitable for a dinner party ... (11s) ... you’re not allowed to answer, any sort of questions from this point on, are you? ... he doesn’t answer, so assume he isn’t allowed to answer any questions at all ... so it’s up to me to guess what ‘three course’ means ... (7s) ... well something to start with first ... logical ... (9s) ... and that would depend on what the rest of the meal is going to be, so we’ll think about what sort of meal it will be and then go back to the beginning ... (17s) ... well if we assume that the middle course will be something like roast meat ... which is probably the most suitable thing, if we can afford it ... then you can start with anything you like ... (3s) ... so if we started with pate, as the first course, together with toast, and lemon and whatever else needs to go with it ... (3s) ... that would be satisfactory for the first course, then for the second course you’d need a main dish of ... a roast meat, say roast beef if we’re going to be extravagant ... and then trimmings to go with that, which would be ... potatoes, roast potatoes, and a nice veg, depending on the time of year ... but at this time of year might be ... cauliflower, for example ... (3s) ... perhaps another kind of veg as well, two different kinds ... say carrots, that’s something of a different colour, and nice taste, and some gravy ... made from the meat ... (3s) ... and possibly a salad to go with it, or as a separate course, if we’re allowed to make an intermediate course in this three course meal ... (4s) ... and then for ... dessert, again any sort of dessert would be suitable ... (4s) ... could have some sort of pastry ... like a pie ... apple pie ... and cream, that would be nice. @
(167 seconds)

Task 2

E: And the next problem. @ (gave instructions to S)

... another three-course meal for another dinner party with the same guests I see. Right. So now we have to change things ... well if we’re going to be original we won’t just use the same type of meal with changing the meat, we’ll have something different ... so perhaps as the main course we’ll have a made-up dish of some sort rather than just a plain roast ... (5s) ... something perhaps in a casserole ... or even a stewed type of dish ... (3s) ... (sotto voce: not beef) perhaps chicken casserole, that would be nice ... have chicken casserole with lots of veg in the casserole, like onions and carrots and leeks and mushrooms ... and, er, boiled potatoes, would go with that, as the main dish ... then ... soup, to start with - be nice change, can always do that ... and you have to specify a kind of soup to go before that ... (3s) ... I should think some sort of vegetable soup ... mixed vegetable soup ... (4s) ... and then ... for dessert ... (3s) ... if we gave them a pastry thing last time, we’re going to want something completely different ... which could perhaps be ... (3s) ... not a fruit salad, if we had soup to start with, we want something a bit more solid than that, I think ... (6s) ... perhaps a cake of some sort ... like cheesecake ... to finish with. @
(210 seconds)

Task 3

E: And the next problem is: the same again. @

... the same again? I see - how many times do I have to do this - you don’t answer that question! ... (5s) ... well assuming that we want to keep ringing the changes ... (3s) ... then, there are many different kinds of main courses we could make, which would, again be completely different ... something like a ... curry for example is always easy to make ... if we had ... meat curry say, a mutton curry ... (3s) ... be a - usual side dishes and so on, like rice and cucumber salad with yoghourt for example, to go with it, and chutney and pickle ... that’s the main dish ... then to start with you’d only want something very light, like perhaps a light salad ... and, er ... ice-cream would probably be a suitable dessert. @
(?3 seconds)

Task 4

E: Right-on. And another one. @

... another one - this is getting difficult ... um, taxing our imagination ... (7s) ... and if we were going to be really extravagant we could go into something ... (3s) ... nice and fancy like ... (5s) ... like something cooked with wine, that always goes down extremely well, we could have, er ... a fish, for example ... just for a change ... and a suitable sort of fish for a main course would be, say, er, sole ... cooked with white wine and mushrooms and cream ‘Sole au Champagne’ as it’s called ... that would be again with plain potatoes, and er ... (4s) ... probably with a salad rather than a hot vegetable to go with that ... (3s) ... and before fish one wants something ... perhaps a little more solid to start with, to make the meal up ... (3s) ... we could have ... (7s) ... perhaps ... (4s) ... a selection of cold meat sausages, and pate and things like that ... with bread or toast, to start with ... (7s) ... and to finish with ... (3s) ... still looking for something completely different from what we’ve had before ... (4s) ... we could have perhaps a fruit salad, that would go quite well at that stage. @
(114 seconds)

Task 5

E: And another. @

and another ... if this goes on much longer we shall have to start repeating things - or at least repeating the style of things ... (5s) ... well another sort of main dish of course which would be - go down extremely well, though it’s a bit extravagant, would be something like duck ... which is, er ... greatly favoured but highly expensive, so if we had roast duck as the main course ... with, er ... stuffed with apple and prunes and served with orange sauce and orange salad ... (3s) ... and roast potatoes ... (4s) ... we could then ... (3s) ... perhaps start that with soup - that would not be unreasonable ... start that with a light soup, like chicken soup for example ... (3s) ... and to finish off with, er ... (6s) ... (sotto voce: what would we finish that with) ... finish off with a chocolate mousse, I think. @
(72 seconds)


Task 6

E: Right-ho. And one more. @

One more ... don’t suppose that really is going to be the last one, but never mind (laughter) we’ll think of another one ... what’ve we given these people so far? Mind you - if they’re only coming to dinner once every two months they’ll have forgotten what they had the first time anyway ... (3s) ... something different again: well, a different kind of meat we could perhaps give them, pork chops, that would be something they haven’t had yet ... there are many nice ways of cooking pork chops ... (3s) ... perhaps we could, er ... (3s) ... just, er, fry them ... and serve them with a nice, onion sauce ... that would go down very well ... and potatoes and ... one and two kinds of veg to go with that - perhaps a green veg, like ... brussels sprouts would go with that, very well ... (3s) ... and the first course before that would, probably want to be soup, of a different kind again ... a fairly thick soup, like ... oxtail, say to start with ... (5s) ... and to finish off with ... (3s) ... we could have ... a trifle. @
(82 seconds)

Glossary

Abstract Description: an inferred mental representation of a class of item, which describes it in terms of the goals which must be satisfied by an example of the class. It may also encode restrictions and interdependencies between the goals, and criteria for occasional limited violation of restrictions.

Expand: a (1:many) transition between several objects where the objects are in a whole/parts relation, such that (X), (X + i), (X + j) ... etc., together make up all the component parts of object (X - 1). The components are normally not adjacent, so 1 ≤ i < j.

Goal: characteristic or property which is required for a certain item. Where the item is a composite of several parts, each part may correspond to a single goal of the whole.

Level: degree of vagueness/specificity of a class of food items, corresponding to its position in a hypothetical hierarchy of classes.

Need Slot: an inferred mental representation of any specific information whose lack at some point in processing could cause later disruption. The nature of the information is unimportant, but its location is crucial for efficient processing.

Object: a class of food items, considered by a subject as a candidate for, or superordinate of, a part of the required answer. At first in the analysis, the term was used for entities with a 1:1 correspondence to noun phrases in the protocol, but later it was extended to include entities which were deduced to have been held mentally.

Replace: a (1:1) transition between two adjacent objects where the objects are in a set/subset relation, such that (X) is a kind of (X - 1).

Stack: a first-in, last-out store, which can be used to “unpack” hierarchically bracketed structures of goals into a linear sequence.

Stop: a (1:∅) transition, where an object which itself forms part of the final answer is not related to any subsequent object in a simple way.

Transition: a protocol is viewed as a string of objects (broken in places by omission) connected to each other by transitions. Transitions are classified by the number and the relationships between the objects they connect, and are believed to reflect underlying mental operations.

References

Anderson, J. R. and Bower, G. H. (1973) Human Associative Memory. Washington D.C., Winston.
Baddeley, A. D. and Hitch, G. (1974) Working memory. In G. H. Bower (ed.), The Psychology of Learning and Motivation, Vol. 8, pp. 47-90.
Bobrow, D. G. and Norman, D. A. (1975) Some principles of memory schemata. In D. G. Bobrow and A. M. Collins (eds.), Representation and Understanding: studies in cognitive science. New York, Academic Press.
Byrne, R. W. (1975) Memory in complex tasks. Unpublished Ph.D. thesis, University of Cambridge.
Chomsky, A. N. (1965) Aspects of the Theory of Syntax. Cambridge, Mass., M.I.T. Press.
Collins, A. M. and Quillian, M. R. (1969) Retrieval time from semantic memory. J. verb. Learn. verb. Behav., 8, 240-247.
Collins, A. M. and Quillian, M. R. (1972) How to make a language user. In E. Tulving and W. Donaldson (eds.), Organisation and Memory. New York, Academic Press.
Dansereau, D. and Gregg, L. W. (1966) An information processing analysis of mental multiplication. Psychon. Sci., 6, 71-72.
Duncker, K. (1945) On problem solving. Psychol. Mono., 58, 5 (Whole No. 270).
Fillmore, C. J. (1968) The case for case. In E. Bach and R. T. Harms (eds.), Universals in Linguistic Theory. New York, Holt, Rinehart and Winston.
Griggs, R. A. (1976) Semantic memory: a bibliography 1968-1975. Percept. Mot. Skills, 43, 729-730.
de Groot, A. D. (1965) Thought and Choice in Chess. The Hague, Mouton.
Miller, G. A., Galanter, E. and Pribram, K. H. (1960) Plans and the Structure of Behavior. New York, Holt, Rinehart and Winston.
Minsky, M. (1974) A framework for representing knowledge. Memo. No. 306, M.I.T. Artificial Intelligence Laboratory.
Morton, J. (1970) A functional model for memory. In D. A. Norman (ed.), Models of Human Memory. New York, Academic Press.
Morton, J. and Byrne, R. W. (1975) Organization in the kitchen. In P. M. A. Rabbitt and S. Dornic (eds.), Attention and Performance V. New York and London, Academic Press.
Neisser, U. (1976) Cognition and Reality. San Francisco, W. H. Freeman.
Newell, A. and Simon, H. A. (1972) Human Problem Solving. New York, Prentice-Hall.
Norman, D. A. and Rumelhart, D. E. (1975) Explorations in Cognition. San Francisco, W. H. Freeman.
Quillian, M. R. (1966) Semantic Memory. Unpublished doctoral dissertation, Carnegie Institute of Technology.
Thorndyke, P. W. and Bower, G. H. (1973) Storage and retrieval processes in sentence memory.
Winograd, T. (1972) Understanding Natural Language. New York, Academic Press.

A task as practical and familiar as planning a reception involves solving numerous and complex problems. We use it, in the present article, to study the everyday operations of memory. The technique adopted was analyzed from a theoretical point of view and adapted to the task in question; it consists of analyzing verbal protocols, as in studies of formal problem solving. The study shows the need for a certain number of mental structures and corresponding control processes that few psychological studies have mentioned.

Cognition, 5 (1977) 333-361
© Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Implicit learning: An analysis of the form and structure of a body of tacit knowledge*

ARTHUR S. REBER and SELMA LEWIS
Brooklyn College of CUNY

Abstract

Subjects learned implicitly the underlying structure of an artificial language by memorizing a set of representative exemplars from the language. The form and structure of their resulting knowledge of the language was evaluated and analyzed over a four day period by several procedures: (a) solving anagrams from the language, (b) determining the well-formedness of novel letter strings, and (c) providing detailed introspective reports. Several important implications about implicit acquisition of a novel complex system emerged. First, the memorial representation of a structured system is acquired through the dual operations of a differentiation-like process based upon relational invariances and a configurational process based upon overall structure. Second, the form of tacit knowledge is an abstract representation of the intrinsic structure of the stimulus field. Third, while the ability to make explicit what is known implicitly increases with performance levels, the conscious apprehension of structure always lags behind what is known unconsciously.

The term “implicit learning” was coined several years back (Reber, 1967) to characterize the manner in which subjects came to apprehend the underlying structure of a complex stimulus environment. At that time it was argued that there were two aspects of implicit apprehension that clearly differentiated it from various other, more explicit, acquisition processes. First, that it was a process which took place quite naturally and simply in any subject who devoted sufficient attention to a structured stimulus environment. Second, that it was manifested in the absence of conscious operations such as hypothesis testing about the nature of the stimuli and explicit strategies for learning.

*Our thanks to Chris Hether for collection and initial analysis of the data and to an anonymous reviewer for a cogent and thoughtful review. The research was supported in part by Grant MH 2023901 from NIMH. Reprint requests should be sent to Arthur S. Reber, Department of Psychology, Brooklyn College of CUNY, Brooklyn, N.Y., 11210.


Our experimental studies have been carried out using synthetic, reference-free languages. The stimulus materials consist of strings of letters formed according to rather complex rules for letter order. A typical grammar is shown in schematic form in Fig. 1. These artificial languages have several virtues as experimental devices, the most important of which for our purposes is that they have a rather rich rule system, one which is unlikely to be within the range of the general knowledge that the typical subject brings into the experimental laboratory with him. In this sense our materials and procedures are markedly different from those used in the more analytic concept formation and serial pattern learning literatures (see Jones, 1974 for a review). Thus, our investigations of implicit learning have been concerned with, essentially by definition, the nonconscious aspects of the cognitive processes involved in the acquisition of complex, tacit knowledge (Reber, 1967, 1969, 1976; Lewis, 1975). Our empirical findings in this regard can be summarized essentially as follows: When the underlying structure of a stimulus network is highly abstract and complex, subjects become quite adept at judging the grammaticality of letter strings they have never seen before. The implication is that subjects are learning rather abstract regularities of the stimulus network. The optimal process for apprehending such structure is one where the learner is free from specific learning strategies and conscious hypotheses about the to-be-learned structure. Obviously, such a set of circumstances places severe restraints upon the experimental procedures that can be employed to obtain further information about the actual operations involved in implicit learning. An annoying kind of uncertainty principle pertains.
If we ask our subjects to try to report their cognitive modus operandi during acquisition, the very introspective act transmutes the cognitive process and we lose the “implicit” element, the very thing we wish to study. If we don’t ask them, we must rely upon indirect evaluation procedures which, as we have complained elsewhere (Reber, 1976) are often unsatisfactory. There are a variety of possible ways out of this bind. In this study we introduced an anagram solution procedure as a kind of projective device with which to obtain a more detailed picture of what our subjects have learned implicitly. Anagram solution tasks have a long history of use as empirical probes into the cognitive and affective processes involved in the organization of natural language materials. Here we extend their use to the exploration of the underlying processes of abstraction: the manner in which the intrinsic structure of a complex stimulus environment is memorially represented. Moreover, given the delicate balance between knowledge which is tacit and acquired implicitly and knowledge which is explicit and acquired through conscious rule induction (Reber, 1976), the anagram task seems particularly


well suited for our purposes. It has the important virtue that it is more overt than the discrimination of well-formedness used in our previous work, yet it is still quite remote from the free, generative procedures used by others (e.g., Miller, 1967) which seem to us to lead away from the issues of implicit learning and representation of tacit knowledge and back toward the more traditional study of conscious, inductive rule learning. We should emphasize here at the outset that the central theoretical issue under scrutiny is the analysis of the form and structure of a body of abstract tacit knowledge acquired in a laboratory setting and relatively free from preexisting cognitive structures. The focus is thus somewhat removed from the contemporary orientation in the study of cognition where the paradigmatic dominant is the formalization of existing symbolic structure (see, for example, any number of contributions in Weimer and Palermo, 1974). Thus we use as the source for our stimulus materials a fairly rich and complex synthetic language whose underlying properties are unlikely to be within the purview of the typical undergraduate and are unlikely to be consciously apprehended by the use of simple decoding strategies. The actual experiment itself is, on the surface, extremely simple. After a short, intense period of memorization of well-formed strings of letters from an artificial language, subjects were required to solve anagrams in that language by reordering letters so that they adhered to the grammatical constraints exemplified by the set of sentences in the memorization task. The solutions thus offered were analyzed for regularities reflective of the subjects’ tacit knowledge of the rules of the language. The simplicity, needless to say, is illusory and, as we shall argue later, can be used to support our existing theoretical notions about implicit learning and extend them to some of the more esoteric areas of cognition.

Method

1. Stimulus Materials

The stimuli consisted of ordered strings of letters generated by the finite state grammar shown in schematic form in Figure 1. Each acceptable string in the language represented by this grammar is defined by a unique sequence of permissible transitions from the initial state 0 to the terminal state 0’. Thus, for example, the state sequence 0-1-1-2-3-4-2-0’ generates the letter string TSXXVPS. This particular grammar will generate 43 unique strings of Lengths 3 through 8. (For details on the procedure for this calculation and other formal aspects of finite state systems see Chomsky and Miller, 1958 and Reber, 1967.) Fifteen of these acceptable letter strings were selected for use as training stimuli; the remaining 28 served as the test stimuli for the anagram solution task.

Figure 1. Schematic diagram of the finite state grammar used to generate the stimuli. See text for description.
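As a rough check on these counts, the grammar can be written out as a transition table and its strings enumerated. The sketch below is illustrative only: the transition table is an assumption reconstructed from the example state sequence and the letter patterns described in the text, not copied from Figure 1, and the names GRAMMAR and generate are hypothetical.

```python
from collections import Counter

# Assumed transition table: state -> list of (letter, next_state).
# "END" stands for the terminal state 0'. Reconstructed from the text,
# not taken from Figure 1 itself.
GRAMMAR = {
    0: [("T", 1), ("P", 3)],
    1: [("S", 1), ("X", 2)],
    2: [("X", 3), ("S", "END")],
    3: [("T", 3), ("V", 4)],
    4: [("P", 2), ("V", "END")],
}


def generate(max_len, state=0, prefix=""):
    """Yield every string of at most max_len letters that reaches END."""
    if state == "END":
        yield prefix
        return
    if len(prefix) == max_len:
        return  # length limit reached without terminating
    for letter, next_state in GRAMMAR[state]:
        yield from generate(max_len, next_state, prefix + letter)


# Sort by length, then alphabetically, for a stable listing.
strings = sorted(generate(8), key=lambda s: (len(s), s))
by_length = Counter(len(s) for s in strings)

print(len(strings))  # 43 under this reconstruction
print(dict(sorted(by_length.items())))
```

Under this reconstruction the enumeration yields 43 strings of lengths 3 through 8, including the example TSXXVPS, which is the main reason for treating the table as a plausible reading of the figure.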

The subjects were ten undergraduate volunteers who were each paid $10 for the four days of the experiment.

The 15 exemplars of the grammar were presented to the subjects for memorization in five sets of three items per set. These items were selected from the total of 43 so that there were examples of each of the lengths from 3 through 8 and for each length where it was possible the three “loops” or recursions in the language (S, T, VPX) were represented. For each set the three items were shown one at a time through a viewing window. Each item was visible for 5 sec. The subject was then handed a blank card and asked to reproduce in writing the full set. The order of stimulus items was varied randomly from subject to subject. Subjects were informed after each trial only as to which items they had recalled correctly; no information about the nature of their errors was provided. The criterion for learning was one completely correct reproduction of the full set of three items. The training session lasted until all five sets were learned. In keeping with previous work and to optimize acquisition of the underlying structure of the grammar (see Reber, 1976), subjects were given no information about the rule-governed nature of the stimuli: during training the experiment was referred to only as an investigation of rote memory.

Following the completion of the training phase subjects were informed about the existence of the well-defined set of rules for letter order. Nothing about the exact rules was communicated, merely that they existed. Testing procedures were then used to assess the extent to which subjects had succeeded in apprehending the structural regularities incorporated in the original 15 stimuli and the nature of their apprehensions. Two procedures were used here, the anagram task which will be the source of most of the data discussed later, and our usual discrimination of well-formedness task which will allow us to draw comparisons between the results of this study and our previous research.

Anagram solutions

Several variations of the anagram solution task exist. The one used here is the simplest case where on each trial a set of letters is given to the subject in a random order and an acceptable symbol string must be produced using all and only those letters. The remaining 28 letter strings, or more precisely, the letters that constitute these 28 letter strings were used here. On each trial the subject was given a set of shuffled cards, each containing one letter of the particular string, with instructions to arrange them in a “correct” order.* Correctness was to be treated as those structural relationships between symbols represented by the 15 exemplars from the memorization task. A stop watch was started when the subject received the set of letters and was stopped when he announced that he was satisfied with his solution. On Day 1 of the study the anagram solution tasks followed the training session after a ten minute rest period. On Days 2, 3, and 4 the original 15 exemplars were re-presented briefly (5 sec viewing time per item) before the anagram task. On three-quarters of the trials subjects were provided with a letter-position cue. Either the first, the last, or the middle-most letter was placed in its proper location for the subject just as he was handed the stimulus cards. Each of these cues was used on one-quarter of the trials; the remaining one-quarter of the trials was run without a cue.
The order of anagrams and cues was determined by a counterbalanced Latin square design so that each anagram problem was attempted four times during the experiment, once each day and once under each cue condition. Subjects were never informed about the correctness of their solutions. There was no time limit imposed on them although they knew that latencies were being recorded.
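The counterbalancing described above can be illustrated with a small cyclic Latin square. This is only a sketch of one arrangement that satisfies the stated constraints; the grouping of anagrams by index and the names cyclic_latin_square and cue_for are assumptions made for illustration, not the assignment actually used in the experiment.

```python
CUES = ["none", "first", "middle", "last"]
DAYS = [1, 2, 3, 4]


def cyclic_latin_square(conditions):
    """Return an n x n square in which each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


# Rows are (hypothetical) groups of anagrams, columns are days.
square = cyclic_latin_square(CUES)


def cue_for(anagram_index, day):
    """Cue condition for one anagram on one day, assuming four groups formed by index."""
    group = anagram_index % 4
    return square[group][day - 1]


for group, row in enumerate(square):
    schedule = ", ".join(f"day {d}: {c}" for d, c in zip(DAYS, row))
    print(f"group {group} -> {schedule}")
```

With 28 anagrams split into four groups of seven, every anagram meets each cue condition exactly once over the four days, and on any given day each cue condition covers one-quarter of the trials.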

*Of the 28 anagrams 24 had unique solutions; four unavoidably had two correct variations. For example, the letter set PPTTVVVX can be used to form two acceptable strings, PTTVPXVV and PVPXTTVV. Where relevant, this fact is taken into account in the analyses that follow.


At the end of Day 4’s anagram solution session, subjects were tested further using our standard procedure of discrimination of well-formedness (see Reber, 1967, 1976). Forty-four letter strings were used here; 22 were selected at random from the set of 28 used for the anagrams (referred to below as grammatical or G items) and 22 were generated by introduction of a single letter violation of the grammar into a letter string (the nongrammatical or NG items). Each was printed on a card and displayed through the same viewing window used in the training session. The subject’s task was to press one of two buttons labeled “yes” and “no” indicating whether or not he felt that the letter string conformed to the rules for letter order. Subjects were encouraged to give reasons for their responses whenever they could. The stimuli remained visible until the subject responded. Latencies were not recorded on these trials. The full set of 44 items was presented once with order randomized for each subject. All subjects were informed about the equal proportion of G and NG items. Again, no feedback about correctness of the response was provided.

After the completion of the discrimination test, subjects were given a sheet of paper and asked to introspect and write freely about the experiment. They were requested to try to provide as much detail as possible about (a) what they knew of the rules for letter order, (b) what they thought they were doing, specifically the rules they were using whether or not they were sure that they corresponded to the actual ones, and (c) any other mnemonics, strategies, or “gimmicks” they used. Subjects were then fully debriefed as to the nature of the experiment, paid, and released.

Results

The data here were consistent with those obtained in our previous work with this particular grammar as well as with the work of others (e.g., Miller, 1958). Over the five learning sets the number of trials taken to reach the memorization criterion dropped systematically from a mean of 5.8 on Set 1 to a mean of 2.5 on Set 5. The number of errors committed before reaching criterion showed a similar decline from 6.5 to 2.3. As before, the clear implication is that these trends reflect the growing ability of the subjects to exploit the regularities intrinsic to the stimuli. Since the primary purpose of this investigation was an examination of what our subjects learned, no further analyses from this part of the study will be presented here.

Table 1. Proportion of Correct Anagram Solutions over the Four Days According to Letter-position Cue

Cue             Day 1   Day 2   Day 3   Day 4   Mean
None            0.20    0.31    0.43    0.74    0.42
First letter    0.31    0.33    0.47    0.56    0.43
Middle letter   0.31    0.43    0.53    0.67    0.48
Last letter     0.37    0.36    0.59    0.70    0.50
Means           0.31    0.35    0.51    0.68    0.46

2. Anagram solutions

The procedures used here produce a welter of data, much of it analyzable by standard techniques but much of it highly qualitative and relatively immune to traditional statistical devices. We first consider the relatively straightforward quantitative findings; then we present several more molecular analyses of the subjects’ response patterns; and finally we present a ‘case study’ of the behavior and introspective reports of a representative subject.

Probability of a correct solution (P_c)

From Table 1 it can be seen that there was a marked improvement in success of anagram solutions over the four days of the experiment, F(3,27) = 28.9, p < .001. The various cue conditions also produced a significant, although somewhat muted, effect, F(3,27) = 3.89, p < 0.05, with last letter = middle > first = none. The cues by days interaction was not significant.

The length of the anagram was, not surprisingly, a strong variable. Since the two acceptable three-letter strings (TXS and PVV) were both used during training, only lengths 4 through 8 appear in the anagram task. As Table 2 shows, the more letters in the anagram the lower the likelihood of a correct solution, and the pattern is found throughout the four days of the experiment. The length by days interaction was not significant, and the cue variable was not differentially effective with items of different lengths.

The data were further analyzed according to the type of item. The system schematized in Figure 1 can also be represented by five different types of letter sequences that it generates. Each type is characterized by a path through the system with obligatory and optional state transitions, the optional ones being the recursions or “loops.” The five types are as follows, with the optional transitions demarcated by parentheses:

1. T(S)XS
2. T(S)XX(T)(VPX(T))VV
3. T(S)XX(T)(VPX(T))VPS
4. P(T)(VPX(T))VV
5. P(T)(VPX(T))VPS

Note that Types 2 and 3 and Types 4 and 5 are quite similar, differing only in the two terminal positions, and that they are all considerably more complex than Type 1. The pattern of correct solutions of each type over the four days is presented in Table 3. Type 1 anagrams were solved with much higher probability than any of the other types, which were not statistically different from each other. There were no significant interactions between cue, item type, and/or days.

Table 2. Proportion of Correct Anagram Solutions over the Four Days According to Length of Anagram

Length (a)   Day 1   Day 2   Day 3   Day 4   Mean
4 (2)        0.63    0.64    0.77    0.95    0.75
5 (3)        0.63    0.60    0.86    0.96    0.77
6 (4)        0.29    0.35    0.52    0.68    0.46
7 (7)        0.30    0.36    0.46    0.63    0.44
8 (12)       0.20    0.22    0.38    0.57    0.35
Mean         0.31    0.35    0.51    0.68    0.46

(a) Note that the marginals for days are a result of differential weightings since the number of cases of each length differed, as indicated by the numbers in parentheses.

Solution times

There was a strong negative correlation between the probability of a correct solution and solution time, r = -0.74, p
analyses

Some relatively molar characteristics of what our subjects know can be derived from the above. To provide a more molecular picture a variety of fine grain analyses were carried out on the patterns of responses which are exhibited within their proffered solutions. In these analyses we looked at the positional restrictions for single letters, for bigrams, for trigrams, and for the recursions or loops. In all cases we examined the extent to which subjects reflect the grammatical restrictions in their attempted anagram solutions.

Table 3. Proportion of Correct Anagram Solutions over the Four Days According to Type of Anagram

Item Type   Day 1   Day 2   Day 3   Day 4   Mean
1           0.70    0.70    0.90    1.00    0.82
2           0.31    0.28    0.41    0.57    0.39
3           0.16    0.28    0.46    0.64    0.38
4           0.29    0.34    0.45    0.66    0.43
5           0.30    0.50    0.68    0.78    0.56
Mean        0.31    0.35    0.51    0.68    0.46

Note that in all cases the issue of whether or not a particular anagram was solved correctly in its entirety is not relevant. Rather, individual letters and groups of letters were searched to determine the extent to which sub-elements of the grammar were known. These analyses were carried out in this manner because of the difficulty of determining just what an error is when it occurs. Since the subjects were constrained by having to use all the letters given to them on each trial, any single mislocation forces at least one additional error and a single such inopportune letter placement can actually render every other letter placement an erroneous one (if, for example, the initial letter is transported to the terminal position). Nevertheless, the analyses performed here are actually quite conservative, especially so since only .46 of the total sample represents completely correct solutions. Moreover, as will become apparent, these analyses provide considerable insight into the manner of apprehension of the rules of this synthetic language.

(a) Single letters: First note that according to the grammar outlined in Figure 1 the five letters that make up the “vocabulary” of the language have the following positional restrictions: in initial position only P and T may occur, in the terminal position only S and V, X is strictly an internal symbol, in the second position all letters except P may occur, and in the next-to-last position all except S and T may occur. Table 4 shows the proportions of single letter placements for the two initial and the two terminal positions. These data include, of course, only trials where the position under consideration was not cued. The overall level of “appropriateness” (P_a) here is an impressive 0.924. Note, however, that this value doesn’t necessarily reflect the subjects’ pure knowledge of these

Table 4. Proportions with which Individual Letters were Offered in Critical Positions in Anagram Solutions

                          Letter Used (a)
Position       P        S        T        V        X        P_a     Corrected (c) P'_a
First          0.394    0.014b   0.578    0.012b   0.001b   0.973   0.954
Second         0.035b   0.237    0.249    0.232    0.247    0.965   0.776
Next-to-Last   0.226    0.029b   0.098b   0.503    0.143    0.873   0.683
Last           0.023b   0.343    0.038b   0.545    0.051b   0.888   0.817
Overall proportions:                                        0.924   0.807

(a) Number of cases in this analysis is as follows: first position = 840, second position = 1104, next-to-last position = 1115, last position = 840.
(b) Nonpermissible letter positions as given by the grammar in Fig. 1.
(c) The formula used for correction for guessing was P'_a = (P_a - P_ag) / (1 - P_ag), where P_a = probability of an appropriate letter placement and P_ag = probability of an appropriate guess. P_ag values were derived directly from the proportion of possible appropriate letters in each anagram problem. For example, given the letter set PPTTVVVX the value of P_ag for the first position would be 0.375 since of the seven letters four are permissible in that location.

restrictions. The values here need to be adjusted by possible correct guessing since not all letters appear equally often in the anagram problems. Thus, the last column in the table gives a more accurate assessment of what our subjects actually “know”: a still respectable 0.807 appropriate, P'_a. Knowledge of the restrictions for the initial position is clearly superior to that of the terminal position and the second letter restrictions seem to be equally better known than the next-to-last letter restrictions. Perhaps this is why the cue effect discussed above turned out the way it did. The final position cue provides important information to the subjects; the initial position cue provides only information that the subject already possesses.

(b) Bigrams: Examination of the grammar shows that there are four acceptable initial bigrams (PT, PV, TS, TX) and three acceptable terminal bigrams (PS, VV, XS). The left half of Table 5 shows that the general level of appropriateness of bigram placement in these positions is also quite high. Note that even after the correction for guessing these values are inflated. In these analyses the cued trials were included so that on slightly more than one half of the trials one of the two letters was provided for the subject. It should be noted, however, that there are a total of 25 possible bigrams in the letter set (see Appendix B) so that the possibility of a correct guess is not large, as the corrected values (P'_a) in Table 5 show. Again, note that the subjects seem to know more about initial letter sequences than about those which can terminate an acceptable letter string.
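The correction for guessing given in the note to Table 4 is simple enough to state as a small function. The sketch below assumes the reading P'_a = (P_a - P_ag) / (1 - P_ag) of that formula; the function name and the numerical inputs are hypothetical and are not values from the study.

```python
def correct_for_guessing(p_a, p_ag):
    """Corrected proportion P'_a = (P_a - P_ag) / (1 - P_ag).

    p_a  : observed proportion of appropriate placements
    p_ag : probability of an appropriate placement by guessing alone
    """
    if not 0.0 <= p_ag < 1.0:
        raise ValueError("p_ag must lie in [0, 1)")
    return (p_a - p_ag) / (1.0 - p_ag)


# Hypothetical example: 90% appropriate placements when half the
# available letters would have been appropriate by chance.
print(round(correct_for_guessing(0.9, 0.5), 3))  # 0.8
```

The correction rescales performance so that chance-level placement maps to 0 and perfect placement stays at 1, which is why positions with many permissible letters lose more after correction than positions with few.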

Table 5. Proportions with which Acceptable Bigrams and Trigrams Were Used in their Appropriate Initial and Terminal Positions

             Bigrams                 Trigrams
Position     P_a     Corrected P'_a  P_a     Corrected P'_a
Initial      0.819   0.782           0.641   0.616
Terminal     0.705   0.661           0.609   0.588
Overall:     0.762   0.722           0.625   0.602

(c) Trigrams: The final extension of this form of analysis is with three letter sequences. This analysis accounts for all letter placements in 4-, 5-, and 6-letter anagrams and all but the most internal one and two letters in the 7- and 8-letter anagrams respectively. The right half of Table 5 gives the uncorrected and corrected proportions of appropriateness. Note that although the overall level of performance diminishes somewhat there are only six acceptable initial trigrams and four acceptable terminal trigrams out of a possible 125 three letter sequences.

(d) Recursions: A similar kind of analysis was carried out on the other salient aspects of the language, the recursions or loops. Table 6 shows the relative fates of multiple S’s and multiple T’s. These are the only letters that can occur more than twice in a row. A T- or S-cycle was classified as “correct” whenever it was placed in its proper location, independent of whether or not the rest of the item was correct; it was called “displaced” if the sequence of T’s or S’s was intact but the entire series was in an erroneous position, and it was categorized as “broken” when there were intervening letters. Although subjects are clearly sensitive to both cyclicity and location (0.559 of the placements being correct), the fact that on fully a third of the trials the integrity of the series was disrupted by interposing other letters shows that subjects’ knowledge of these single letter loops is not as complete as that of the initial and terminal letter sequences.

In summary, in terms of the proffered solutions, subjects are quite knowledgeable about positional restrictions, about the sequences of letters that may initiate and terminate strings, and about recursions. Overall, they are slightly more often appropriate with the beginnings of letter strings than with the ends, and are more knowledgeable about the anchor positions than about the more internal positions.
However, as we point out below, post-experimental interviews and introspective reports from the subjects

Table 6. Proportions with which Particular Uses were Made of the T- and S-cycles in Anagram Solutions

Cycle   Correct   Displaced   Broken
S       0.662     0.075       0.262
T       0.506     0.131       0.362
Mean    0.559     0.112       0.329

revealed that this knowledge is not necessarily represented in an awareness, at least not in an awareness that can be formally expressed. The relationship between tacit knowledge which emerges in behavior and the explicitness of that knowledge is an issue of some theoretical importance but one that has received precious little attention from cognitive psychologists. We shall pursue it at some length later.
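The three-way classification of S- and T-cycles used for Table 6 can be sketched in a few lines. This is an assumed operationalization for illustration, not the scoring procedure actually applied to the data: it considers only the longest run of the letter in the correct string, and the function name classify_cycle is hypothetical.

```python
import re


def classify_cycle(correct, attempt, letter):
    """Classify the fate of a repeated-letter cycle in an attempted solution.

    Returns "correct", "displaced", "broken", or None when the correct
    string contains no run of two or more of the given letter.
    """
    runs = list(re.finditer(f"{letter}{{2,}}", correct))
    if not runs:
        return None
    run = max(runs, key=lambda m: len(m.group()))  # longest run only
    start, block = run.start(), run.group()
    if attempt[start:start + len(block)] == block:
        return "correct"    # intact and in its proper location
    if block in attempt:
        return "displaced"  # intact but in an erroneous position
    return "broken"         # other letters intervene


# An attempted solution of the kind reported in the text.
print(classify_cycle("TSSSXXVV", "TVVSSSXX", "S"))  # displaced
```

A fuller treatment would need a rule for strings that contain the same letter both inside and outside a cycle, which this sketch deliberately leaves aside.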

The preceding analyses leave open an important theoretical issue: is the degree of learning observed due to simple anchor effects or is it due to more sophisticated processing of information about relational invariances? The data presented so far are ambiguous on this issue since the position of a letter in a string is largely confounded with invariance, and the relatively poorer learning of the constraints for the internal positions could be due either to their location or to the fact that essentially any letter may occur in an internal position. The issue basically is, how salient is location of a letter or letter group for the subject relative to letter-to-letter invariance patterns independent of location?

To resolve this issue we examined all of the bigrams produced by our subjects, a total of 6560. In their introspective reports our subjects frequently mentioned bigram patterns as particularly salient and relatively codeable; they therefore seemed the best place to look for critical evidence. The frequency with which each of the 25 possible bigrams occurred was recorded and then corrected for the likelihood of assembling it by chance. These adjusted frequencies were then ranked according to use and the ranks were compared with the predictions of two theoretical models. Note that in this analysis the actual location of a bigram in the subject’s attempted solution was not relevant. We were concerned with the extent to which subjects knew two-letter patterns and thus, where in an anagram a particular bigram occurred was ignored.

The results of this analysis are presented in Table 7. There are several things of note here. First, the 16 bigrams which appear in the table are the acceptable ones in the language; the remaining nine all had adjusted frequencies of use lower than the acceptable ones on all days of the study. As the proportion of acceptable bigrams observed values (P_a) reflects, these unacceptable bigrams made up less than 10% of the total sample and even on Day 1, only 13%. Clearly, with only minimal exposure to grammatical constraints, subjects learned the bigram patterns of the language to an impressive degree. Second, although there is significant improvement in P_a over the four days of the study (p < 0.01) the amount of improvement is relatively small, particularly when reflected against the overall improvement in the probability of correctly solving an anagram (P_c) which increased from 0.31 on Day 1 to 0.68 on Day 4 (see Table 2). The implication is that bigram invariances are learned quickly and well, but learning the overall positional configurations of the bigrams emerges more slowly. By Day 1 subjects know almost all they will learn about bigram patterns that are permissible; the remainder of the time is given over to learning about locations for these bigrams.

Finally, there is the theoretical issue that prompted this analysis, the relationship between the salience of bigram invariance and the salience of the anchor positions. Two models of bigram use were developed, one based primarily upon bigram invariance and one based primarily upon bigram position. The invariance model predicts a rank order of bigram use on the basis of the frequency with which each possible bigram can occur in the language within the length range used.
Predicted ties, however, were broken by invoking an anchor criterion, e.g., PS receives predicted Rank 8 and XV Rank 9 even though both have the same frequency of occurrence because PS always occurs in a terminal position while XV is purely an internal bigram. (See the Appendices for the full set of acceptable strings in the language, the 16 permissible bigrams, their frequency and their location characteristics.) The anchor model predicts ranks on the basis of initial bigrams first, terminal bigrams next, and internal bigrams last. Predicted ties were broken by invoking a frequency rule, e.g., the four purely initial bigrams, TS, PT, TX, and PV received predicted Ranks 1, 2, 3, and 4 respectively on the grounds that TS has the highest frequency of the four, PT next, and so forth. Thus, each model incorporates, to a minor degree, some aspects of the other.

Table 7. Rank Order of Use of the Acceptable Bigrams and Predictions of the Two Models

Bigram   Location   Day 1   Day 2   Day 3   Day 4   Total   Invariance Model   Anchor Model
VV       Terminal   2       2       1       1       1       2                  5
TT       Internal   4       1       2       2       2       3                  9
XX       Internal   3       3       3       4       3       5                  11
TV       Internal   1       4       4       3       4       1                  8
TX       Initial    5       5       5       7       5       14                 3
VP       Internal   12      9       6       5       6       4                  10
SS       Internal   7       7       8       11      7       6                  12
TS       Initial    8       10      10      8       8       7                  1
PT       Initial    10      6       11      12      9       11                 2
PS       Terminal   13      12      9       6       10      8                  6
PV       Initial    6       8       12      13      11      15                 4
SX       Internal   11      11      13      10      12      10                 14
XT       Internal   15      15      7       9       13      12                 15
XV       Internal   16      13      14      14      14      9                  13
XS       Terminal   14      16      15      15      15      16                 7
PX       Internal   9       14      16      16      16      13                 16

P_a =               0.87    0.89    0.90    0.92    0.90
r_s (Invariance) =  0.44    0.58    0.68    0.77    0.72
r_s (Anchor) =      0.23    0.29    0.09    0.12    0.29

Goodness-of-fit tests were carried out on the adjusted ranks for each day’s anagrams separately and for the total. These values are presented as correlation coefficients in Table 7. In all cases the invariance model provides a better fit than the anchor model. The relationships that emerge between bigram invariance and location, however, are complex. Note that increasing congruence between the invariance model and the obtained ranks is expected since as the likelihood of a correct anagram solution approaches 1.0, the value of r_s for the invariance model approaches 1.0 and the value of r_s for the anchor model approaches -0.16. Thus, the interesting comparisons are those between the fits of each model, the likelihood of an appropriate bigram (P_a), and the probability of a correct anagram solution (P_c). What emerges is clear in one respect: bigram invariance is generally more salient than bigram position. On Day 1, for example, three of the four top ranked bigrams are purely internal ones (TV, TT, XX) and two of the four lowest ranked are terminal ones (XS, PS). What is happening here is that subjects know which letters go together to form acceptable pairs (P_a = 0.87) but most of what they know is about those bigrams with high relational

Implicit learning 347

invariances relatively independent of location. The overall level of performance on Day 1, however, is relatively low (Pc = 0.31) and this is due primarily to the clustering of letters which conform to bigram patterns without much regard to location. Take two typical instances: (a) The subject was given the letter set TTVVXX without a cue; the correct solution here is TXXTVV but the subject offered TXTVXV. Note that the TX is an acceptable initial bigram; of the other four bigrams, three are acceptable internal letter pairs (XT, TV, XV) and only one is unacceptable (VX). (b) The subject was given SSSTVVXX, the correct solution is TSSSXXVV, and the attempted solution was TVVSSSXX. Here five of the seven bigrams are acceptable internal ones (TV, SS, SS, SX, XX), one is an acceptable terminal bigram (VV) and one is unacceptable (VS). These examples are very typical of observed attempted solutions and reflect the patterns that are in Table 7. The most invariant letter pairs (e.g., TV) are likely to appear almost anywhere, common anchor pairs (e.g., VV) often occur in nonanchor positions, and some anchor bigrams appear more frequently than the invariance model predicts (e.g., TX). Overall, the data reflect a strong adherence to bigram invariance (rs = 0.72) and only marginal concordance with the notion that anchor positions grant bigrams any particular salience (rs = 0.29). Note that these overall correlations are performed on anagrams that were only solved correctly 46% of the time so that the fit of the invariance model is not “forced.” Finally, and perhaps most importantly for our general theoretical stance with regard to implicit learning, we should emphasize that these models were developed on the basis of the bigram patterns that can be generated by the grammar within the representative lengths and not on the 15 exemplars used during learning.
This point is important for our argument that subjects are acquiring an abstract representation and not merely responding on the basis of memorized instances. In actuality the 15 exemplars used during learning display a very different bigram frequency pattern than the full sample. This can be seen by examining the items with asterisks in Appendix A. Models based upon the bigrams actually used during learning do not correlate in any way with the observed ranks. The invariance model correlates 0.04 with overall use, and on Day 1, when the impact of the 15 exemplars is freshest, it correlates -0.09. The restricted anchor model is equally poor, correlating -0.03 overall and 0.02 on Day 1. Clearly subjects are learning the overall structural relations that hold between letters and letter pairs and not merely logging frequencies. Whatever they learned from the representative letter groups of the learning stimuli is applied in a coherent, abstract form to the anagram task.
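
The two computations that the bigram analysis rests on, splitting an attempted solution into its bigrams and correlating two rank orders, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, the bigram list and rank columns are a transcription of Table 7 as read here, and the standard untied Spearman formula is assumed. Since the published coefficients were computed on adjusted ranks, values obtained from the raw ranks need not reproduce them.

```python
# Acceptable bigrams and rank columns as read from Table 7 (a transcription assumption).
ACCEPTABLE = ["VV", "TT", "XX", "TV", "TX", "VP", "SS", "TS",
              "PT", "PS", "PV", "SX", "XT", "XV", "XS", "PX"]
TOTAL_RANK = list(range(1, 17))
INVARIANCE_RANK = [2, 3, 5, 1, 14, 4, 6, 7, 11, 8, 15, 10, 12, 9, 16, 13]
ANCHOR_RANK = [5, 9, 11, 8, 3, 10, 12, 1, 2, 6, 4, 14, 15, 13, 7, 16]


def bigrams(string):
    """Return the adjacent letter pairs of a string, left to right."""
    return [string[i:i + 2] for i in range(len(string) - 1)]


def unacceptable_bigrams(string):
    """Return the bigrams of a string that are not among the 16 acceptable ones."""
    return [pair for pair in bigrams(string) if pair not in ACCEPTABLE]


def spearman(ranks_a, ranks_b):
    """Spearman rank correlation for two untied rankings of equal length."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# The two attempted solutions discussed in the text.
print(bigrams("TXTVXV"))                 # ['TX', 'XT', 'TV', 'VX', 'XV']
print(unacceptable_bigrams("TXTVXV"))    # ['VX']
print(unacceptable_bigrams("TVVSSSXX"))  # ['VS']

# Fit of each model to the overall rank order of use (raw ranks only).
print(spearman(TOTAL_RANK, INVARIANCE_RANK))
print(spearman(TOTAL_RANK, ANCHOR_RANK))
```

Position is deliberately ignored in `bigrams`, in keeping with the statement that where in an anagram a bigram occurred was not considered in this analysis.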

One other important theoretical issue raised by our earlier work is the extent to which subjects induce nonrepresentational rules. As we have shown elsewhere (Reber, 1976), when implicit, neutral instructions for learning are used, nonrepresentative inductions are rare. Although subjects do not learn the rule system in its entirety, what they do acquire is a subset of the lawful invariances as reflected in the exemplars. However, when encouraged to undertake explicit rule searches, nonrepresentative elaborations become quite common; subjects develop rules which are not systematically reflected in the structure of the language. Thus, the manner in which the task is approached influences the mode of learning that a subject will use and the nature of the tacit abstractions that are formed. To generalize on these and other findings (Miller, 1967; Lewis and Reber, in preparation; Reber and Kassin, in preparation), it seems clear that whenever subjects are in a problem-solving mode whereby they must explicitly learn a complex set of rules and/or use those rules to generate novel rule-governed stimuli they will show nonrepresentativeness (prior, of course, to perfect learning of the system; see Miller, 1967). In this context the procedures used in this experiment take on added importance. The original acquisition procedure used on Day 1 was one which encourages the abstraction of representative rules, but the anagram solution task is a step in the direction of generation of novel strings. Since there is some feedback in the form of letter position cues, the possibility exists that our subjects may construct letter sequence rules which are not reflective of the underlying structure. Although the preceding analyses suggest that this is not the case, a definitive answer to this question is as difficult to achieve as it is important.
Note that one cannot simply peruse subjects’ protocols for erroneous consistencies without a quantitative estimate of how frequently such consistencies could be expected by chance. A variety of possibilities exist for obtaining such an estimate; the one we elected to use was a variation on the standard two-element model (see Atkinson, Bower, and Crothers, 1965). We assume that on any given trial and for any given item a subject may be in any of three states with regard to the degree of knowledge possessed about the generative rules for that item: a state of total ignorance (@), a state of partial knowledge (p), or a completely knowledgeable state (k). The transition matrix, the initial vector, and the expected probability of a correct solution (Pc) for each state are all given in Table 8. The primary difference between our model and the standard two-element model is in the initial vector, which allows entry into States k and p on the first trial. The learning phase of the experiment mandates such an extension of the model.

Table 8. Transition Matrix, Initial Vector, and Parameter Estimates for the Model

Transition matrix (rows: state on Trial n; columns: state on Trial n + 1). Entries that cannot be recovered from the source are marked [?].

State      k        p         @
k         1.0       0         0
p          α      1 - α       0
@          0       [?]     1 - [?]

Initial vector and Pc for each state: [?]

Parameter estimates (symbols as they appear in the source): α = 0.13, p = 0.81, h = 0.32

It must be recognized that we do not present this model as a theoretically valid characterization of the behavior of our subjects in this experiment. There are several issues involved here. First, the assumption that all items be treated equally must be wrong given the analyses reported earlier on Pc as a function of item length, item type, and cue. Second, the Pc values over the four days of the experiment do not reveal the type of growth curve which the model predicts and are, in any event, a wretchedly small sample for the fitting of such curves. Finally, the failure of all subjects to attain the absorbing state, k, for all items means that a sufficient statistic for the estimation of the p parameter does not exist. Nevertheless, the model has the redeeming virtue that it permits specific predictions about sequential statistics, specifically the frequency with which particular patterns of correct and erroneous solutions on specific items are to be expected. These statistics can then be used to evaluate the degree of nonrepresentativeness of subjects’ knowledge of rules quite independently of any other attribute of the model. The three parameters of the model were estimated by an empirical least squares technique; the values are also given in Table 8. The sequential statistics are presented in Table 9. The fit of the model is not bad, even given the questionable assumptions which went into its construction. The most aberrant cases are the CECC and CEEC sequences, which are off by a factor of two, and EECC, which is 50% larger than predicted. But, importantly, the number of instances of items solved incorrectly on four successive days (EEEE) is well within that predicted. The logic of the analysis is quite simple: the effect of the existence of nonrepresentative rules in subjects’ tacit knowledge space would be to inflate the proportion of items with this particular sequence. Hence, we conclude that there is no evidence of systematic nonrepresentativeness in the use of rules in this experiment.

Table 9. Predicted and Observed Proportions for the Sequential Statistics. Each 4-tuple is the Pattern of Correct and Erroneous Solutions Offered by Each Subject for Every Item in the Anagram Task. Total Sample Size = 280. (The sequence labels for the 16 entries are not recoverable from the source.)

Predicted:  0.159 0.005 0.008 0.023 0.108 0.016 0.026 0.090 0.017 0.016 0.028 0.056 0.061 0.061 0.109 0.217
Observed:   0.146 0.004 0.004 0.050 0.139 0.007 0.054 0.132 0.000 0.007 0.014 0.039 0.036 0.032 0.139 0.196
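
How a three-state model of this kind yields a predicted proportion for every 4-tuple of correct (C) and erroneous (E) solutions can be sketched as below. This is only an illustration under stated assumptions: the state label "g" stands in for the ignorance state, the transition structure (ignorance to partial knowledge to the absorbing state k) is assumed, and every numerical value is a hypothetical placeholder rather than an estimate from Table 8.

```python
from itertools import product

STATES = ("k", "p", "g")  # knowledgeable, partial knowledge, ignorance ("g" is a stand-in label)

# Hypothetical parameter values, chosen only for illustration.
ALPHA = 0.2   # assumed probability of moving from p to k between trials
DELTA = 0.3   # assumed probability of moving from g to p between trials

TRANSITION = {
    "k": {"k": 1.0, "p": 0.0, "g": 0.0},             # k is absorbing
    "p": {"k": ALPHA, "p": 1.0 - ALPHA, "g": 0.0},
    "g": {"k": 0.0, "p": DELTA, "g": 1.0 - DELTA},
}
INITIAL = {"k": 0.2, "p": 0.3, "g": 0.5}              # entry into k and p allowed on trial 1
P_CORRECT = {"k": 1.0, "p": 0.5, "g": 0.1}            # P(correct solution | state)


def pattern_probabilities(initial, transition, p_correct, n_trials=4):
    """Predicted proportion of each C/E pattern over n_trials, via a forward pass."""
    result = {}
    for pattern in product("CE", repeat=n_trials):
        # forward[s] = P(outcomes observed so far, current state is s)
        forward = dict(initial)
        for trial, outcome in enumerate(pattern):
            if trial > 0:
                # Move one step through the transition matrix between trials.
                forward = {
                    nxt: sum(forward[cur] * transition[cur][nxt] for cur in STATES)
                    for nxt in STATES
                }
            # Weight each state by the probability of the observed outcome.
            forward = {
                s: forward[s] * (p_correct[s] if outcome == "C" else 1.0 - p_correct[s])
                for s in STATES
            }
        result["".join(pattern)] = sum(forward.values())
    return result


predicted = pattern_probabilities(INITIAL, TRANSITION, P_CORRECT)
for pattern in sorted(predicted):
    print(pattern, round(predicted[pattern], 3))
```

The 16 predicted proportions sum to one, so they can be set directly against observed proportions such as those in Table 9; an excess of observed EEEE items over the prediction is what systematic nonrepresentative rules would be expected to produce.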

The basic data here are presented in Table 10, which gives the total number of acceptable (G) and unacceptable (NG) responses to letter strings that were either G or NG. The overall level of performance here is quite high and consistently so. The poorest subject correctly identified the grammatical status of items on 0.75 of the trials, the best subject on 0.886; the group mean was 0.807. This value is high, particularly so considering that all of the NG items contained only single-letter violations of grammatical structure, whereas in previous research using this same grammar items with multiple violations were among the test stimuli (see Reber, 1967, 1976). As we pointed out elsewhere (Reber, 1967), the probability of a correct rejection increases with the number of violations of the rule system.

Table 10. Total Number of G and NG Responses to G and NG Items on the Discrimination of Well-formedness Task

               Response
Item         G       NG      Total
G           199      21       220
NG           64     156       220
Total       263     177       440

Table 11 reveals some of the details of subjects’ responses during this task. There are some interesting differences between these data and those from the anagram task. First, note the total lack of any effect of item length on this task. This finding, which is not new to these studies, seems to indicate a kind of Gestalt-like apprehension of items where they, in the words of many of our subjects, “just felt right (or wrong)” independent of length. Second, the type of G item also fails to produce any reliable effect. We suspect that the small number of trials used and a ceiling effect may be combining to mask differences here. Previous work has found that Type 1 items are usually identified as acceptable more often than other types, and they were also solved much more often during the anagram task. Several aspects of the NG items are of interest. The highest probability of rejecting an item with a violation surprisingly occurs when the terminal letter grammatical constraints are not fulfilled, despite the fact that the anagram data show that subjects are somewhat less keenly knowledgeable of these final position constraints than they are of those for initial letter position (see Table 4). Also note that the overwhelming majority of errors on this task occurs when an item is rendered unacceptable by virtue of a single internal position violation. This result replicates earlier findings and reflects similar difficulties in solution of the anagrams. Finally, as Table 10 reveals, there is an overall bias for responding G which appears in spite of instructions to the subjects about the equal proportions of G and NG letter strings. We view this bias, which has not appeared in other work, as reflecting a criterion shift produced by the four days of working with anagrams. 
It seems reasonable to assume that during such intensive exposure, which occurred without feedback, the tacit representation of the grammatical rules “blurred” a bit, causing subjects to frequently respond G to subtly unacceptable letter strings. The fact that the bias is clearly centered in the tendency to accept strings with violations in internal positions (the area of maximum grammatical flexibility and poorest knowledge) supports this interpretation.
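
The overall accuracy and the bias toward responding G can be recomputed directly from the four cell counts of Table 10. The short sketch below does so; the variable names are illustrative choices, not terms from the paper.

```python
# Response counts from Table 10: outer key is item type, inner key is response.
counts = {
    "G":  {"G": 199, "NG": 21},
    "NG": {"G": 64,  "NG": 156},
}

total = sum(sum(row.values()) for row in counts.values())
correct = counts["G"]["G"] + counts["NG"]["NG"]
g_responses = counts["G"]["G"] + counts["NG"]["G"]

accuracy = correct / total                                   # proportion of correct decisions
g_rate = g_responses / total                                 # proportion of all responses that were G
p_accept_g = counts["G"]["G"] / sum(counts["G"].values())    # G items called G
p_accept_ng = counts["NG"]["G"] / sum(counts["NG"].values()) # NG items called G

print(f"accuracy    = {accuracy:.3f}")     # 0.807
print(f"G responses = {g_rate:.3f}")       # 0.598
print(f"P(G | G)    = {p_accept_g:.3f}")   # 0.905
print(f"P(G | NG)   = {p_accept_ng:.3f}")  # 0.291
```

With equal numbers of G and NG items, an unbiased responder would give G on about half of the trials; the observed proportion of G responses is well above that, and the 64 wrongly accepted NG items outnumber the 21 wrongly rejected G items three to one.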

Table 11. Details of the Discrimination of Well-formedness Task (a)

G Items

Length     N     Pc          Type     N     Pc
4         10    0.90         1       20    1.00
5         30    0.93         2       70    0.84
6         30    1.00         3       30    0.97
7         50    0.84         4       60    0.88
8        100    0.90         5       40    0.95

NG Items

Position of violation     N     Pc          Length     N     Pc
First                    30    0.80         4         10    0.80
Second                   40    0.82         5         30    0.63
Next-to-last             30    0.77         6         30    0.73
Last                     30    0.90         7         50    0.72
Internal                 90    0.54         8        100    0.71

(a) Effects of length and item type are given for G items; effects of length and position of violation for NG items. N is the number of instances in each condition.

We present two types of analyses here. The first is made up of the reasons subjects gave for classifying items as G or NG during the discrimination task. The second comes from post-experimental essays. For obvious reasons the data from the discrimination of well-formedness task consist entirely of reasons for an item’s NG classification (try to explain why a sentence was grammatical). Out of the total of 177 NG responses, subjects offered reasons for 133 of them. Of these verbalized cases they were correct in rejecting 122 (the other 11 cases were actually well-formed), but on 38 of these 122 the reason given was the wrong one. Thus, on a sizeable minority of instances (31%) an item was correctly rejected but for irrelevant reasons: for example, correctly rejecting PTVXTTVV but doing so because “two T’s cannot follow an X,” which Figure 1 shows is a legitimate sequence, or rejecting VSSXXVV because “two S’s cannot follow a V,” which, while certainly correct, somehow misses the point that the initial V is the source of nongrammaticality. These issues are extremely complex and given the structure of the grammar it is difficult to draw firm conclusions about what subjects know either implicitly or explicitly. The primary point here is that on 93 of the 177 NG responses our subjects were either unable to provide a formal rationale for their choice, or they provided an irrelevant one. In short, they know when something is amiss but they are not very good at identifying the source. In post-experimental essays our subjects confirmed what earlier analyses implied: virtually every subject volunteered the correct anchor letters and several knew the permissible first and last two-letter sequences. A number of sequences were frequently identified as salient, such as the terminal VV and VPS, the S- and T-loops, and, not surprisingly, the most frequent internal bigram TV. However, these reports were replete with statements that conformed with neither the rules of the grammar nor the subject’s own behavior. Perhaps the most interesting aspect of these reports is not their incompleteness, since verbal responses of this kind are notorious for omissions, but rather the confident affirmations of the use of rules which in fact were never used.
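
The tallies of verbal justifications quoted above can be laid out step by step. In the sketch below the four input counts come from the text; the decomposition of the "93 of the 177" figure is one reading that happens to reproduce it and is an assumption, since the text does not spell out how the figure was reached.

```python
ng_responses = 177        # all NG responses on the discrimination task
with_reason = 133         # NG responses accompanied by a stated reason
correct_rejections = 122  # reasoned rejections of items that really were NG
wrong_reason = 38         # correct rejections justified by an irrelevant reason

no_reason = ng_responses - with_reason                  # rejections with no reason offered
false_rejections = with_reason - correct_rejections     # reasoned rejections of well-formed items
share_wrong_reason = wrong_reason / correct_rejections  # share of correct rejections, wrong reason

# Assumed reading of the "93 of the 177" figure: no reason given, plus an
# irrelevant reason, plus reasons offered for rejecting well-formed items.
unexplained = no_reason + wrong_reason + false_rejections

print(no_reason)                     # 44
print(false_rejections)              # 11
print(round(share_wrong_reason, 2))  # 0.31
print(unexplained)                   # 93
```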

A case study

What follows here is a short analysis of a representative subject which describes some of these characteristics. This subject was, in fact, our most average subject. His overall probability of a correct anagram solution was 0.455 (51 out of 112) and his probability of a correct decision on the test of well-formedness was 0.818 (36 out of 44). This section cannot really be taken as a definitive analysis of the operations of an implicit learner; rather, it is presented here as illustrative of the kind of performance which we have watched evolve in the laboratory innumerable times and which is often difficult if not impossible to convey by the presentation of tables, figures, and analyses of variance. This subject, in addition to identifying T and P as permissible initial letters, also claimed that X could begin a string. A search of his anagram solution attempts, however, showed that he never once began a solution with an X and, moreover, he correctly rejected the one NG item that began with X on the discrimination task. Similarly, while correctly citing S and V as terminal letters, he also put down P and T as acceptable. Here his behavior is not quite so far removed from his consciousness, for he did use P as a terminal letter on three occasions and T on six. Nevertheless, during discrimination he correctly classified as NG the two items whose only violations were a terminal P and a terminal T. He also identified seven salient letter sequences: TX and TSXXX as beginnings, VPS, XXVV, and XXV as endings, and PX and TV as internals. Of these, TSXXX and XXV cannot occur in the positions identified; three X’s in a row never occurs, and not one single problem ever contained more than two (although there was one learning item with three). Moreover, of the 48 occasions where it was possible to construct a sequence ending in XXV he did so on only five. The XXV sequence is, however, a common internal trigram and he used it appropriately in 38 of the remaining cases. The PX and TV bigrams, which were part of the correct solution on 20 and 76 anagrams respectively, were used by him only eight and 31 times each, and of these nine were cases where one of the letters was cued. The claimed salience of the VPS and XXVV endings was somewhat more appropriate; of the 36 and 12 anagrams where these are the proper endings he used them on 21 and 11 respectively, although on 12 of these trials the terminal letter was cued. Certainly one of the more amusing aspects of running this experiment was the reaction of the subjects during debriefing. This part of the procedure, invariably at the subject’s request, consisted of perusing their attempted solutions. Comparisons of their behavior with their claims about what they were doing produced uniform expressions of incredulity. This particular subject simply could not believe, for example, that there were no anagram problems with three X’s but that there were 24 with three or more S’s and T’s. His disbelief covered both the actual rules of the grammar and, more importantly in our view, his own quite systematic behavior. We don’t wish to imply that this subject is completely noncognizant of systematicity. He is not; he actually can tell us quite a bit about the structure of the language, more than the typical subjects in earlier experiments could (Reber, 1967). The point we want to stress here is that whatever it is that he knows explicitly about structure lags behind what he knows implicitly. Indeed, one comment from this subject is revealing. He told us that in solving anagrams he would place down a couple of letters that he thought he knew were right and then he would move the remaining ones around until he hit upon a configuration that “felt right”.

Discussion

Behavior which is regular and in concert with complex rules in the absence of awareness of both the regularities of the system and of one’s own behavior is the sine qua non of implicitly acquired knowledge systems. This statement, however, is hardly a revelation. Were it not manifestly true, epistemology, psycholinguistics, perception, if not nearly all of what is interesting about cognitive psychology, would consist of little more than the construction of questionnaires and neo-Titchenerian introspective reports. What we have tried to do in this paper that is unique, and that we believe represents a proper stance to adopt with respect to these issues, is to analyze a small system of tacit knowledge which was acquired in the laboratory and was not part of the accumulated body of abstract knowledge that the typical subject possesses. Even within this restricted empirical domain we can begin to make contact, gently, with some classic epistemological issues. The data allow us to look at three of them: First, what is the nature of this implicit acquisition process? Second, what is the form and structure of the tacit knowledge so apprehended? Third, what is the relationship between a body of tacit knowledge and the conscious evaluation of that knowledge? The data from this study provide a form within which to approach these issues both empirically and theoretically. Let us take them in turn.

1. On the nature of implicit learning

Here we must recognize that our characterizations are limited to an understanding of the boundary conditions under which the learning occurs. That is, we cannot as yet say very much about the ‘nature’ of the processes; we can proceed primarily by carefully circumscribing the nature of the conditions which set the stage for its occurrence. The data reported here, along with previous findings, support the conclusion that implicit learning emerges as an effective mode for apprehending complex structure in the absence of conscious cognitive efforts to learn. In short, implicit learning is a naturally occurring, unconscious, cognitive act, an automatic process of a human mind operating in any complex environment with a rich underlying structure with which it must interact. During the four days of extended practice solving anagrams our subjects, without feedback concerning their attempted solutions, reached a level of apprehension of the structure of our synthetic language that we consider quite remarkable. By Day 4 they were solving 68% of the problems and were capable of detecting violations of structure over 80% of the time. This latter figure for discrimination of well-formedness compares with values ranging from 68% to 79% in our earlier experiments with the same language even though the task here was considerably more difficult and subtler. The high level of performance here could be trivially due to the cuing and/or the daily re-presentation of the original exemplars. We cannot rule out the contribution of these factors but we doubt that they can account for much.
Elsewhere (Lewis, 1975; Lewis and Reber, in preparation), we explored the effects of extended observation of exemplars and of specific information about stimulus structure and found poorer apprehension of structure than occurred here. It seems doubtful (at least to us who have developed our own implicit understanding of implicit learning) that such minimal information as that provided by single letter cues and a bare 75 sec. of daily observation could have been effective without the act of attempting to solve the anagrams; an act of literally playing about with pattern and structure which immerses the subject in the system and allows for richer abstractions and gradual consolidation of the regularities of permissible letter strings. Moreover, the introspective reports from our subjects tended to underplay the role of the cues and the daily refreshment and focus instead on what they often called a ‘vague’ kind of understanding and sense of familiarity with the system.

We argue that any procedure which steeps the neutral subject in a structured environment will produce (at least partially) apprehension of that structure. The picture of that apprehension process which is emerging is a complex one and at least two distinct components seem to be implicated. First, there is a general differentiation-like process similar to that proposed by Gibson and Gibson (1955) and others (e.g., Mace, 1974; Shaw and McIntyre, 1974). According to this view the most salient cues, the ones with the highest invariances, are abstracted out first with the less salient, less invariant relations following. The data from the fine-grain analyses of the anagram solution task are consistent with this view although the relationships between salience, invariance, and position were found to be rather complex. Basically, bigrams which have high relational invariances were learned early and well with the lower invariant letter pairs learned less well. Letter sequences with high co-occurrence such as the loops are fairly well apprehended. Positional factors, however, interact with ‘raw’ invariance such that initial letters and letter sequences tend to be learned better than terminal letters and letter sequences, and both are learned better than internal letters and letter sequences. Second, the characterization of implicit learning also requires a Gestalt-like, global apprehension process whereby a configurational representation of the system is set up. We take (a) the common response during the test of well-formedness that a particular item “just felt right (or wrong)” and (b) the fact that the length of a letter string was not an effective variable during the discrimination test as evidence for the existence of this component. Admittedly, this latter result is only suggestive.
In a forthcoming paper (Reber and Kassin, in preparation) much stronger support for this configurational process is provided by the finding that decision times are similarly unaffected by the length of a test item. Further support for the simultaneous operation of these two processes comes from a study by Jones (1971). She systematically manipulated higher-order (configurational) and lower-order (differentiational) relations in a temporal pattern learning task and found strong evidence that subjects are sensitive to both levels even though the manipulations on one level were independent of the task demands on the other level. Specifically, when information was requested of subjects about lower-order sequences whose symbol-to-symbol relationships were unaffected by the higher-order manipulations they showed relatively poor performance as compared with conditions when the configurational relations were left intact. Surprisingly, there are very few empirical studies which reveal the simultaneous operations of these two components of complex pattern learning.

In this regard it is worth noting that the manner in which our subjects’ knowledge was evaluated seems to have an impact upon the collection of data which reveal the operation of one component or the other. The anagram task provided information that is most easily viewed as elemental, the well-formedness task information that appears more configurational. More careful and elaborate testing procedures than those commonly used may be needed to fully represent the dual nature of implicit learning.

2. On the form of tacit knowledge

This issue is clearly derivational and a corollary of the above. Since we assume that implicit acquisition is a process of elaborating a representative mapping of the deep structural relations in the stimulus environment then, perforce, the form of the mental code must be isomorphic with that environment. The data strongly support this contention. Our subjects induced systems which reflected what was “out there” in the structured letter strings; their mental mirrors may have had holes in them but they did not have warps. Errors are a result of incomplete structures, not of nonrepresentative ones. Inappropriate constructions by our subjects are either so rare or so labile that they play essentially no part in the effort to solve the anagrams. We view this failure to find evidence for nonrepresentative rule elaborations in this partially ‘generative’ task as an important extension of our earlier research. As is discussed in the next section, we suspect that this assumed isomorphism between the mental representation and the intrinsic structure of the stimulus is quite abstract. Indeed, the depth of abstractness may well pertain to the stimulus domain as well as the mental, and the isomorphism is most assuredly not a simple one-to-one mapping.

3. On the relation between the tacit and the explicit

Note that the preceding statements about tacit knowledge are meant only to pertain to the internal representations and not to efforts to give conscious expression to them. The data show that a subject’s anagram solutions are a far better guide to the form of his implicit knowledge than any verbal expression or rationale he may provide for his responses. Our case study of an individual subject clearly reflects this ordering. In fact, several strong epistemological arguments can be made that the role of consciousness in implicit learning, in either the apprehension process or in attempts to formalize what has been apprehended, is artificial and, except in rare cases such as persons like ourselves who insist upon browbeating our subjects, rather beside the point*.

Nevertheless, our subjects can tell us, if prodded sufficiently, quite a bit about their symbolic structures. Indeed, the introspective reports in this study are far more complete and accurate than those found in earlier work. It is perhaps not a coincidence that as the level of performance increases, so does the appropriateness and thoroughness of the subjects’ verbal reports (although even here they still lag behind actual performance by a considerable margin). Our concern here is similar to an issue often raised in epistemology: can there be any assurance that explicit conscious report, even when it is accurate and relatively complete, is a valid reflection of the underlying representation? Pylyshyn (1973) addresses this problem as it pertains to imagery and concludes in the negative. He takes pains to point out that imagery is phenomenologically real and that reports of imagery are, in a sense, valid reflections of that phenomenological space. But he argues that the image as experienced is not a valid characterization of the deep memorial representation, which is best conceptualized as an abstract system, conceptual and propositional in nature. We concur with at least the general framework of his analysis and would extend it to our implicit learning paradigm. We would argue (along with others, e.g., Turvey, 1974) that the operation of making tacit knowledge explicit, the act of giving it verbal form, is essentially a constructivist exercise wherein the deep abstract knowledge is mapped through a linguistic output system. The relationship between the tacit knowledge and the explication of it by its possessor then is likely to be understood only when we can both model the underlying abstract representation and characterize the manner in which such representations are mapped through verbalization.
It is worth recalling an earlier study here (Reber, 1969), where it was shown that subjects could be shifted from one synthetic language to another during the course of learning without showing any deficit in performance, provided that the underlying abstract representations of the two languages were identical. This transfer occurred with little specific conscious knowledge of the relationships that were intrinsic to both. When the shift was to a language with a different underlying representation, then the learner was effectively “back at square one.” It did not matter that the surface manifestation of the second language was different from the first, that a different set of symbols was used; so long as the underlying deep relations were identical, there was transfer. Clearly, quite independent of explicit knowledge of the system(s), the deep abstract representations are sufficient to monitor performance.

*See, for example, Hayek (1962) who makes a surprisingly effective case for the proposition that such deep tacit knowledge is stored in a meta-physical domain dubbed the “supraconscious” and is irretrievably beyond the operations of consciousness.

Implicit learning


It is easy, and not unreasonable, to argue that the deep acquisition system is the primary one when dealing with complex environments, and that it is operating quite naturally and undetected in any such structured situation. It is the verbal expression of the deep knowledge that presents our subjects with problems, and to be sure, presents us with one of the more intractable problems of this research. In summary, we view the data from this experiment combined with earlier findings as important in three respects: they extend previously collected data supporting the unconscious nature of implicit learning of a complex, synthetic language; they represent an empirical forum which has allowed us to extend our epistemological analysis of the acquisition and memorial representation of tacit knowledge; and they underscore the need for the analysis of novel systems, the structure of which is not part of the cognitive-perceptual armamentarium of the subject when he enters the learning situation.

References

Atkinson, R. C., Bower, G. H., and Crothers, E. J. (1965) An introduction to mathematical learning theory, New York, Wiley, pp. 252-261.
Brooks, L. R. Nonanalytic concept formation and memory for instances. Paper presented at the 1976 SSRC Conference on Human Categorization.
Chomsky, N. and Miller, G. A. (1958) Finite state languages. Info. Control, 1, 91-112.
Gibson, E. J. and Gibson, J. J. (1955) Perceptual learning: Differentiation or enrichment? Psychol. Rev., 62, 32-42.
Hayek, F. A. von (1962) Rules, perception, and intelligibility. Proc. Brit. Acad., 48, 321-344.
Jones, M. R. (1971) Effects of context on lower-order rule learning in sequential prediction. J. exper. Psychol., 91, 103-109.
Lewis, S. Implicit and explicit learning of artificial languages. Unpublished doctoral dissertation, City University of New York, 1975.
Lewis, S. and Reber, A. S. Implicit and explicit aspects in the learning of an artificial language. Manuscript in preparation.
Mace, W. M. (1974) Ecologically stimulating cognitive psychology: Gibsonian perspectives. In W. B. Weimer and D. S. Palermo (Eds.), Cognition and the symbolic processes. Hillsdale, N. J., LEA Press.
Miller, G. A. (1958) Free recall of redundant strings of letters. J. exper. Psychol., 56, 485-491.
Pylyshyn, Z. W. (1973) What the mind’s eye tells the mind’s brain: A critique of mental imagery. Psychol. Bull., 80, 1-24.
Reber, A. S. (1967) Implicit learning of artificial grammars. J. verb. Learn. verb. Beh., 6, 855-863.
Reber, A. S. (1969) Transfer of syntactic structures in simple artificial languages. J. exp. Psychol., 81, 115-119.
Reber, A. S. (1976) Implicit learning of synthetic languages: The role of instructional set. J. exp. Psychol. Hum. Learn. Mem., 2, 88-94.
Reber, A. S. and Kassin, S. Implicit learning of an artificial language: Structural salience and instructions to learn interact. Manuscript in preparation.

Arthur S. Reber and Selma Lewis

Shaw, R. and McIntyre, M. (1974) Algoristic foundations to cognitive psychology. In W. B. Weimer and D. S. Palermo (Eds.), Cognition and the symbolic processes. Hillsdale, N. J., LEA Press.
Turvey, M. T. (1974) Constructive theory, perceptual systems, and tacit knowledge. In W. B. Weimer and D. S. Palermo (Eds.), Cognition and the symbolic processes. Hillsdale, N. J., LEA Press.
Weimer, W. B. and Palermo, D. S. (Eds.) (1974) Cognition and the symbolic processes. Hillsdale, N. J., LEA Press.

Appendix A

The grammar in Figure 1 generates the following 43 strings of letters of lengths 3 through 8. An * signifies that the item was used as one of the 15 exemplars for the learning phase; the remaining strings were used in the anagram solution task.

Length 3: *PVV *TXS

Length 4: PTVV PVPS *TSXS

Length 5: PTTVV *PTVPS TSSXS TXXVV

Length 6: *PTTTVV PTTVPS *PVPXVV *TSSSXS TSXXVV TXXTVV TXXVPS

Length 7: PTTTTVV *PTTTVPS PTVPXVV PVPXTVV *PVPXVPS TSSSSXS *TSSXXVV *TSXXVPS TSXXTVV TXXTTVV TXXTVPS

Length 8: PTTTTTVV PTTTTVPS PTTVPXVV PTVPXVPS *PTVPXTVV *PVPXTVPS PVPXTTVV *TSSSSSXS TSSSXXVV TSSXXVPS TSSXXTVV TSXXTVPS TSXXTTVV *TXXVPXVV TXXTTVPS TXXTTTVV

Appendix B

The grammatical letter strings in Appendix A have the following bigram patterns. Each of the 16 acceptable bigrams is given with the number of occurrences in the language and the location characteristics.

Bigram   Frequency   Characteristics

TV       25          All internal; 5 in Positions 2 and 3
VV       23          All terminal; the most frequent ending
TT       23          All internal; all part of T-cycle; 17 begin in Position 2
VP       23          All internal; 14 part of VPS ending; 9 part of VPX-loop
XX       16          All internal; 8 in Positions 2 and 3
SS       15          All internal; all part of S-cycle and begin in Position 2
XV       15          All “pure” internal
TS       14          All initial; the most frequent beginning
PS       14          All terminal
SX       14          All internal; all occur at end of S-cycle
PT       12          All initial
XT       11          All “pure” internal
PX       10          All “pure” internal
TX       9           All initial; least frequent beginning
XS       6           All terminal; always end Type 1 string
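A tally of this kind can be recomputed from the strings listed in Appendix A. The following is only a minimal sketch: the string list is a hand transcription of Appendix A (an assumption, since the scan is imperfect), and the helper names `bigram_counts` and `roles` are hypothetical. Its output is a cross-check on the transcription, not a replacement for the printed frequencies, and the two need not agree in every cell.

```python
from collections import Counter

# Hand transcription of the Appendix A strings (an assumption, not the
# authors' data file). A leading '*' marks a learning-phase exemplar.
APPENDIX_A = """
*PVV *TXS
PTVV PVPS *TSXS
PTTVV *PTVPS TSSXS TXXVV
*PTTTVV PTTVPS *PVPXVV *TSSSXS TSXXVV TXXTVV TXXVPS
PTTTTVV *PTTTVPS PTVPXVV PVPXTVV *PVPXVPS TSSSSXS *TSSXXVV *TSXXVPS
TSXXTVV TXXTTVV TXXTVPS
PTTTTTVV PTTTTVPS PTTVPXVV PTVPXVPS *PTVPXTVV *PVPXTVPS PVPXTTVV
*TSSSSSXS TSSSXXVV TSSXXVPS TSSXXTVV TSXXTVPS TSXXTTVV *TXXVPXVV
TXXTTVPS TXXTTTVV
""".split()

strings = [s.lstrip("*") for s in APPENDIX_A]
exemplars = [s[1:] for s in APPENDIX_A if s.startswith("*")]


def bigram_counts(items):
    """Count every adjacent letter pair across all strings."""
    counts = Counter()
    for s in items:
        counts.update(s[i:i + 2] for i in range(len(s) - 1))
    return counts


def roles(bigram, items):
    """Return the positions (initial, internal, terminal) a bigram occupies."""
    found = set()
    for s in items:
        for i in range(len(s) - 1):
            if s[i:i + 2] != bigram:
                continue
            if i == 0:
                found.add("initial")
            elif i == len(s) - 2:
                found.add("terminal")
            else:
                found.add("internal")
    return found


if __name__ == "__main__":
    counts = bigram_counts(strings)
    for bigram, n in counts.most_common():
        print(bigram, n, sorted(roles(bigram, strings)))
```

Running the script prints one line per bigram with its frequency and the positions in which it occurs, which can then be compared by eye with the table above.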

Subjects implicitly learn the underlying structure of an artificial language by memorizing a series of representative exemplars of that language. The form and structure of the knowledge of the language that results from this learning are evaluated and analyzed over a period of four days. The procedures used consist of: (a) solving anagrams drawn from the language; (b) determining whether new letter strings are well formed; and (c) providing detailed introspective reports. Several important propositions emerge concerning the implicit acquisition of a novel, complex system. First, the memorial representation of a structured system is acquired through the dual operations of a differentiation-like process based on relational invariances and a configurational process based on the overall structure. Second, the form of tacit knowledge is an abstract representation of the intrinsic structure of the stimulus. Third, while the ability to make explicit what one knows implicitly increases with the level of performance, conscious apprehension of the structure always ends up lagging behind what is known unconsciously.

Cognition, 5 (1977) 363-378
©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

3

Development in the Understanding of Causes of Success and Failure in Verbal Communication*

E. J. ROBINSON and W. P. ROBINSON**

Macquarie University, Australia

Abstract

Each child observed a communication game in which two dolls sent messages to each other so that the listener doll could pick out a matching card. Allocations and justifications of blame were examined as a function of the age of the child, adequacy of message, correctness of choice, and seating position. The results were generally consistent with two propositions. Younger children passed judgments as though they were asking themselves whether the message sent was inconsistent with the speaker’s card; only when this was so was the speaker blamed. Older children blamed the speaker and cited the inadequacy of the message whenever the message did not identify the speaker’s card uniquely, even when communication was successful.

*We would like to acknowledge the financial support of the Educational Research and Development Committee of Australia, the cooperation of both the infant/primary schools of Eastwood and Lindfield East and the New South Wales Department of Education.
**Now at the University of Bristol, U.K.

Introduction

In this paper, we examine the young child’s judgments about successful and unsuccessful communications. There is already a substantial literature on age-related differences in communicative performance (e.g., Flavell, Botkin, Fry, Wright & Jarvis, 1968; Krauss & Glucksberg, 1969; Piaget, 1959), and there have been attempts to specify the knowledge and skills necessary for successful performance (Asher, 1976; Asher & Oden, 1976; Flavell et al., op. cit.; Glucksberg, Krauss & Higgins, 1975; Piaget, op. cit.). Our concern is with one of these necessities, namely with knowing that the information requirements of the listener are to be taken into account if communication is to be successful. How does the child come to recognize that messages which do not meet the listener’s requirements may cause communication failure? In order to study this problem, it may be useful to use techniques which do not focus only on the child’s communicative performance. Our approach is similar to that used by Gleitman, Gleitman & Shipley (1972) to study the young child’s knowledge of linguistic rules. Instead of merely looking at the child’s language production, they asked the child about sentences. The distinction which these authors make between “following rules” and “knowing rules” is one which can usefully be made in connection with communication as well as with language as such. The need to be aware of this distinction is particularly apparent when one considers the work on the communication skills of very young children. A number of authors have chosen to emphasize the adaptiveness of the communicative behavior (rather than the weakness) of messages of three to five year old children, apparently on the assumption that in the Piagetian scheme these children would be expected to be egocentric (Maratsos, 1973; Menig-Peterson, 1975; Shatz & Gelman, 1973). Evidence that such young children can change their messages as certain characteristics of listeners are varied might be used to conclude that they do understand that the listener’s needs should be taken into account if communication is to be effective; it is not entirely clear whether the authors mentioned wish to draw such a conclusion, since the word ‘understanding’ is not used and ‘egocentrism’ seems to be defined in behavioral rather than conceptual terms. That is, their focus is upon the behavioral symptoms from which Piaget inferred egocentrism or lack of it, rather than upon the inference as such. If we are to understand fully the development of communication skills, we should be able to specify how much can be achieved simply by ‘rule-following’, and at what point one must impute ‘understanding about communication’ to the child.
For this reason, it is inappropriate to restrict oneself to observations of communicative performance: other evidence should be used to supplement such observations. We have argued elsewhere (Robinson & Robinson, 1976a; 1976b) that the ‘whose fault’ technique provides a means of probing the child’s understanding of the causes of communication failure. In the ‘whose fault’ technique, the child was required to play a communication game with an experimenter in which each had identical but mutually invisible sets of drawings and in which each took turns to act as ‘speaker’ and ‘listener’. The speaker’s task was to select and then describe a drawing so that the listener could pick the matching one. On trials when the listener chose an incorrect card, the child was asked to explain why things had gone wrong and whose fault this was. In two experiments (Robinson & Robinson 1976a; 1976b) we demonstrated strong relationships between the age of children and their ascriptions of blame for communication failure; younger children (about 5 to 6 years) blamed the listener on the grounds


that he had chosen a wrong card, while older children (up to 11 years old) blamed the speaker and his message. We argued that these results were consistent with the idea that listener-blaming children were ‘egocentric’: they did not understand that messages could be inadequate. In the studies mentioned, we looked only at judgments about communication failures following messages which were in fact inadequate, in that from the listener’s point of view they did not refer uniquely to the card the speaker had in mind. In a subsequent experiment (Robinson & Robinson 1977a), we asked the child to judge the adequacy of messages which were inadequate but which were followed by the listener’s picking the right card. We found that some children were more likely to judge messages inadequate when the listener chose wrongly (communication failure) than when he chose correctly (communication success). This tendency to be influenced by outcome was less common among older children. In the experiment described below, we extended the range of circumstances presented to the child: he was asked to judge the adequacy of messages which were good as well as of ones which were inadequate, and to do so both when communication was successful and when it failed. He was asked to ascribe blame for failure when this followed good messages as well as inadequate ones. Only one type of message inadequacy was examined: the messages were too general in that both listener and speaker would agree that they referred to the speaker’s chosen card and to one other in the set. We did not use messages which listener or speaker interpreted idiosyncratically nor messages whose inadequacy lay in the fact that they did not refer to the speaker’s chosen card. Our general aim was to look at the relationship between judgments in the different outcome-adequacy conditions, in order to begin to see how understanding of the role of the message in communication develops.
In particular we wished to examine in greater detail than before the influence of outcome on children’s judgments of message adequacy and inadequacy, to look at the relationship between explanations of success and of failure in communication, and to check that the data were consistent with previous demonstrations of age-related differences in allocations of blame.

Method

The child was asked to observe a communication game between the dolls Donald Duck and Mickey Mouse, and to comment upon the reasons for their successes and failures. We had previously found that observing was like participating in respects relevant to the present problem (Robinson & Robinson,


1977b) and it enabled us to control the messages and their outcomes. In this experiment we used four conditions: messages which referred to the speaker’s card only were followed by the listener choosing the right or a wrong drawing, as were messages which referred to the speaker’s card and to one other. These conditions will be referred to respectively as “good-right”, “good-wrong”, “bad-right” and “bad-wrong”.

Subjects

We tested 96 children aged between 5 - 9 and 8 - 0, but ten of these (the first ten tested) were not asked to explain the successful communication in the good-right and bad-right conditions. The children attended two infants’ schools in middle class suburbs of Sydney. All were white and English-speaking.

Materials

The dolls Donald Duck and Mickey Mouse sat at opposite ends of a table separated by a screen. They had identical sets of six cards. The cards bore line drawings of men holding or wearing: a red flag; a blue flag; a red flower; a blue flower; a tall white pointed hat; and a black top hat.

Procedure

The experimenter explained to the child the communication game that Donald and Mickey were going to play. The child then took a seat by one of the dolls (children sat alternately by Donald and Mickey), and watched two demonstration trials on which Donald and Mickey each sent, via the experimenter, one message that referred to one card only and for which listener and speaker picked matching correct cards. Throughout the game, the speaker doll was stood by the card he had chosen as he gave his message. Hence, if the child was sitting by the speaker he could see the correct card as the message was given. If he was with the listener he did not see the correct card until the listener had made his choice. After the two demonstration trials, Donald and Mickey continued to take turns as speaker. Each message-outcome condition was made to occur twice, once when Donald was speaker and once when Mickey was. Hence the child experienced each message-outcome condition once when he was sitting by the listener and once when he was by the speaker. The messages and the order in which they occurred are given in Table 1. In each case, the full message was “I’ve got the one I want you to pick, it’s a man with a ...”. The bad-right condition was made to occur after the child had been made aware of the card other than the speaker’s to which the bad message referred; this card had been used in a preceding trial.

Table 1. Order of Presentation of Messages and Message-outcome Conditions

Speaker   Message       Speaker’s card   Listener’s choice   Condition

Donald    flag          blue flag        red flag            bad-wrong
Mickey    flower        blue flower      blue flower         bad-right
Donald    red flag      red flag         red flag            good-right
Mickey    pointy hat    pointy hat       black hat           good-wrong
Donald    flag          red flag         red flag            bad-right
Mickey    hat           black hat        pointy hat          bad-wrong
Donald    blue flower   blue flower      red flower          good-wrong
Mickey    red flag      red flag         red flag            good-right

Whenever the listener chose a wrong card, the child was asked: “They’ve got different cards. It went wrong that time. Whose fault was that, Donald’s or Mickey’s or both of them? Why? Did Donald/Mickey tell properly which one to pick? (If “No”) What should he have said? Whose fault was it they went wrong? Why?” Whenever the listener chose the right card, the child was asked: “They’ve got the same card. They got it right. How did they get it right? Did Donald/Mickey tell properly which one to pick? (If “No”) What should he have said? How did they get it right?” The first ten children tested were not asked “How did they get it right?”. Each child was tested individually, and the entire session was tape recorded.

Results and Discussion

(1) Bad-wrong condition: Age-related differences

The results replicate our previously-found age-related differences in children’s allocations of blame for communication failure when the message is too general. For the purposes of this comparison, the children were divided into young and old groups: 5 - 9 to 6 - 10 (47 children) and 6 - 11 to 8 - 0 (49 children). Children were classified according to the number of times on the two bad-wrong trials (two chances per trial) that they blamed the speaker for communication failure and gave as their reason that the message had been inadequate. Since all those who blamed the speaker three out of four times omitted to blame him only on the first occasion they were asked, they were included with the pure speaker-blamers.

Table 2. Numbers of Children in Young and Old Groups who Blamed the Speaker and his Message for Failure on the Two Bad-Wrong Trials

Times blaming speaker/message   Age 5-9 to 6-10   Age 6-11 to 8-0

0                               26                8
1, 2                            7                 8
3, 4                            14                33

Of those who never blamed the speaker and his message (and blamed the listener on the grounds that he picked the wrong card) 76% were in the young age group, while of those who blamed the speaker and his messages three or four times only 30% were in the younger group. A nonparametric trend test based on the statistic S (Ferguson, 1965) was performed on the data presented in Table 2, and this showed a highly significant age-related trend (z = 3.97; p < 0.0001) towards a greater likelihood of speaker/message blaming in the old group.

(2) Responsibility for success and failure in the four conditions

In both the bad-wrong and good-wrong conditions children were asked to ascribe blame for failure; in the bad-right and good-right conditions children were asked “How did they get it right?”. We could therefore see whether in the other conditions, as in the bad-wrong condition, younger children were more inclined than older ones to focus upon the listener, and we could also look at the relationship between allocations of responsibility in the four conditions. In the good-wrong condition, 32 of the young children blamed the listener only for the failure and 15 blamed the speaker at least once. Forty-two of the old children blamed the listener only and 7 blamed the speaker at least once. These proportions are not significantly different (chi square = 3.28; p > 0.05). It is however not surprising that this difference is not significant: we would expect younger children to blame the listener because they cannot conceive of the speaker’s message as being at fault, and older children to blame the listener because in this instance the mistake is his responsibility. There was no difference between young and old children in their allocations of responsibility for success in either the bad-right or the good-right conditions.
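The figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the chi-square for the good-wrong condition was computed on the 2 x 2 table of age group by blame with Yates’ continuity correction (the text does not say which form was used), and the function name `yates_chi_square` is hypothetical. The percentages are taken from the 0 and 3, 4 rows of Table 2.

```python
def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Shrink the cross-product difference by n/2, never below zero.
    diff = max(abs(a * d - b * c) - n / 2, 0)
    return n * diff ** 2 / (row1 * row2 * col1 * col2)


# Good-wrong condition. Rows: young, old.
# Columns: blamed listener only, blamed speaker at least once.
chi2 = yates_chi_square(32, 15, 42, 7)
CRITICAL_05_DF1 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

# Table 2: share of young children among those who never blamed the
# speaker, and among those who blamed him three or four times.
pct_young_never = round(100 * 26 / (26 + 8))
pct_young_often = round(100 * 14 / (14 + 33))

if __name__ == "__main__":
    print(round(chi2, 2), chi2 < CRITICAL_05_DF1)  # 3.28 True
    print(pct_young_never, pct_young_often)        # 76 30
```

Under these assumptions the corrected statistic matches the reported value and falls below the 0.05 critical value for one degree of freedom.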
Of those who mentioned the listener only as responsible for success in the bad-right condition 7 were young and 6 old; of those who referred to the speaker/message 9 were young and 8 old; of those who mentioned guessing or confessed ignorance 21 were young and 35 were old. Very


few children attributed the responsibility for success in the good-right condition to the listener, and this rarity precluded the possibility of any age trend being significant. There was significant concordance of responsibility allocation across conditions. Forty-three of the 62 children who blamed the speaker at least once in the bad-wrong condition correctly blamed the listener only in the good-wrong condition. Of the 22 children who incorrectly blamed the speaker at least once in the good-wrong condition, 19 had done so correctly in the bad-wrong condition whereas only 3 had blamed the listener alone (p < 0.001). In the bad-right and good-right conditions, children allocated responsibility for success to the listener (“He was thinking”; “He listened”), to the speaker/message (“Because Mickey said the color of the flag”; “Mickey told the right thing”), to both (“Mickey said the red flag and Donald saw the red flag”), to luck (“Donald just guessed”), or they could not account for the success. Only 3 children allocated responsibility to the listener alone in both bad-right and good-right conditions; all 3 blamed the listener only for failure in the bad-wrong condition. Ten further children allocated responsibility to the listener alone in the bad-right condition, but to the speaker/message in the good-right condition. Of the 10, 4 blamed only the listener in the bad-wrong condition, 3 blamed the speaker once or twice and 3 did so three times. It appears from these results that focusing upon the listener only when explaining success is less common than doing so when explaining failure. This suggests that it may be easier for the child to recognize the role of the message in communicative success than it is for him to recognize its role in communicative failure.
However, when making comparisons between responses in right and wrong conditions, it may be more appropriate to consider the child’s answers to “Did he tell properly ...?” since this question was asked in all four conditions. “Whose fault was it they went wrong?” and “How did they get it right?” are not necessarily equivalent ways of asking the child to allocate responsibility for the outcome of the communication. In the sections below, we focus upon children’s answers to “Did he tell properly ...?” in the four conditions, and consider allocations of responsibility for success or failure only when these help in the interpretations of responses.

(3) Judgments of message adequacy in the four conditions

In our initial analysis of the relationship between adequacy judgments in the four conditions, we classified the child’s answers to “Did Donald/Mickey


tell properly which one to pick?” (If the child said “No”) “What should he have said?”. In order to be classified as answering correctly, the child had to give correct answers on both trials. Table 3 shows the number of children who:

(i) in the good-right condition, correctly answered “Yes” on both trials; no children answered “No”;
(ii) in the good-wrong condition, on the one hand correctly said “Yes” on both trials, or on the other hand gave at least one firm and repeated “No”, whether or not they could answer “What should he have said?”;
(iii) in the bad-right condition, on the one hand said “No” on both trials and specified what was missing from both messages, or on the other hand incorrectly answered “Yes” on at least one trial;
(iv) in the bad-wrong condition, on the one hand correctly said “No” on both trials and specified what was missing from both messages, or on the other hand incorrectly answered “Yes” on at least one trial.

From Table 3, it is apparent that all the children judged the good message to be adequate when the listener chose the right card, but in the other three conditions some children made errors in their judgments of message adequacy: 49 of the 96 made at least one error on a bad-right trial, 33 did so on a good-wrong trial, and 30 did so on a bad-wrong trial. In an attempt to identify the criteria the children were using to make their judgments, we went on to examine the relationship between judgments in the different conditions. In Table 4, the relationship between errors is shown. Note that while Table 3 is presented in terms of the child’s answers (Yes or No) to the questions about message adequacy, Table 4 is presented in terms of errors made. Translating the data given in Table 4 into answers given requires ignoring the differences between children who made errors on one or two trials in a particular condition, but if we do this we can identify five common response patterns:

A. children with a tendency to judge all messages to be adequate (N = 26).
B. children with a tendency to judge good messages to be adequate only when the listener chooses correctly, and bad messages to be inadequate only when the listener chooses incorrectly (N = 11).
C. children with a tendency to judge bad messages to be inadequate irrespective of outcome, but good messages to be adequate only if the listener chooses correctly (N = 19).
D. children with a tendency to judge good messages to be adequate irrespective of outcome, but bad messages to be inadequate only if the listener chooses wrongly (N = 9).
E. children with a tendency to judge good messages to be adequate and bad ones to be inadequate, irrespective of outcome (N = 27).

Table 4. Number of Errors (0, 1 or 2) Made in Each Condition in Judgments of Message Adequacy

[Table body not recoverable from the scan. Columns: number of errors when the message was good (listener’s choice right; wrong) and when the message was bad (listener’s choice right; wrong); number of children; mean age (yrs-months). Rows: response patterns A to G.]
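The collapsing step described above, in which one or two errors in a condition are treated alike, can be written out as a small lookup. This is a sketch of one reading of the five pattern definitions, not the authors’ scoring procedure; the function name `response_pattern` and the label "anomalous" for the remaining profiles (patterns F and G) are hypothetical. The good-right condition is omitted because no child made an error in it.

```python
def response_pattern(good_wrong_errors, bad_right_errors, bad_wrong_errors):
    """Map error counts (0, 1 or 2 per condition) to a response pattern label."""
    # Collapse 1 or 2 errors into "at least one error" for each condition.
    profile = (
        good_wrong_errors > 0,  # good message judged inadequate after failure
        bad_right_errors > 0,   # bad message judged adequate after success
        bad_wrong_errors > 0,   # bad message judged adequate after failure
    )
    patterns = {
        (False, True, True): "A",    # all messages judged adequate
        (True, True, False): "B",    # judgments follow outcome in both cases
        (True, False, False): "C",   # only good messages swayed by outcome
        (False, True, False): "D",   # only bad messages swayed by outcome
        (False, False, False): "E",  # judgments follow the message alone
    }
    return patterns.get(profile, "anomalous")


if __name__ == "__main__":
    print(response_pattern(0, 2, 1))  # A
    print(response_pattern(1, 0, 0))  # C
```

Any profile outside the five listed ones, for example errors on bad-wrong trials only, falls through to the anomalous label.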


In addition there were four children (patterns F and G) whose responses appeared to be anomalous. These children will be omitted in what follows. An initial interpretation of these response patterns might be that children giving pattern A assume that all messages are adequate and have no understanding of the different contributions of good and bad messages to successful or unsuccessful communication. In contrast, those giving response patterns B to E judged some messages to be inadequate: those giving patterns B, C and D appear to be influenced by outcome (success or failure), while those giving pattern E appear to be judging solely on the basis of properties of the message, as an adult would under these particular circumstances. In order to see whether these initial interpretations are appropriate, in what follows we look more closely at the answers given in each of the conditions and relate them to results previously obtained with the ‘whose fault’ technique.

(4) Response Pattern A: all messages judged adequate

Of the 26 children who gave response Pattern A, 18 never judged a message to be inadequate. All of these blamed only the listener for failure on bad-wrong and good-wrong trials. Did these children have no understanding that messages could be good or bad? Analysis of their answers in good-right and bad-right conditions helps to answer this question. Nine of the 18 mentioned the message at least once in answer to “How did they get it right?” on bad-right and/or good-right trials. This result is inconsistent with the interpretation offered above that children who judge all messages to be adequate have no understanding of the role of the message in communication. It is, however, consistent with the results of an earlier experiment (Robinson and Robinson, 1977b) in which we asked children to judge communications for which the messages were inappropriate rather than inadequate.
Previously, (Robinson and Robinson 1976a, 1976b) ‘inadequate’ messages had been too general to identify a single card, while in this case (1977b) they were made precise and specific, but the speaker picked up an inappropriate card as his choice. For example, the speaker could say, “I’ve got the one I want you to pick, it’s a man with a red flower”. This enabled the listener to make an unambiguous choice but the speaker would then show his drawing of a man with a blue flag. These ‘inappropriate’ messages pose a different problem from those messages which referred to a flower when several flowers of different colour were in the array. We found that 90% of children who blamed the listener when the message was too general, blamed the speaker and specified an improved message when it was inappropriate. Such children recognize that messages can be inadequate, but appear to define inadequacy differently from older children.


If the problem for our listener-blaming Pattern A children is in identifying ‘too-general’ messages as inadequate, even though they recognize that adequate messages can be responsible for successful communication, we would expect the 9 children who mentioned the message to do so equally on bad-right and good-right trials. This is true of 6 of the children, who mentioned the message at least once on trials of each type. However, 3 children mentioned the message on good-right trials only. It appears that the 3 could discriminate between good and bad (too-general) messages in that they recognized that the good ones contributed to the success of communications but omitted to mention the message in explaining success when the message was bad (too-general). This phenomenon becomes more apparent if we consider not only the children who judged all messages to be adequate, but those who blamed only the listener for failure even if they did give some inadequacy judgments. That is, we now consider children who appeared to have no understanding of the role of inadequate messages in causing communication failure. Of 30 such children, 19 mentioned the message at least once in their answers to “How did they get it right?” on bad-right and/or good-right trials. Ten of the 19 mentioned the message on at least one of the trials of each type. Eight mentioned the message on good-right trials only, and a single child mentioned it on bad-right trials only: this is a significant difference (Binomial test, p = 0.04). (The single child’s other answers were deviant; he was the one who gave Pattern G.)
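The binomial test on the 8:1 split can be reproduced by hand. The sketch below assumes a two-tailed sign test with a chance probability of 0.5 for each direction (the text does not state the number of tails), and the function name `two_tailed_sign_test` is hypothetical.

```python
from math import comb


def two_tailed_sign_test(k, n):
    """Two-tailed binomial probability of a split at least as uneven as k of n (p = 0.5)."""
    extreme = min(k, n - k)
    # Probability of the smaller count or fewer, doubled for two tails.
    tail = sum(comb(n, i) for i in range(extreme + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Eight children mentioned the message on good-right trials only,
# one child on bad-right trials only.
p = two_tailed_sign_test(8, 9)

if __name__ == "__main__":
    print(round(p, 3))  # 0.039
```

On this assumption the probability is 20/512, which rounds to the reported 0.04.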

(5) Patterns B, C and D: adequacy judgments influenced by outcome

Our earlier finding mentioned in the introduction (Robinson & Robinson, 1977a) of a difference between bad-wrong and bad-right trials in answer to “Did ... tell properly ...?” was replicated: 29 children made at least one error (i.e., answering “Yes” instead of “No ... he should have said ...”) on both types of trial; 46 children made no errors on either type; 20 children made at least one error on the bad-right trials only; one child made at least one error on the bad-wrong trials only. A binomial test shows the contrast 20:1 to be significant (p < 0.002). That is, children were more likely to judge too-general messages to be adequate when the listener chose the right card than when he chose a wrong card. To what extent is it appropriate to consider the child at a certain stage of development as truly ‘outcome-centered’, in that he judges message adequacy solely on the basis of outcome, ignoring properties of the message? If such children existed, then we would expect them to make errors on bad-right and on good-wrong trials: they would judge the bad messages to

be adequate and the good ones to be inadequate. The child was considered to have made an error on a goo&\uror~g trial if he gave a firm and repeated “No” in answer to “Did... tell properly . ..?” whether or not he could answer, “What should he have said?“. (Some children did suggest improved messages, these are given below.) Eleven children made at least one error on both badrighr and good-wrotlp trials (Pattern B); 27 children made no errors on either (Pattern E); 35 children made at least one error on bud--right but none on good-burorzg trials (Patterns A and D); and 19 children made at least one error on good-nlr’orlg trials but none on had-right (Pattern C). Our interest here was in whether it was the same children who made errors on both types of trial. Eleven children made errors on both, but 54 made errors on one type of trial only. While outcome-influenced errors occurred on both good-bvrorlg and bade-right trials, only 17% of those who made errors, did so on both types of trial. It may therefore be more parsimonious to account for errors on bad-right and good-wrotzg trials in two different ways, rather than to attempt to explain why the majority of those who ignore message properties, do so only when the message is either good or bad. Firstly, we shall consider an alternative account of the greater difficulty in identifying a bad message as such when communication is successful rather than unsuccessful. With our procedure, when a wrong card was chosen, speaker’s and listener’s choices were isolated from the other cards in the set while the child was questioned about the reasons for communication failure. When the listener chose the right drawing however, it was not made apparent at the time of questioning that there was another card to which the message referred. The child had previously been made aware of the other drawings in the set, but this was not repeated. 
We would therefore expect it to be more obvious to the child that the message did not apply to the speaker’s card only when the listener chose the wrong card than when he chose the right one. Hence when the child first recognizes the relevance of this fact, he is likely to judge that messages which are in fact too general are adequate if the listener chooses the right card. That is, we would expect outcome-centered judgments to occur. There is now no need to argue that when the child first comes to recognize the role of the speaker and his message in communication failure, properties of the message are ignored. The assumption is that the child examines the message and its referents from the beginning, but is inexpert at doing so. Perhaps the errors on the bad-right trials are best accounted for in the way outlined above, i.e., in terms of how obvious the multiple reference of the message is, while errors on the good-wrong trials are to be accounted for in a different way. These latter will be considered below.
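
The 17% figure quoted earlier in this section follows directly from the four cell counts reported for errors on bad-right and good-wrong trials. A short Python sketch of the arithmetic is given here; the variable names are illustrative and do not come from the original study.

```python
# Counts of children by error pattern, as reported in the text.
errors_on_both = 11       # Pattern B
errors_on_neither = 27    # Pattern E
bad_right_only = 35       # Patterns A and D
good_wrong_only = 19      # Pattern C

total_children = (errors_on_both + errors_on_neither
                  + bad_right_only + good_wrong_only)
one_type_only = bad_right_only + good_wrong_only
made_any_error = errors_on_both + one_type_only
share_on_both = errors_on_both / made_any_error

print(total_children)            # children entering this comparison
print(one_type_only)             # children who erred on one trial type only
print(f"{share_on_both:.0%}")    # share of error-makers who erred on both
```

The denominator is the number of children who made any error, not the whole sample, which is what makes the overlap look small.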

Of the children who made at least one error in judging message adequacy on the good-wrong trials, seven blamed only the listener for failure on the bad-wrong trials, while the remaining 26 children blamed the speaker at least once on these. Errors were particularly common among the younger speaker-blamers: 14 of the 21 speaker-blamers (67%) aged between 5-9 and 6-10 made at least one error, compared with 12 of the 41 aged 6-11 to 7-10 (29%). These children may have developed an assumption that if communication fails, the responsibility must be the speaker’s. In some cases, this was true only of the child’s first response, and once he was asked “What should he have said?” the child admitted that the message was all right and that the fault lay with the listener. Other children insisted that there was something wrong with the message even though they could not specify an improvement. Some children did suggest improvements. For example, when the message was “... a man with a pointy hat”, it was suggested that the speaker was responsible for the listener’s choosing the black top hat because he should have said “... a very pointy hat” or “... a white pointy hat”. It may be that when the child has grasped the idea that the speaker and his message can be responsible for communication failure, he tends to overgeneralize this, and it is not until later that he can judge messages more independently of communication success or failure. To argue that the older child is more capable than the younger at isolating relevant factors in explaining an event is consistent with the general Piagetian description of development.

(6) Effects of seating position

Whether the child sat by the speaker or the listener did not emerge as a significant determinant of answers given.

General Discussion and Conclusions

The results of the use of different combinations of good and bad messages and successful and unsuccessful outcomes oblige us to revise our earlier conclusion about what listener-blaming children know about the relevance of the quality of the message to the success and failure of the communication (Robinson & Robinson, 1976a; 1976b). It appears that some children who do not admit that a too-general message can cause communication failure do nevertheless consider messages to be responsible for the success of a communication, and that some of these discriminate between good and too-general messages in terms of their contribution to the success. Perhaps young
children’s first assumption is that all messages are good, and they gradually come to recognize that some are better than others in that they are more likely to be associated with communicative success. Seeing bad messages (whether inappropriate, too-general or having some other inadequacy such as different interpretations by listener and speaker) as causes of communication failure may develop as a consequence of this. This suggestion is in contrast with our earlier supposition that young children have no understanding that messages can be adequate or inadequate, and that coming to see various types of bad messages as causes of failure is but the opposite side of the coin to seeing good messages as contributing to communicative success. There was no evidence that when children first recognize the role of the message in communication failure they ignore properties of the message and focus only on the outcome. Errors made by children in their assessment of bad messages leading to successful outcomes may be more readily explained in terms of the ease of seeing that the message referred to more than one card. We may think of the younger, listener-blaming children as deciding about the inadequacy of messages on the basis of different answers to the question “Does the message fit the speaker’s card?“. If the answer is “Yes”, the message is judged to be adequate and the blame for communication failure is ascribed to the listener only. If the answer is “No”, the message is judged inadequate and then the blame is ascribed to the speaker. These children give response Pattern A, judging all the too-general messages to be adequate, but may judge inappropriate messages to be inadequate and may recognize that messages contribute to the success of a communication. 
At a later stage of development, adequacy is defined by different answers to the question “Does the message fit only the speaker’s card?” As with the first question, if the answer is “Yes”, the message is judged to be adequate and the listener is blamed, and if the answer is “No”, the blame is located with the speaker and the message. When this question first enters his repertoire, the child is inexpert at identifying the multiple reference of messages which are too general. This means that he is more likely to recognize the inadequacy of messages which are too general when communication fails than when it is successful. Hence the child will give response Pattern B or D. At about the same time, the child, having grasped the idea that the speaker and his message can be responsible for communication failure, tends to over-apply this rule and hence to make errors on good-wrong trials (Pattern B or C). Finally, the child becomes capable of considering both message properties and outcome, to give Pattern E.

While this account is consistent with most of the data, it does not accommodate the lack of complete concordance between adequacy and blame judgments, nor the ability of some listener-blamers to differentiate between good and bad (too-general) messages in terms of their contribution to successful communication. The account will no doubt need to be modified, but should be a useful basis for future research. One alternative account would refer to an intermediate stage of outcome-centeredness with three variants - Patterns B, C, and D; the problem would then be to give and account for the reasons for the specific choices made. At present we cannot do this. The evidence does, however, seem to be strong enough to allow us to go beyond description towards the kind of explanation offered above. Having interpreted our results within a developmental framework, it is appropriate to test for age-related differences in the response patterns. We performed an S-test upon the frequencies of young and old children giving response Patterns A; B, C and D (combined); and E, and found a significant trend for older children to be in the categories we presumed to be more advanced (z = 2.46; p = 0.01). The relevant frequencies are: Pattern A, young 16, old 10; Patterns B, C and D, young 22, old 17; Pattern E, young 7, old 20. That is, the trend was for younger children to be more likely to judge all the messages to be adequate, and for older ones to judge good messages adequate and bad ones inadequate irrespective of outcome, with outcome-influenced judgments forming an intermediate stage. Should this developmental account be supported by future research, it should be possible to begin to explore the relationships between the developing understanding about communication and communicative performance. How are the young child’s communicative skills limited by his lack of understanding, and how do they improve as his understanding develops?
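
The trend test on the age-by-pattern frequencies can be approximated with Kendall's S statistic for a two-row table with ordered columns. The Python sketch below is not a reconstruction of the authors' exact procedure: the tie-corrected variance and the continuity correction of the form (2N - first column total - last column total) / (2(m - 1)) are assumptions, and the function names are hypothetical. Under those assumptions the statistic comes out close to the reported z = 2.46.

```python
from math import sqrt

# Frequencies from the text; columns ordered from least to most advanced:
# Pattern A; Patterns B, C and D combined; Pattern E.
young = [16, 22, 7]
old = [10, 17, 20]


def kendall_s(row_low, row_high):
    """S = concordant minus discordant pairs between the two rows."""
    concordant = sum(a * sum(row_high[j + 1:]) for j, a in enumerate(row_low))
    discordant = sum(a * sum(row_high[:j]) for j, a in enumerate(row_low))
    return concordant - discordant


def trend_z(row_low, row_high):
    """Normal approximation for S, using an assumed tie-corrected variance."""
    n1, n2 = sum(row_low), sum(row_high)
    n = n1 + n2
    cols = [a + b for a, b in zip(row_low, row_high)]
    s = kendall_s(row_low, row_high)
    variance = n1 * n2 * (n ** 3 - sum(c ** 3 for c in cols)) / (3 * n * (n - 1))
    # Assumed continuity correction for a 2 x m table with tied categories.
    correction = (2 * n - cols[0] - cols[-1]) / (2 * (len(cols) - 1))
    return s, (abs(s) - correction) / sqrt(variance)


s, z = trend_z(young, old)
print(s, round(z, 2))
```

A positive S here means that pairs in which the older child falls in the more advanced category outnumber pairs in which the younger child does.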

References

Asher, S. R. (1976) Children’s ability to appraise their own and another person’s communication performance. Devel. Psychol., 12, 24-32.
Asher, S. R., and Oden, S. L. (1976) Children’s failure to communicate: an assessment of comparison and egocentrism explanations. Devel. Psychol., 12, 132-139.
Ferguson, G. A. (1965) Nonparametric trend analysis. Montreal, McGill University Press.
Flavell, J. H., Botkin, P. T., Fry, C. L., Wright, J. W., and Jarvis, P. E. (1968) The development of role-taking and communication skills in children. New York, Wiley.
Gleitman, L. R., Gleitman, H., and Shipley, E. F. (1972) The emergence of the child as grammarian. Cog., 1, 137-164.
Glucksberg, S., Krauss, R., and Higgins, E. T. (1975) The development of referential communication skills. In F. D. Horowitz (ed.) Review of child development research, Vol. 4, Chicago, University of Chicago Press.
Krauss, R. M., and Glucksberg, S. (1969) The development of communication competence as a function of age. Child Devel., 40, 255-266.
Maratsos, M. P. (1973) Nonegocentric communication abilities in preschool children. Child Devel., 44, 697-700.
Menig-Peterson, C. L. (1975) The modification of communicative behaviour in preschool-aged children as a function of the listener’s perspective. Child Devel., 46, 1015-1018.
Piaget, J. (1959) The language and thought of the child (3rd ed.) London, Routledge & Kegan Paul.
Robinson, E. J., and Robinson, W. P. (1976a) The young child’s understanding of communication. Devel. Psychol., 12, 328-333.
Robinson, E. J., and Robinson, W. P. (1976b) Developmental changes in the child’s explanations of communication failure. Austral. J. Psychol., 28, 155-165.
Robinson, E. J., and Robinson, W. P. (1977a) Children’s explanations of failure and the inadequacy of the misunder

Children are given the task of observing a game in which two dolls communicate by sending messages that should allow the doll designated as listener to choose a card identical to that of the other doll. Each child’s comments and critical justifications are studied as a function of age, the adequacy of the message, the correctness of the card choice, and the position of the child relative to the dolls. The results show that the youngest children judge as if they were asking themselves whether the message sent was in disagreement with the speaker’s card, and blame the speaker only in that case. Older children criticize the speaker and point out the inadequacy of the message when it does not uniquely identify the speaker’s card, even if the communication has ended in success.

Cognition, 5 (1977) 379-392
© Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands

Discussions

Reply to Winograd*

B. ELAN DRESHER
Brown University

NORBERT HORNSTEIN
Harvard University

In our article “On some supposed contributions of artificial intelligence to the scientific study of language” (Cognition, 4, 321-398) we argued that much recent AI research into language is concerned primarily with the task of developing programs that can “understand” language in some limited domain and not - contrary to the claims made by AI researchers - with developing explanatory theories of language. We supported this contention with a fairly detailed review of some recent work by prominent AI researchers in language such as Winograd, Minsky, and Schank.¹ We showed that whatever the technological merits of this research, it does not yield explanatory principles of the sort which are necessary to the development of any scientific theory. Our point in this regard is a general one: we nowhere criticize this research on the grounds that it is directed at explaining phenomena which are inherently unworthy of scientific investigation. Rather, we maintain that the purported explanations being advanced are not really explanations at all. More specifically, in practically every case, the accounts given are, at best, based on programs or plans for programs which simply presuppose the very phenomena which they are meant to explain. Thus, at the very heart of these programs lie the most idiosyncratic facts concerning the limited domains dealt with. It is this feature that makes these programs inextendable in any principled way to problems lying beyond the very restricted domain for which they were originally designed. Hence, whatever successes these programs enjoy are a function of the limited domains to which they were tailored and not of any general principles which could serve as the basis of an explanation of the phenomena in these domains. Lacking general principles, this work cannot contribute to the articulation of a scientific theory

*We would like to thank Ned Block, Noam Chomsky and Jerry Fodor for their comments on earlier drafts of this reply and our article. We would also like to acknowledge the generous support of the Canada Council during the period in which these articles were written.

¹We also dealt with the work of Wanner, Kaplan and their associates concerning ATN systems, but as our points about this work were somewhat different and as this part of our paper was not addressed in Winograd’s reply we shall not deal with that work again here.

of language. This is true of every aspect of human linguistic behaviour - be that the ability to distinguish grammatical from ungrammatical sentences, the ability to process speech in context, the ability to produce sentences, or whatever. In sum, our argument is that this work is unscientific according to the most general criteria of what constitutes scientific explanation² - that whatever else science is, it proceeds through the elaboration of general principles. It is because this AI research makes no effort to elaborate any such principles that it fails to be of scientific interest.³

In our paper we made the following points:

(1) An examination of some recent influential work in AI reveals that it does not contribute anything new, either in the way of description or explanation, to our understanding of human linguistic behavior.
(2) This failure can be accounted for by the assumption that the real goals of AI research into language - as opposed to the stated ones - are not scientific but technological: the development of machines that can manipulate language in limited domains.
(3) In contrast to the AI approach to the study of language we suggest that there exists an alternative approach - that based on the theory of generative grammar most closely identified with Noam Chomsky - within which various aspects of the study of language can be more fruitfully pursued.

It is important to note that the first of these points does not depend on the other two; nor does the second depend on the third. Thus, our critical remarks concerning the actual details of the work that we reviewed are independent of our account of what we think the real goals of this work are. Similarly, whether or not we are correct about the fruitfulness of generative grammar does not affect the validity of our claims concerning AI research. Therefore, a convincing defense of the AI approach to research into language could not confine itself to a discussion of the third point - our view of what a scientific theory of linguistics should be - but could be expected either to take issue with our characterization of current AI work - for example, by discussing the contributions that it has made to the study of language - or

²See Hempel and Oppenheim (1965), Hempel (1965), Popper (1959), and Scheffler (1963) for discussion of the role of general principles in explanations.

³In fact, much of the work that we reviewed, quite aside from not offering any explanatory principles, neither presented any facts hitherto unknown nor showed that the data that it was trying to account for had anything to do with human language abilities. It was supported by no experimental evidence and was frequently counterintuitive as well. For example, Winograd proposes a model of processing which is based on assumptions about human language processing which he supports with no evidence whatsoever. See Dresher and Hornstein, pp. 346-350.

to give reasons for believing that such an approach could be expected to make such contributions.

In his response to our paper (Cognition, 5, 1977), Terry Winograd does not address himself to any of our comments concerning the details of the work that we survey. Indeed, he writes:

I find myself in agreement with many of the comments which deal with technical details, including some concerning details of my own previous work. (pp. 151-152)

It should be noted that discussion of these “technical details” constitutes more than two thirds of the paper. Moreover, it should not be thought that our comments concerning these “technical details” involve very minor and easily corrigible errors. For example, it is not just a “technical detail” that Winograd’s parser presupposes the theory of parsing to which it is purportedly contributing (see Dresher and Hornstein [D & H] pp. 346-351), or that Minsky’s frame theory is no theory at all (see D & H pp. 356-362), or that Schank’s theory of conceptual dependency, where it is coherent (see D & H p. 369), never goes beyond the level of restating the problems that it is meant to deal with (D & H pp. 365-376). We find it odd that Winograd can agree with many of these sorts of criticisms but still maintain that this kind of research constitutes a fruitful approach to the scientific study of language. By not discussing these sorts of specific comments, Winograd evades one of the most important points of our paper.

Winograd mentions as well that there are some technical comments that he does not agree with, but he does not elaborate on any of these in his reply, either. Rather, the thrust of his reply is that our specific comments are not directed to showing that AI research into language is unscientific in the most general sense but that it is unscientific only from the perspective of a very particular view of what research into language ought to consist of. He writes concerning us:

They adopt unquestioningly and dogmatically a paradigm for the study of language which has been developed and articulately expounded by Noam Chomsky. The real point of their paper is that AI researchers are not working within this paradigm. (p. 152)

Thus, our argument that AI research is unscientific is true, according to Winograd, only if the term ‘scientific’ is equated with ‘Chomskian’. But this sort of criticism, Winograd suggests, is not of general interest; for far from providing a suitable framework within which to discuss general issues concerning research into language, Winograd claims that “the currently dominant school of Chomskian linguistics is following an extremely narrow and isolated byway of exploration” (p. 153). He rejects our criticisms by rejecting the “specific arbitrary set of meta-linguistic beliefs” (p. 153) which
he claims informs them. Current AI research into language, according to Winograd, “is based on a [different - N.H./B.E.D.] set of assumptions about the nature of language and the methods by which it can be understood” (p. 168). This set of assumptions constitutes a new paradigm for the study of language within which our arguments have no force. However, Winograd’s interpretation of our critique is based on a fundamental misrepresentation of Chomsky’s position and, more particularly, of our own. We will show that once these misrepresentations are clarified the paradigm difference that Winograd argues for becomes largely illusory. Hence, Winograd’s reply fails to refute our initial argument that AI research into language is unscientific by the most general criteria of what constitutes scientific research.

II. Winograd’s account of the “Chomskian paradigm”

Winograd’s claim is that by a “neat piece of intellectual legerdemain” (p. 155) the “Chomskian paradigm” which we supposedly follow “unquestioningly and dogmatically” narrows the study of language to the study of Universal Grammar which is, in turn, narrowed to the study of the formal properties of grammars; and further, that followers of this paradigm insist “that this enterprise constitutes the whole of ‘the scientific study of language’” (p. 163). Therefore, according to Winograd, by this process one is led “to remove from the purview of ‘universal grammar’ [and hence the scientific study of language - N.H./B.E.D.] all study of the processes and mechanisms which underlie language use” (p. 156). Indeed, the scientific study of language is seen as “excluding the study of language comprehension and production along with all aspects of language as a means of communication. In the Chomskian paradigm for the scientific study of language, there is an assumption that valid generalizations can be made about the set of sentences judged grammatical by a native speaker, but that it is not possible to form scientific theories of the mechanisms by which people actually use language” (p. 157). Given these theoretical biases, it is predictable, according to Winograd, that we would find most of AI research into language “unscientific”, for in “the computational paradigm for the study of language” adopted by AI researchers, “the primary focus of study is on the processes which underlie the production and understanding of utterances in a linguistic and pragmatic context” (p. 168). In short, according to Winograd, our critique of AI research is basically irrelevant because we have elevated a “rather specialized concern and methodology ... to the position of being the only ‘scientific’ study of language” (p. 160).

This account of our position is blatantly incorrect. Nowhere in our critique of AI research into language do we suggest or imply that the study of processing, production, comprehension, or any other part of language use is inherently unscientific or that everything except the study of formal grammar or competence must be excluded from the scientific study of language. Quite on the contrary, we explicitly include as part of a scientific study of language the study of processing and linguistic performance in general.⁴ For example:

To sum up, a scientific study of language will aim at theories which attain the level of explanatory adequacy, i.e., theories which provide principles according to which the human language faculty is organized, and which account for the facts of language acquisition and use. (Emphasis added, p. 329 - B.E.D./N.H.)

Indeed, most of section III of our paper is concerned with various approaches to language processing. We note very explicitly that an overall theory of language will be incomplete if it is limited to a theory of grammar and competence:

Transformational grammar is a model of linguistic competence, which is the tacit knowledge that speakers have of the structure of their language. It is not intended to be a theory of how this tacit knowledge is put to use in actually speaking and understanding a language. An overall theory of language use must specify, in addition to a theory of competence, a theory of production and a theory of parsing. (Emphasis added, p. 378 - B.E.D./N.H.)

Nor do we ever criticize any particular proposal on the grounds that it deals with subject matter which must be excluded from a scientific study of language or because it does not deal with constraints on formal grammar. For example, we begin our discussion of Winograd’s parsing model with the statement:

It is understood by everyone that any overall theory of language will contain a theory of parsing. Hence it is interesting to see what contributions, if any, Winograd’s program makes to such a theory. (p. 346)

Our objection to Winograd’s approach to parsing is not that parsing is inherently unscientific but that “he advances no reason of any kind - neither theoretical nor empirical - for setting up his parser the way he does” (p. 350). Note that this critique has nothing to do with “any specialized concern and methodology” other than that one ought to advance reasons in support of theoretical proposals.

⁴Cf. p. 328. In fact, portions of the relevant passage (from an earlier draft) are cited by Winograd, p. 157, in which we explicitly claim that “a theory of grammar does not exhaust the subject matter of research into language”. For reasons to be made clear below, Winograd interprets this as meaning that the theory of (formal) grammar does exhaust the subject matter of research into language.

As an example of our “arbitrary” approach to what is acceptable as an explanation, Winograd cites the following passages from our paper:

Minsky presents a totally unconstrained system capable of doing anything at all. Within such a scheme explanation is totally impossible. (pp. 357-8)

It is a commonplace of research into language that unconstrained transformational power enables one to do anything. If one can do anything, explanation vanishes. (p. 357)

These comments, contrary to Winograd’s assertions, are not based on any special features of the “Chomskian paradigm”; in particular, they are not based “on the notions of formal generative power” (p. 161). Rather, they are based on the assumption that any explanation crucially involves the use of general principles. For a general principle to have any content it must exclude certain possibilities. In our paper, we go to some lengths to demonstrate that Minsky’s frame theory, at least insofar as it deals with language, contributes virtually nothing to an explanation of the phenomena he discusses precisely because he “presents a totally unconstrained system capable of doing anything at all.”⁵ This notion of explanation makes no appeal to any exotic details of some particular paradigm. Rather, the view that explanation involves general principles has been relatively uncontroversial since the days of Plato and Aristotle. It may be true, as Winograd writes, that “there is no simple answer to the question ‘what is explanation?’” or that “there are whole bodies of philosophy dealing with this problem” (p. 161). But nothing that Winograd says leads us to question the general validity of the need to base explanations on general principles.

In short, Winograd’s interpretation of our position bears no resemblance to what we actually say. Indeed, he ascribes to us positions that we explicitly reject, thereby distorting in a fundamental manner the basic point of our article.
The reader may well wonder, in light of our response and in the light of the above quotations from our original article, how it is that Winograd could have so fundamentally misunderstood our comments concerning the nature of AI research into language. The misunderstanding arises through a persistent misinterpretation of the “Chomskian paradigm” which he attributes

⁵Winograd seems to agree later on that Minsky’s frame theory is not a theory “in the usual sense of the word” (p. 172). He does not, however, say in what sense of the word it is a theory nor does he say what insights he believes frame theory has made concerning any aspect of the study of language. This is not surprising, for, as we showed, the insights of frame theory are nil. Winograd, however, correctly points out that despite all this, frame theory has “influenced researchers” and has “laid out a direction of exploration” (p. 172).

to us. In particular, at each point where it is asserted (either in our paper or in the work of Chomsky that Winograd cites) that a certain matter is central to the study of language, or that a scientific study must address itself to some matter, Winograd interprets this as meaning that a scientific study must limit itself exclusively to that matter. For example, Winograd cites the passage that “a theory of human natural language understanding is impossible if it is not carried out in the context of a study of the principles of UG” (p. 155 in Winograd). He goes on to note that the term “Universal Grammar” excludes “all study of the process and mechanisms which underlie language use” (p. 156).⁶ But just because the study of universal grammar does not include the “study of language comprehension and production” it does not follow, as Winograd says, that “in the Chomskian paradigm for the scientific study of language there is an assumption ... that it is not possible to form scientific theories about the mechanisms by which people actually use language” (p. 157). Thus, from our statement that the study of UG is a necessary part of the study of language, Winograd goes on to attribute to us the much different claim that it is the only part of such a study. Winograd to the contrary, neither anything we say nor anything that he cites from Chomsky supports such an interpretation. The same misunderstanding arises throughout Winograd’s paper. To cite just one more example: in reply to our contention that the central problem of a theory of language is to answer the question how does someone construct a grammar,⁷ Winograd objects “to the blindness engendered by the insistence that this enterprise constitutes the whole of ‘the scientific study of language’” (p. 163).
But there is a great distinction, in our view, between claiming that something is central to a certain study and claiming that it constitutes the whole of that study.⁸ A second theme running through Winograd’s critique of the “Chomskian paradigm” concerns the role that the study of grammar plays in the context

6 This statement of Winograd’s is incorrect under the reasonable assumption that what people know is partially a function of - “underlies” - what they do. Thus, the principles of UG which the speaker tacitly knows and uses in the construction of his particular grammar certainly do “underlie language use”. See note 12 for further discussion.

7 The wording of the section that Winograd quotes on p. 163 was changed in a later draft as it was found to be misleading. The word ‘does’ was replaced by the word ‘can’ in the printed version (cf. p. 323).

8 Another example of this occurs in Part V section G of Winograd’s paper. There, he objects to our claim that the syntax plays a major role in conveying meaning. He points out, apparently contrasting this view with what he takes to be our own, that “syntax is important, but it is only one of many levels of structure which are vital to conveying meaning”. Once again he incorrectly takes us to be saying that syntax plays an exclusive role.

B. Elan Dresher and N. Hornstein

of the study of language as a whole. Winograd implies that it is a special assumption of the “Chomskian paradigm” that study of grammar is central to the study of language,9 and that this assumption is not shared by researchers in AI. However, in practice, every processing model that we looked at in our earlier article contained at its heart a model of a grammar.10 This is not surprising. It is hard to see how one could elaborate a model of processing without some notion of what it is that is being processed. It is in this sense that a study of grammar is central to a study of processing. By characterizing the tacit knowledge that a person has one contributes to an understanding of how he puts that knowledge to use.11

Winograd construes the relation between grammar and processing quite differently. He writes that it is an error to believe “that the form of the grammar reflects facts about properties of the language user”, for only “a set of rules which attempted to reflect the actual processes of language use” can be claimed to reflect properties of the language user (p. 159). But this is tantamount to claiming that only what someone does and not what someone knows is a property of that person. Since it is reasonable to believe that the way people act will be partially a function of what they know, it seems that here it is Winograd who is arbitrarily narrowing what counts as a property of a language user. Because a study of grammar will not give a complete answer to questions concerning language use, it does not follow that it will contribute nothing to answering such questions, or that it reflects no properties of the language user.12 Rather, it seems reasonable to us to suppose that a theory of what a person tacitly knows about the structure of his language will form an important part, though not the only part, of a theory of language use. Furthermore, this is tacitly conceded by AI researchers.

9 Cf. p. 156, p. 163.

10 See for example our discussion of Winograd’s paper, p. 346.

11 This is not to say, of course, that a scientific study of any aspect of language use must await the full development of a theory of grammar or that no scientific investigation of language use can take place in the absence of an articulated theory of grammar. Thus we do not say - contrary to what Winograd claims is an assumption of the Chomskian paradigm - “that syntax should be thoroughly studied before turning to problems of meaning” (p. 173). This claim has been explicitly denied in the literature. Cf. Chomsky (1957, Ch. 9), Chomsky (1965, Ch. 1) and Miller and Chomsky (1963).

12 There is a persistent attitude among researchers in AI, and psychology as well, that the study of grammar is not central to a study of speech production or comprehension. If, as we have assumed, a grammar represents the tacit knowledge that a person has of his language and it is reasonable to suppose that what a person does will be significantly, though not completely, determined by what he knows, it is hard to see how the centrality of grammar - the study of a person’s linguistic competence - can be denied. But denied it is. Why? The problem starts with a misinterpretation of what the competence-performance distinction comes to. In the literature, it is not always represented as the distinction between what one (tacitly) knows and how one puts this knowledge to use. Frequently competence is seen as idealized performance; a cleaned-up first approximation of performance. The parallel (Continued opposite)

Reply to Winograd

To sum up, then, the main point of section III of Winograd’s paper, which forms the central part of his reply to us (though it does not exhaust his reply), is that the real point of our original paper is that AI researchers are not working within the “Chomskian paradigm”. But as we have shown, this assertion is totally incorrect: first, because we do not hold the views that Winograd claims form the basis of our critique of AI research into language; second, and more importantly, because our comments and particular arguments concerning the scientific status of AI research into language do not depend in any way on the assumptions of any particular research program but rather rest on a quite general and virtually uncontroversial notion about the nature of explanations - that they crucially and inescapably involve the articulation of general principles.

III

In section V of his response,13 Winograd sets out what he sees as being the central characteristics of a “non-Chomskian paradigm” for the study of language, within which much AI research into language is conducted. In

often drawn is to the study of physical objects abstracting away from considerations of friction. Thus Winograd, for example, writes:

The desire for simplification through idealization is quite reasonable, akin to the physicist’s desire to study the mechanics of ideal frictionless objects before dealing with the details of a pebble rolling down a riverbed. In this formulation, “performance” covers the details of how the language user behaves in a particular instance, while “competence” deals with those more universal properties which apply to all instances. (pp. 156-157)

Thus, for Winograd, competence is a first level abstraction which, in order to make the subject matter more tractable, falsely removes certain features of reality. Just as in the real world there are no frictionless objects, so in the real world there is no language competence. Given this view of the distinction, it is not surprising that the study of grammar and its properties should be regarded as less than central. At most, given these views, it is a heuristic; it may help - just as in physics it helps to study frictionless planes - but it is not crucial. Chomsky himself has partially contributed to the misconstrual of the distinction by mentioning idealization and competence in the same breath. Idealization will be just as necessary in the study of performance as it is in the study of competence. The distinction does not revolve around the issue of idealization but around the processes which are the objects of study. The theory of linguistic competence is a proper subpart of a theory of performance, not some first approximation of one.

13 Section IV of Winograd’s paper is entirely devoted to a rather long, and to us extremely confusing, metaphor of language as biology. He proposes his lengthy metaphor following his view that two fundamentally different paradigms of language study are competing here and that “it is not possible to debate the assumptions of competing paradigms in traditional formal deductive terms since there is not a sufficient set of shared premises. What is needed are tools which allow us to extend the domain of our thinking, and metaphor is one of the most accessible of these tools” (pp. 163-164). However, as we reject the view that our comments about AI research into language rest on assumptions which only make sense within one narrowly held paradigm, we do not feel that a discussion of this metaphor will aid in clarifying the points that we have already made.

our article we discussed at some length and in considerable detail why we feel that some of the central assumptions of this paradigm are wrong and why the methods based on them are unlikely to lead to scientific theories of language. In most cases, Winograd’s discussion of these points simply involves the repetition of claims which we discussed in our paper without introducing any new arguments or examples; in particular, Winograd rarely deals with any of the specific arguments or examples that we discuss. Therefore, rather than go over all the points in detail again we will deal with only a few points which we believe require additional comment.

(1) As we mentioned above in section II, some of the differences which Winograd alleges to exist between the “Chomskian paradigm” and the “computational paradigm” are illusory. For example, one basic assumption which Winograd claims underlies the “computational paradigm” but not the one that we presumably adhere to is the following:

It is possible to study scientifically the processes involved in cognition, and in particular of language use. (p. 168)

He claims, again, that this view involves the rejection of “the Chomskian view that processes are inaccessible to scientific study” (p. 168). As pointed out above, Winograd has misinterpreted the “Chomskian” view in general, and ours in particular, concerning the scientific status of language processing. To repeat, to our knowledge no one has ever maintained that “processes are inaccessible to scientific study” and, in fact, the opposite has always been explicitly maintained.

(2) Another distinction which Winograd draws between the “Chomskian” and “computational” paradigms is that the latter, but not the former, recognizes the “centrality of process” (p. 168). In the “Chomskian paradigm”, according to Winograd,

There is a strong basic belief that the best methodology for the study of language is to reduce the language facility to a set of largely independent “components”, and assign different phenomena to each of them. This is in direct contrast to a system-centered approach which sees the phenomena as emerging from the interactions within a system of components. (p. 169)

As far as we can see, there is absolutely no difference between the two positions. In our paper we recognize, for example, that

since this theory [i.e. a theory of UG - B.E.D./N.H.] will ultimately have to produce a grammar, and a grammar will have to operate through the interaction of its various subparts, it is reasonable to suppose that research in UG could fruitfully proceed by the elaboration of the principles that underlie the subparts and the interaction among them. (p. 321, emphasis added)

We go on to make a similar point concerning processing:

it is reasonable to suppose that the object of a theory of processing will also be the elaboration of certain general principles, which, in this case, underlie the functioning of the various components in a processor and the interactions among them (p. 328, emphasis added)

Clearly, it would be totally pointless to elaborate principles which underlie the interactions of various components if one believed that no phenomena emerged from these interactions. In our view, no substantive issue of theory or methodology is involved in Winograd’s putative distinction.14

(3) Another of Winograd’s pseudo-distinctions involves his assertion that within AI - but presumably not within the “Chomskian paradigm”:

there is broad agreement that it is an issue of central importance to understand the properties of these representations [i.e. the representations underlying human language use], and develop a better understanding of how they take part in computational processes. (p. 171)

Again, interest in the nature of linguistic representations is not confined to AI but is characteristic of much work in linguistics. Our point again is not that one should not study the nature of representations underlying language use. Quite the contrary: For what else is the study of formal grammar, for example, if not a part of the study of the nature of representations underlying language and language use? More to the point is that none of the work we discuss in fact makes any contribution to our understanding of this question. In connection with Minsky’s paper, Winograd notes that

any explanatory theories will have to deal with the observed facts about representation and computation. (p. 172)

This statement is true but it is irrelevant to our discussion of Minsky’s paper, for Minsky brings forward nothing new concerning the nature of “observed facts about representation and computation”.

(4) Winograd reiterates his belief that

the important properties of language will be explained through the way different aspects of language are processed and the ways these processes interact in language use. Through studying the structure and behavior of computer programs which carry out analogous processes, we will develop a better understanding of this interaction. (p. 173)

This claim is a repetition of a claim often made by AI researchers on language and it is one that we examine in great detail in our paper. Indeed, it is one of the central arguments of our paper that the writing of computer

14 Or to use a popular idiom, Winograd is dealing in semantics. The exact same point holds for Winograd’s alleged distinction between the “reductionism” which he attributes to Chomskian linguistics and some work in AI and his own “organicism.”

programs such as Winograd’s does not contribute to explaining the “important properties of language,” among other reasons because the processes which these programs carry out are not “analogous” in any interesting sense to what humans do.15 By presupposing “analogous processes” Winograd begs the very point which is at issue.16

Incidentally, we never say that “the desire to build working computer systems is antithetical to the development of linguistic theory” (p. 174), or that the writing of usable language “understanding” systems will “as an inevitable consequence” (p. 174) not involve scientifically important issues. In the same paragraph in which he attributes this position to us Winograd quotes us correctly as saying that “the question of universal principles need hardly ever arise” when one is building “usable language understanding systems”, and that this task “leads one away from a consideration of these issues” (pp. 330-331). However, there is a great difference between saying that issues “need hardly ever arise” and saying that it is “an inevitable consequence” that they will not arise; similarly it is one thing to say that a certain aim “leads away” from a consideration of certain concerns and it is quite another thing to say that it is “antithetical” to them. To take a somewhat extreme example for the sake of illustrating this point: It is quite conceivable that important insights into the nature of quantum mechanics can come to someone while cooking a meal; nevertheless, it is still fair to say that the question of general principles of quantum mechanics need hardly ever arise in the course of this activity, and more, that such activity generally leads one away from a consideration of such issues.
Our point is that programming a computer to manipulate language in some limited domain is a task which can be carried out without any consideration of general principles of language (a point that we support by showing that the specific programs that we review make virtually no contribution to the understanding of language) and that more often than not a consideration of such issues will contribute nothing to successfully programming a computer to this end. Thus, one interested in accomplishing this programming task might reasonably avoid such general issues.

15 See, for example, our discussion of Winograd’s semantic system, pp. 351-353.

16 Similar considerations hold for Winograd’s discussion of programming where he again begs the question at issue by claiming that the complexity and form of a successful computer program “mirror the properties of the domain in which they work” (p. 175).

IV. Conclusion

Winograd points out that

It has become overly fashionable for anyone whose work is not generally accepted in a scientific field to claim that this is because he or she is engaging in a “scientific revolution” and that all objections to the work are due to a defense of the old established paradigm. (Note p. 152)

Despite his recognition of the dangers of this approach this is exactly the tack that Winograd takes in replying to our article. Thus, rather than dealing directly with our critique of recent AI research into language he construes all of our comments as arising from our adherence to an old established paradigm - the “Chomskian paradigm” - which is now being challenged by the new “computational paradigm”. We have tried to show that there is no paradigm difference of the sort that Winograd claims that there is, for the so-called “Chomskian paradigm” as he portrays it does not exist. More importantly, his appeal to a paradigm difference is irrelevant to our particular arguments, as they do not rest on any special assumptions about the nature of explanation.17 While it is possible that there currently exist many different paradigms which guide the study of language, and while it is possible that differences between paradigms can be so fundamental that it is not possible to discuss their assumptions without appeal to metaphor, we feel that the appeal to paradigm differences in this case has only served to evade our earlier criticisms and obscure the real issues involved in a scientific study of language.

17 Furthermore, it is unclear that the notion of a paradigm can support the weight that Winograd wishes to place on it. Throughout his reply Winograd espouses a particularly narrow and extreme brand of Kuhnism which, if taken to its limit, would rule out all rational discussion of competing scientific theories. Kuhn’s notion of paradigms and Winograd’s extreme interpretation of it is an extremely controversial notion in the history and philosophy of science. For various critiques of Kuhn’s position see Lakatos and Musgrave (1970), Scheffler (1967), and Shapere (1964). It is also interesting to note Kuhn’s replies to the discussion in Lakatos and Musgrave (1970), where the extreme interpretation that Winograd adheres to is explicitly rejected by Kuhn.

Bibliography

Chomsky, N. (1957), Syntactic Structures, Mouton, The Hague.
Chomsky, N. (1965), Aspects of the Theory of Syntax, MIT Press, Cambridge, Mass.
Dresher, B. E. and N. Hornstein (1976), On some supposed contributions of artificial intelligence to the scientific study of language, Cog., 4, 321-398.
Hempel, C. (1965), Aspects of Scientific Explanation (ASE), The Free Press, New York.
Hempel, C. (1965), Aspects of Scientific Explanation, in ASE.
Hempel, C. and P. Oppenheim (1965), Studies in the Logic of Confirmation, in ASE.
Lakatos, I. and A. Musgrave (1970), Criticism and the Growth of Knowledge, Cambridge University Press.
Miller, G. A. and N. Chomsky (1963), Finitary models of language users, in R. D. Luce, R. Bush and E. Galanter, eds., Handbook of Mathematical Psychology, Vol. 2, John Wiley and Sons, New York.
Popper, K. (1959), The Logic of Scientific Discovery, Hutchinson, London.
Scheffler, I. (1963), The Anatomy of Inquiry, Alfred A. Knopf, New York.
Scheffler, I. (1967), Science and Subjectivity, Bobbs-Merrill, New York.
Shapere, D. (1964), The structure of scientific revolutions (a review), Philosoph. Rev., LXXIII.
Winograd, T. (1977), On some contested suppositions of generative linguistics about the scientific study of language, Cog., 5, 151-179.

Cognition

Author Index of Volume 5

Binks, Martin G., 47
Brown, Roger, 73, 185
Byrne, Richard, 287
Delis, Dean, 119
Dickson, W. Patrick, 215
Dresher, B. Elan, 147, 377
Evans, J. St. B. T., 265
Grieve, Robert, 235
Grosjean, Francois, 101
Hatano, Giyoo, 47
Hoogenraad, Robert, 235
Hornstein, Norbert, 147, 377
Jeannerod, Marc, 3
Johnson-Laird, Philip N., 189
Kean, Mary-Louise, 9
Kulik, James, 119
Lane, Harlan, 101
Lewis, Selma, 333
Miyake, Naomi, 215
Miyake, Yoshio, 47
Murray, Diarmid, 235
Muto, Takashi, 215
Newstead, S. E., 265
Petrey, Sandy, 57
Reber, Arthur S., 333
Robinson, E. J., 363
Robinson, W. P., 363
Schank, Roger C., 133
Schonbach, Peter, 181
Slater, Anne Saxon, 119
Smothergill, Daniel W., 251
Vadhan, Vimla P., 251
Wilensky, Robert, 133
Winograd, Terry, 151