Cognition, 6 (1978) 263-289. ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Three conditions on conceptual naturalness*

DANIEL N. OSHERSON**
Massachusetts Institute of Technology
Abstract

Human infants are predisposed to organize their experience in terms of certain concepts and not in terms of others. The favored concepts are called natural, the remainder, unnatural. A major problem in psychology is to state a principled distinction between the two kinds of concepts. Toward this end, the present paper offers three formal necessary conditions on the naturalness of concepts. The conditions attempt to link the problem of naturalness to the distinctions between sense versus nonsense, simplicity versus complexity, and validity versus invalidity.

1. The problem of natural concepts
1.1 Naturalness and conceptual systems

One achievement of childhood is the development of a conceptual system that is used to organize experience and structure thinking. Partially on the basis of experience (perhaps only minimal), children come to fix on a set of concepts for this purpose. A moment’s reflection reveals that there is an unlimited number of conceptual systems that a child might choose, even counting only those potential systems of concepts that do justice to the correlational texture of the world. That is, reality grossly underdetermines the set of concepts that we use to organize it. There must be, then, some predisposition to emerge from childhood with one set of concepts rather than another. Let us dignify that predisposition with some terminology. Consider the broad range of experiences that are normal or typical for a given culture; the children of that culture are disposed to map those normal experiences onto a certain set of concepts and not onto others. Let us call the chosen set of concepts the natural concepts for that culture, and all others unnatural. It is a central feature of human intelligence that it selects the set of natural concepts that it does; no logical necessity underwrites the particular choice we make: other kinds of intelligences, faced with typically human experiences, may well choose differently. To make this point another way, human babies may be taken to implement a function that is defined on environments and takes values in conceptual systems (much as in Chomsky, 1975, ch. 1). A baby maps an environment that is normal or typical for a given culture into a conceptual system that is, by definition, (humanly) natural with respect to that culture. Infants of other species of intelligence implement different mappings, and may thus evolve unnatural concepts (that is, unnatural from the human point of view) in normal human environments. Note that a concept may be acquirable in abnormal circumstances or by dint of explicit instruction and drill without prejudicing its status as natural or unnatural. It is the possibility of mastery on the basis of ordinary childhood exposure (typically casual) that defines naturalness or its lack. In this paper I will attempt to provide nontrivial necessary conditions on the class of concepts that are natural for our culture. To do this I will exploit the intimate connections that exist between the phenomenon of natural versus unnatural concepts on the one hand, and such other intellectual phenomena as the abilities to distinguish validity from invalidity, sense from nonsense, and simplicity from complexity. These distinctions will be examined in turn as we proceed, and related to conceptual naturalness.

* Frank Keil, Janet Krueger, Julius Moravcsik and Thomas Wasow made helpful comments on earlier drafts of this paper. The remaining confusions, blunders, and obscurities are my own.
** Requests for offprints should be addressed to Daniel N. Osherson, Massachusetts Institute of Technology, Cambridge, Mass. 02139, U.S.A.

1.2 Five tests of conceptual naturalness

Before getting under way, more should be said about the difference between natural and unnatural concepts. Natural concepts include green, blue, furniture, belief, tiger, every, and so forth (not the words “green”, “furniture”, and “every”, but the concepts that lie behind them). An unnatural concept is meloncat. An object counts as a meloncat just in case it is a watermelon about to be squashed by a giant pussycat, or a pussycat about to be squashed by a giant watermelon.
More generally, natural concepts are distinguished by the domain of environments sufficient for their mastery by infants (as noted above). Since direct verification of naturalness is often not feasible, it is useful to employ indirect tests. The following tests have prima facie validity, and we shall rely on them as rough indices of naturalness. First, natural concepts tend to have syntactically simple realizations in natural language, at the limit as lexical items like “grapefruit”. Unnatural concepts are generally expressed more grudgingly, often requiring elaborate defining expressions as well as disjunctive clauses - as in the definition of meloncat. What needs explaining is not the relative comprehensibility of syntactically simple structures of natural language compared to syntactically complex ones, but rather the difference between concepts assigned simple
syntactic realizations in a natural language, and concepts whose expression is perforce more cumbersome. So it would be question-begging to attempt to explain the difference between natural and unnatural concepts by invoking the length or disjunctive nature of the defining expression in natural language; such differences in syntactic complexity are part of the phenomenon of naturalness versus unnaturalness. Ordinary English could, after all, have had simple means of designating meloncats, just as it might have offered only twenty-word disjunctive expressions meaning “friendly”. There is no absolute metric according to which meloncat is a more complicated concept than friendship. The second sign of a natural concept is that it is “projectible” in the sense that it can figure in law-like generalizations that support counterfactuals. Universal generalizations invoking meloncathood tend to seem accidentally true at best, even in the face of confirming evidence. Third, natural but not unnatural concepts figure in our assessments of identity through time of a given object, and of the similarity of different objects. Fourth, natural but not unnatural concepts support successful metaphors, similes, and analogies. And fifth, natural concepts are distinguished by the relative facility with which people can perform deductions pivoting on those concepts.¹

1.3 Some helpful idealizations of the problem

These five senses of “natural concept” are not coextensive; the criteria do not always agree. But there is substantial overlap, at least within our culture; and the overlap seems sufficient to risk, especially at this early stage of investigation, confounding these diverse criteria by referring to the naturalness of concepts simpliciter. Moreover, the work of Eleanor Rosch (1973) and of Berlin and Kay (1969) suggests that different cultures do not differ much in their intuitions of naturalness. But for safety’s sake, I will limit my claims to one culture: ours.
As a final idealization of the situation, it will sometimes be useful to ignore the obviously graded quality of conceptual naturalness, and proceed instead as if the distinction were dichotomous. These idealizations result in many unclear cases. But enough palpably natural concepts and palpably unnatural concepts remain to render challenging (to say the least) the problem of stating a non-question-begging distinction between the clear cases. To this task we now turn.

¹ The validity of these tests is not guaranteed on a priori grounds: it is logically possible that concepts easily acquired by human infants demand prolix expression in natural language, etc. Such a perverse state of affairs, however, would threaten to render much adult psychology miraculous. The assumption of validity has sufficient empirical plausibility to warrant its adoption as a hypothesis for the present study.
2. The first condition: conformity to the M principle
2.1 Spanning

The first condition on conceptual naturalness that we shall entertain derives from the work of the philosopher Fred Sommers (1959, 1963). Sommers attempts to define ontological types or categories out of the idea of a “category mistake”. A category mistake is an absurd attribution of a property to an object. The linguistic reflex of a property is a predicate; that of an object, a term. A predicate is said to span a term just in case attribution of the designated property to the designated object is not a category mistake. More precisely:

Definition. A predicate spans a term if and only if that predicate-term combination makes sense and can be assigned a truth value. This value can be either true or false.

Thus, the predicate “green” spans the term “the frog” because the sentence “The frog is green” is sensible and has a truth value, namely, true. The predicate “green” also spans “milk” because the sentence “Milk is green” is also sensible and has a truth value, namely, false. Negation yields the true sentence “Milk is not green”. In contrast, “green” fails to span the term “the prayer” since the sentence “The prayer is green” is senseless; it is a category mistake, neither true nor false. Negation yields the equally absurd sentence, “The prayer is not green”, as if prayers were some other color, or transparent. On the other hand, the predicate “three hours long” spans “the prayer” since “The prayer is three hours long” is sensible. But “three hours long” does not span “milk”. Certainly the sentence “The prayer is green” can be interpreted in some metaphorical way, but let us set aside that issue; spanning concerns nonmetaphorical interpretations only.

2.2 Predicability trees

Sommers’ discovery (or at least, claim) is that people’s intuitions about spanning in natural language organize themselves into a tree, called a predicability tree. Such a tree is illustrated in Figure 1.
This is not Sommers’ tree; he is noncommittal about the precise tree for English, being more concerned with its form than its content. The tree of Figure 1 was constructed by Frank Keil (see Keil, 1977) to fit his own intuitions. Predicates in the tree are spelled in capital letters whereas terms are spelled in lower case.
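The definition of spanning in Section 2.1 can be restated as a small function. The sketch below is hypothetical: it reads a truth value off two judgments (is the sentence true, is its denial true), following the criterion the paper later attributes to Keil's study with children, where a category mistake is a sentence that is neither true itself nor has a true denial. The function names and the judgment table are illustrative assumptions, not part of Sommers' or Keil's work.

```python
from typing import Optional


def truth_value(sentence_true: bool, denial_true: bool) -> Optional[bool]:
    """Truth value of a predicate-term sentence, read off from two independent
    judgments: is the sentence true, and is its denial true?"""
    if sentence_true and denial_true:
        raise ValueError("a sentence and its denial cannot both be true")
    if sentence_true:
        return True
    if denial_true:
        return False
    return None  # neither is true: a category mistake, no truth value


def spans(sentence_true: bool, denial_true: bool) -> bool:
    """A predicate spans a term iff the combination receives a truth value."""
    return truth_value(sentence_true, denial_true) is not None


# (sentence judged true?, denial judged true?) for the examples of Section 2.1.
JUDGMENTS = {
    ("green", "the frog"): (True, False),     # "The frog is green" is true
    ("green", "milk"): (False, True),         # "Milk is not green" is true
    ("green", "the prayer"): (False, False),  # neither: a category mistake
}

for (predicate, term), judged in JUDGMENTS.items():
    print(f"{predicate} spans {term}: {spans(*judged)}")
```

Note that a false sentence still counts as spanned; only the absence of any truth value marks a category mistake.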
Figure 1. A predicability tree (after Keil, 1977). Predicates in the tree are spelt in capital letters; terms, in lower case. The predicate IS DEAD spans all the terms spanned by the predicates IS ASLEEP, IS WILTED, and IS HONEST, namely, all the terms referring to flowers, pigs, people, etc.; similarly for IS SICK, which is at the same node as IS DEAD. Milk, on the other hand, cannot be sensibly (and literally) called “sick”, so IS SICK does not dominate “the milk” in the tree. The predicate HAPPENED YESTERDAY spans recesses and collisions, but not refrigerators, secrets, or justice (none of the latter things can be sensibly said to happen yesterday). Recesses but not collisions are spanned by WAS AN HOUR LONG. Dominance relations in the tree represent these facts. [Tree diagram not reproduced.]
Two rules interpret the tree. First, a predicate spans all terms that are connected to it by a dotted line. Second, a given predicate spans all terms that are spanned by any predicate dominated by the given predicate. Examples are provided by the caption to Figure 1. One consequence of these rules is that each node may be taken to represent not only a class of predicates but also a class of terms - namely, all those terms dominated by, and hence, spanned by, the predicates at that node. Derivatively, each node can be taken
to represent the class of objects referred to by the terms represented at that node. Considering different parts of the tree reveals that “happened yesterday” spans all events, “heavy” spans all physical objects, “is about a spider” spans all descriptions, and “was achieved” spans states of affairs. The tree shown in Figure 1 lists only a fraction of the natural language predicates and terms that constitute the members of each class. Thus, along with “honest” and “sorry” go “sincere”, “kind”, and so forth. Even with such additions the tree is probably incomplete, failing to make some ontologically relevant distinction. Also note that only one-place predicates are shown in the tree. Two-place relations like “happier than” do not appear. Trees for predicates of a greater number of places are possible (see Keil, 1977), but there is no need to discuss them here; we’ll return to n-place predicates shortly, in another context (Section 3).

Figure 2. Apparent violations of the M principle (after Sommers, 1963). [Diagram not reproduced: an “M” in which LASTED TILL NOON and IS SMUDGED both span “the period”, and a “W” in which IS RATIONAL spans both “the man” (also spanned by IS HONEST) and “the number” (also spanned by IS DIVISIBLE BY THREE).]
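The two rules that interpret the tree amount to a recursive computation: a node spans its directly attached terms plus everything spanned below it. This is a minimal sketch under stated assumptions: the `Node` class, its fields, and the placement of terms are hypothetical, built loosely from the caption of Figure 1 rather than from Keil's full tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    predicates: list[str]                               # predicates sharing this node
    terms: list[str] = field(default_factory=list)      # terms joined by a dotted line
    children: list[Node] = field(default_factory=list)  # nodes this node dominates


def spanned_terms(node: Node) -> set[str]:
    """Rule 1 (directly attached terms) plus Rule 2 (terms spanned by any
    predicate the node dominates)."""
    result = set(node.terms)
    for child in node.children:
        result |= spanned_terms(child)
    return result


def find_node(root: Node, predicate: str) -> Node:
    """Depth-first search for the node carrying `predicate`."""
    stack = [root]
    while stack:
        node = stack.pop()
        if predicate in node.predicates:
            return node
        stack.extend(node.children)
    raise KeyError(predicate)


def spans(root: Node, predicate: str, term: str) -> bool:
    return term in spanned_terms(find_node(root, predicate))


# Hypothetical fragment loosely following the caption of Figure 1.
LIVING = Node(["IS DEAD", "IS SICK"], children=[
    Node(["IS ASLEEP", "IS HUNGRY"], ["the pig", "the rabbit"],
         [Node(["IS HONEST", "IS SORRY"], ["the man", "the girl"])]),
    Node(["IS WILTED", "BLOOMED"], ["the flower", "the tree"]),
])
ROOT = Node(["IS THOUGHT ABOUT", "IS INTERESTING"], children=[
    Node(["IS RED", "IS HEAVY"], children=[
        Node(["LEAKED OUT OF THE BOX"], ["the milk", "the water"]),
        Node(["IS TALL", "IS SKINNY"], children=[
            LIVING,
            Node(["IS FIXED", "IS BROKEN"], ["the car", "the refrigerator"]),
        ]),
    ]),
    Node(["HAPPENED YESTERDAY"], ["the collision", "the start"],
         [Node(["WAS AN HOUR LONG"], ["the recess", "the vacation"])]),
])

print(spans(ROOT, "HAPPENED YESTERDAY", "the recess"))  # True
print(spans(ROOT, "WAS AN HOUR LONG", "the collision"))  # False
print(spans(ROOT, "IS SICK", "the milk"))                # False
```

Because spanning is inherited upward, each node also picks out a class of terms, which is the observation the text builds on.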
2.3 The M principle

What makes our intuitions about category mistakes representable as a tree? Why don’t branches meet in the downwards direction? That such downward junctures do not occur is a phenomenon named the M principle by Sommers; the M principle asserts that predicability trees for natural language never form either an “M” or a “W” of the kind shown in Figure 2. Sommers claims that such structures can appear in a person’s predicability tree only through equivocation, as the result of ambiguous terms and predicates. In the case shown in Figure 2, the term “the period” is ambiguous in the “M”, while the predicate “rational” is ambiguous in the “W”. Thus, although the predicates “lasted till noon” and “is smudged” both span “the period”, it is clear that “lasted till noon” spans one kind of period (the time interval), while “is smudged” spans another kind of period (the punctuation mark). Consequently, this tree fragment ought to be redrawn with a term “the period₁” grouped with “the recess”, and a term “the period₂” grouped with “the signature”. Thus distinguished, it is as much a category mistake to call a period₁ smudged as it is to call a recess smudged. Similarly with “rational”. Of course, if an M or W turns up in someone’s predicability tree, and the relevant terms or predicates do not seem ambiguous, then the M principle would be false; this principle is a falsifiable, empirical claim, not bound to be true. I have not heard a clear counterexample to the M principle, so I think that it is true, but it must be admitted that there are apparent counterexamples that lead to some theoretical indigestion, though not fatality.² Keil (1977) provides an illuminating discussion of such cases, and of the measures that can be taken to disarm them.

2.4 Psychological reality of the predicability tree

So the Sommers-style predicability tree represents our intuitions about category mistakes.
Derivatively, the tree reconstructs intuitions we have about the similarity and dissimilarity of certain classes of things. Thus the class consisting of all humans seems to be more similar to the class of all non-human animals than it does to the class of all machines; and the class of all plants seems more similar to the class of all liquids than it does to the class of all events. These and other judgments conform to the rule that pairs of terms in the tree that are more immediately dominated by a common predicate have more similar referents than do pairs of terms that are more remotely dominated. Keil (1977) has verified this rule by asking undergraduates to order such pairs of classes, drawn from different parts of the tree, for similarity.

Additionally, Keil has collected judgments about category mistakes from kindergarten, 2nd, 4th, and 6th grade children. He worked with the tree shown in Figure 3, which is a fragment of his original predicability tree. Children’s judgments about category mistakes were assessed by determining independently whether a given sentence seemed true to a child and whether its denial seemed true. The category mistakes, from the child’s point of view, were taken to be those sentences that were neither true themselves nor had true denials. Keil was also able to elicit judgments of ambiguity from the children so as to check apparent violations of the M principle.

² The predicate “brilliant”, for example, spans both people and ideas. Sommers’ theory requires that it be ambiguous, lest it yield a W in the predicability tree; but not everyone detects the required pair of senses of “brilliant”. In the absence of reliable tests for ambiguity, it seems wise to leave such unclear cases as “brilliant” for the theory to decide; our naked intuitions, examined individually, should not be considered an infallible measure of our competence. It is the lack of unqualified counterexamples that inspires confidence in Sommers’ theory. Sommers’ theory seems also to require that category mistakes be distinguishable in a principled way from another kind of semantic deviance called “anomaly”; “The contest weighed five pounds” is thus distinguished from “The dog whinnied” (assuming the latter to be more than false). Only category mistakes, not anomaly, are relevant to the construction of a predicability tree. Keil (1977) discusses the importance of this distinction to Sommers’ program, and reviews the attempts of Sommers and others to enforce it.

Figure 3. Predicability tree fragment used in Keil’s developmental study (after Keil, 1977). [Tree diagram not reproduced.]
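The similarity rule can be made concrete by measuring how deep the lowest node dominating both classes of terms sits in the tree: the deeper that shared node, the more similar the referents are predicted to be. The sketch assumes a hypothetical node layout loosely based on Figure 1; `PARENT`, `ATTACHED`, and the depth measure are illustrative choices, not Keil's procedure.

```python
from __future__ import annotations

# Hypothetical node layout loosely following Figure 1: each node is named by
# one of its predicates and mapped to the node that immediately dominates it.
PARENT = {
    "IS HEAVY": "IS THOUGHT ABOUT",
    "HAPPENED YESTERDAY": "IS THOUGHT ABOUT",
    "LEAKED OUT OF THE BOX": "IS HEAVY",
    "IS TALL": "IS HEAVY",
    "IS BROKEN": "IS TALL",
    "IS SICK": "IS TALL",
    "IS WILTED": "IS SICK",
    "IS ASLEEP": "IS SICK",
    "IS HONEST": "IS ASLEEP",
}

# Node to which each class of terms is attached (an assumption).
ATTACHED = {
    "humans": "IS HONEST",
    "animals": "IS ASLEEP",
    "plants": "IS WILTED",
    "machines": "IS BROKEN",
    "liquids": "LEAKED OUT OF THE BOX",
    "events": "HAPPENED YESTERDAY",
}


def path_to_root(node: str) -> list[str]:
    """The node itself followed by every node that dominates it."""
    path = [node]
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path


def shared_depth(a: str, b: str) -> int:
    """Depth (root = 0) of the lowest node dominating both classes."""
    above_b = set(path_to_root(ATTACHED[b]))
    lowest = next(n for n in path_to_root(ATTACHED[a]) if n in above_b)
    return len(path_to_root(lowest)) - 1


def predicted_more_similar(pair1: tuple[str, str], pair2: tuple[str, str]) -> bool:
    """A more immediately shared dominating node predicts more similar referents."""
    return shared_depth(*pair1) > shared_depth(*pair2)


print(predicted_more_similar(("humans", "animals"), ("humans", "machines")))  # True
print(predicted_more_similar(("plants", "liquids"), ("plants", "events")))    # True
```

Both printed comparisons reproduce the two judgments quoted in the text under this toy layout.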
Figure 4. Kindergarten, second, and fourth grade predicability trees, respectively (after Keil, 1977). [Tree diagrams not reproduced.]
From these data a predicability tree (or nontree) for each child could be constructed using a simple algorithm. These immature trees had two interesting properties. First, out of tens of thousands of occasions for a violation of the M principle, only 14 apparent violations were observed across the children; eight of these could be resolved as ambiguities from the child’s point of view. Second, immature trees were simply collapsed versions of the adult tree, this collapsing occurring in a characteristic way: predicates were pushed up the tree by the children; seldom was one predicate pushed past another to form the immature tree. This special kind of collapsing is seen in Figure 4, which shows kindergarten, 2nd grade, and 4th grade trees. The data are summarized in Table 1, which reports how much predicate inversion occurred at each grade. Keil (1977) provides interesting observations about these data, and how they bear on general issues in cognitive and linguistic development.

Table 1. The extent to which children’s trees were collapses of Figure 3 (after Keil, 1977)

Grade         Perfect collapse   One inversion of predicates   Two inversions of predicates
K (n = 16)    63%                31%                           6%
2 (n = 16)    94%                6%                            0%
4 (n = 16)    100%               0%                            0%
6 (n = 8)     100%               0%                            0%

2.5 The first condition

These experiments, along with others not mentioned, combine with the introspective evidence adduced earlier to verify the psychological reality of the Sommers-style predicability tree. Let us now return to the problem of distinguishing natural from unnatural concepts, and exploit the predicability tree in stating our first condition on conceptual naturalness.
First proposed condition on conceptual naturalness: A concept is natural only if there is a node in the predicability tree that dominates exactly what the concept spans.

To illustrate this condition, consider the concept huquid. Something is huquid just in case it is either envious or leaks out of cardboard boxes. By the proposed criterion huquid is unnatural because there is no single node in the tree that dominates what and only what huquid spans, namely fluids and people. It is the M principle that does the work here: were huquid in the tree, the M principle would be violated. The situation is similar for neaves, which refers to anything that has leaves or happens at noon. Meloncat is also ruled unnatural by the present criterion. In contrast, the concepts square, jubilant, took over an hour, and so forth meet the proposed condition on naturalness, as they should; placing them in the tree does not lead to a violation of the M principle. We have here, at best, only a necessary condition on conceptual naturalness. That it is not sufficient is revealed by the concept denvelope (designating envelopes mailed by Moslem dentists), which spans terms all of which are uniquely dominated by a single node of the predicability tree, namely, the node for nonliquid, physical objects.

This first condition is misformulated to some extent. The condition is relevant only to entire conceptual systems, rather than to individual concepts considered in isolation. A non-human kind of intelligence, after all, may entertain a set of concepts so different from ours that huquid and neaves could be placed in its predicability tree without violation of the M principle; conversely, placement of natural concepts like happy in such an alien tree might yield an M. Whether a concept conforms to or violates the M principle depends on background concepts and the predicability tree they define. So our first condition on conceptual naturalness ought to be weakened somewhat, and recast this way:
First condition on conceptual naturalness, revised: A set of concepts comprise a natural conceptual system only if they can be organized as a predicability tree, without violation of the M principle. The set of concepts consisting of those named in a standard dictionary appears to meet this revised necessary condition; if the concept huquid is added to this set, the revised condition fails to be satisfied, as desired.
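Both versions of the first condition can be checked mechanically if each predicate is represented by the set of term classes it spans. The sketch rests on one stated assumption: that the M principle amounts to requiring any two spans to be either disjoint or nested, since a partial overlap is exactly what would force two branches to meet in the downward direction. The `SYSTEM` table, its class labels, and the function names are hypothetical.

```python
from __future__ import annotations

from itertools import combinations

# Hypothetical system: each predicate maps to the classes of terms it spans,
# loosely following Figure 1. The class labels are illustrative assumptions.
SYSTEM = {
    "IS THOUGHT ABOUT": {"people", "animals", "plants", "artifacts", "fluids", "events"},
    "IS HEAVY": {"people", "animals", "plants", "artifacts", "fluids"},
    "HAPPENED YESTERDAY": {"events"},
    "LEAKED OUT OF THE BOX": {"fluids"},
    "IS TALL": {"people", "animals", "plants", "artifacts"},
    "IS BROKEN": {"artifacts"},
    "IS SICK": {"people", "animals", "plants"},
    "IS WILTED": {"plants"},
    "IS ASLEEP": {"people", "animals"},
    "IS HONEST": {"people"},
}

HUQUID = {"people", "fluids"}                             # envious, or leaks out of boxes
DENVELOPE = {"people", "animals", "plants", "artifacts"}  # same span as the nonliquid-object node


def meets_first_condition(span: set[str], system: dict[str, set[str]]) -> bool:
    """Per-concept version: some node spans exactly what the concept spans
    (a node's span is the span of any predicate located at it)."""
    return any(span == node_span for node_span in system.values())


def m_principle_clashes(system: dict[str, set[str]]) -> list[tuple[str, str]]:
    """System-level version: pairs of predicates whose spans overlap without
    either containing the other. An empty list means all spans nest, so the
    system can be drawn without downward junctions."""
    return [(p, q) for (p, a), (q, b) in combinations(system.items(), 2)
            if a & b and not (a <= b or b <= a)]


print(meets_first_condition(HUQUID, SYSTEM))     # False
print(meets_first_condition(DENVELOPE, SYSTEM))  # True
print(m_principle_clashes(SYSTEM))               # []
print(m_principle_clashes({**SYSTEM, "IS HUQUID": HUQUID}))
# -> [('IS TALL', 'IS HUQUID'), ('IS SICK', 'IS HUQUID'), ('IS ASLEEP', 'IS HUQUID')]
```

Denvelope passes the per-concept check even though it is unnatural, which mirrors the point that the condition is necessary but not sufficient.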
3. The second condition: ordering by extensional simplicity
So much for the Sommers-style predicability tree, and for the condition on conceptual naturalness that is based on it. Our next project is to link conceptual naturalness to conceptual simplicity and complexity, in a special sense of these terms.

3.1 Extensions and intensions of concepts

To begin, recall that every concept has both an intension and an extension. The intension of a concept is its meaning;³ the extension is the class of things to which it applies. If the concept specifies a property, then the extension is a set of objects, namely, those objects enjoying the property. If the concept specifies a two-place relation, then the extension is a set of ordered pairs, namely, those pairs whose first member stands in the stated relation to the second member. Thus, the extension of the love relation is the set of pairs of people such that the first loves the second. Similarly for three-place relations, and so forth. From the psychological point of view it is the intension or meaning of a concept that interests us; but we may be able to use properties of the extension of a concept to state a condition on the naturalness of the concept’s intension. One property of an extension is its simplicity or complexity. An extension is just a set of n-tuples (pairs, triples, singletons, or whatever), and we may consider the simplicity of this set from a “structural” point of view, this point of view to be described later. With such a notion of simplicity in hand, applying to extensions, we might go on to hypothesize that natural concepts never have complex extensions. This conjecture is tantamount to a second proposed condition on conceptual naturalness, namely, extensional simplicity. The difficulty with this last hypothesis is that the extension of a concept depends not only on the concept’s intension, but also upon nonconceptual, empirical circumstance. For example, the extension of loves depends not only on its intension but also upon the emotional lives of the several billion potential lovers on this planet. In general, the extension of a concept will vary with the domain of discourse under consideration. If the domain is made up of true Christians, the extension of loves will be universal, consisting of every pair of members drawn from the domain. If the domain is restricted to Ayn Randists, the extension of loves will consist exclusively of pairs of the form ⟨x,x⟩, every member of the domain paired with him or herself and nobody else.

³ If the intension of a concept is taken to be that which fixes the concept’s extension, and the meaning of a concept is taken to be its mental correlate, then intensions and meanings may stand in a less intimate relation than suggested here (see Putnam, 1973). The resulting subtleties, however, have little effect on the sequel.

3.2 Comparing the extensional simplicity of concepts

A useful notion of relative extensional simplicity can be rescued from this variability by means of these definitions:

Definition 1: Concept C is extensionally at least as simple as concept C' if and only if for all (possible) domains d, $\mathrm{comp}(C,d) \le \mathrm{comp}(C',d)$, where “comp(C,d)” denotes the complexity of the extension of concept C in domain d.

Definition 2: If concept C is extensionally at least as simple as concept C' but not conversely, then C is extensionally simpler than C'.
According to Definition 1, concept C is extensionally at least as simple as concept C’ just in case no matter what domain of objects is chosen (be they real or imagined), the extension of concept C is not more complex than that of concept C’. If, in addition to C being extensionally at least as simple as C’, there is at least one domain in which the extension of C is simpler than that of C’ (not merely no more complex), then concept C is extensionally simpler than concept C’; so says Definition 2. Definitions 1 and 2 are useless, of course, without a means of calculating comp(C,d). A theory that provides such means will be described in Section 3.4. Note that these definitions impose only a partial order on concepts; not every pair of concepts will be commensurable in terms of extensional simplicity: often, one concept will have a simpler extension than the other in one domain but the reverse will be true in a second domain. In this case, neither concept is extensionally simpler than the other.
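Definitions 1 and 2 yield a partial order, which a short function can make explicit. Two assumptions apply: the definitions quantify over all possible domains, whereas the sketch can only check a finite list of domains; and `comp` is supplied by the caller, since Goodman's calculus (Section 3.4) is not implemented here. The function name `compare` and the complexity values in `TOY_COMP` are invented for illustration.

```python
from typing import Callable, Hashable, Sequence


def compare(comp: Callable[[str, Hashable], int], c1: str, c2: str,
            domains: Sequence[Hashable]) -> str:
    """Relate concepts c1 and c2 by Definitions 1 and 2, checked only over
    the finite list `domains` (the definitions quantify over all domains)."""
    c1_le = all(comp(c1, d) <= comp(c2, d) for d in domains)  # c1 at least as simple
    c2_le = all(comp(c2, d) <= comp(c1, d) for d in domains)  # c2 at least as simple
    if c1_le and c2_le:
        return "equally simple"
    if c1_le:
        return "simpler"       # at least as simple, but not conversely
    if c2_le:
        return "more complex"
    return "incomparable"      # each is simpler in some domain


# Invented complexity values comp(C, d) for three hypothetical concepts.
TOY_COMP = {
    ("A", "d1"): 1, ("A", "d2"): 1,
    ("B", "d1"): 1, ("B", "d2"): 3,
    ("D", "d1"): 2, ("D", "d2"): 0,
}


def toy_comp(concept: str, domain: Hashable) -> int:
    return TOY_COMP[(concept, domain)]


DOMAINS = ["d1", "d2"]
print(compare(toy_comp, "A", "B", DOMAINS))  # simpler
print(compare(toy_comp, "A", "D", DOMAINS))  # incomparable
```

The "incomparable" case corresponds to the situation the text describes, where one concept has the simpler extension in one domain and the other in a second domain.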
3.3 The second condition

These definitions allow us to resurrect the spirit of our original conjecture about the relation between conceptual naturalness and extensional simplicity. Our new hypothesis is that the ordering among concepts by relative naturalness honors the ordering among concepts by extensional simplicity; that is:

Hypothesized relation between conceptual naturalness and extensional simplicity: If concept C is extensionally simpler than concept C' (in the sense of Definitions 1 and 2), then C is conceptually more natural than C'.

If this hypothesis is correct, then it underwrites another necessary condition on the naturalness of conceptual systems:
Second condition on conceptual naturalness: A system of concepts is natural only if differences among concepts in naturalness are faithful to their differences in extensional simplicity.

According to the second condition, a conceptual system is natural only if concepts ordered by extensional simplicity (in the sense of the definitions) are ordered in the same direction by relative conceptual naturalness.

3.4 Goodman’s theory of structural simplicity

At last let us turn to the problem of assessing the simplicity or complexity of sets of n-tuples. Obviously, we need a theory of extensional simplicity if sense is to be made of our definitions and hypotheses. In his book The Structure of Appearance (1966), Nelson Goodman provides just such a theory, a beautifully axiomatized calculus for assessing the simplicity of subsets of n-fold Cartesian products, which subsets can be taken to be the extensions of predicates. Goodman’s theory deals with the structural simplicity of sets of n-tuples in the sense that the nature of the elements comprising the n-tuples is irrelevant to the theory; all that matters is the pattern of identity and diversity among elements of the n-tuples, that is, whether the second members of all the n-tuples of a set are identical, whether the third and fifth members of each n-tuple are distinct, and so forth. Within the theory these kinds of structural properties of sets are coded in terms of a handful of set-theoretical definitions. These definitions make reference to the reflexivity of a set (e.g., for sets of 3-tuples, whether any triple of the form ⟨x,x,x⟩ occurs in the set), the symmetry of a set (e.g., for sets of 2-tuples, whether a pair of the form ⟨x,y⟩ occurs only if the corresponding pair ⟨y,x⟩ occurs), as well as a variety of less familiar properties. The axioms of Goodman’s theory then assign a nonnegative integer to any given set on the basis of the structural properties it enjoys. This integer is its complexity value; the higher the integer assigned to a set by the theory, the more complex that set is supposed to be.
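The two structural properties named above can be computed directly from a set of tuples. The sketch is not Goodman's calculus, which involves further properties and axioms not reproduced here; it only illustrates that such properties depend on the pattern of identity and diversity within tuples, so renaming the elements leaves them unchanged. The function names and the toy `loves` extension are hypothetical.

```python
def identity_pattern(t):
    """Replace each member by the index of its first occurrence,
    e.g. ("a", "b", "a") -> (0, 1, 0). Only identity and diversity survive."""
    first_seen = {}
    return tuple(first_seen.setdefault(x, len(first_seen)) for x in t)


def has_reflexive_tuple(ext):
    """True if some tuple has all members identical, e.g. (x, x, x)."""
    return any(len(set(t)) == 1 for t in ext)


def is_symmetric(pairs):
    """For a set of pairs: (x, y) occurs only if (y, x) also occurs."""
    return all((y, x) in pairs for x, y in pairs)


def relabel(ext, mapping):
    """Rename elements via a one-to-one mapping; structure is unchanged."""
    return {tuple(mapping[x] for x in t) for t in ext}


loves = {("ann", "bob"), ("bob", "ann"), ("cy", "cy")}
renamed = relabel(loves, {"ann": 1, "bob": 2, "cy": 3})
print(is_symmetric(loves), is_symmetric(renamed))                # True True
print(has_reflexive_tuple(loves), has_reflexive_tuple(renamed))  # True True
print(sorted(identity_pattern(t) for t in loves))                # [(0, 0), (0, 1), (0, 1)]
```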
3.5 Psychological reality of Goodman’s theory Goodman has thus provided an explication of the notion of “extensional simplicity” that may be plugged into the key definitions presented earlier. Before examining the consequences for our second condition on conceptual naturalness, let us see what reason there is to believe that Goodman’s theory is true. Goodman’s theory cannot be tested directly since extensions are abstract entities that cannot be presented to subjects in a direct, perceptual way.’ As an indirect test of the theory, Janet Krueger and I presented subjects with “pictures” of extensions. Suppose that extensions of two place relations are under consideration, and that the domain of the relations is finite (say, roughly a dozen objects). We represent the domain as points on the circumference of an imaginary circle; each member of the domain is represented twice because two place relations are in question. Then, to represent the extension of a two place relation defined in that domain, the points paired in the extension are connected by lines in the picture; in some experiments we connected identity pairs from the extension with curved lines, and diversity pairs with straight lines; in other experiments the reverse choice was used. The resulting pictures are exemplified in Figure 5, which exhibits one picture for each of the five Goodman complexity values available to two place relations. Sets of stimuli like these were generated for a given experiment by selecting, at random, extensions that meet a number of conditions: each extension must contain the same number of pairs; every Goodman complexity value must be represented, and so forth. The drawings were given to undergraduates to be rank ordered for perceptual simplicity, organization, and systematicity. The students’ rank orders were then correlated against the complexity values assigned by Goodman’s calculus to the corresponding extensions. The results have been favorable to the theory. 
For two place relations the median within-subject correlation is 0.90. When the data are averaged so that the median rank of a picture across subjects is correlated against the Goodman values, the correlation is 1.0. Three place extensions generate pictures like those exemplified in Figure 6, which uses a different drawing method than before. Three place pictures like these were presented to students in sets of 16, one for each possible Goodman value for three place relations. The resulting median within-subject correlation is 0.63; when the data are averaged as before, this rises to 0.93. Attneave's measure of perceptual simplicity (number of turns in the outer contour, and the like; described in Zusne, 1970), applied to these drawings, yields significantly lower correlations.

4 Extensions, being sets, are neither tangible nor perceptible.
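The text does not specify which correlation coefficient was used. Assuming a Spearman rank correlation (a natural choice when subjects produce rank orders), the two summary statistics reported above, the median within-subject correlation and the correlation of median ranks, might be computed as in this Python sketch. The rankings are invented toy data for five pictures, not the subjects' data, and the function names are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n in ascending order; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: Goodman values of five pictures, and each subject's
# rank order of those pictures (1 = judged simplest).
goodman = [0, 1, 2, 3, 4]
rankings = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [2, 1, 3, 5, 4],
    "s3": [1, 3, 2, 5, 4],
}

# Median within-subject correlation.
within = [spearman(r, goodman) for r in rankings.values()]
median_within = median(within)

# Averaged data: median rank of each picture across subjects, then one correlation.
median_ranks = [median([r[i] for r in rankings.values()]) for i in range(len(goodman))]
averaged = spearman(median_ranks, goodman)

print(round(median_within, 2), round(averaged, 2))  # 0.8 0.9
```

Because several pictures in a stimulus set can share a Goodman value, the tie-averaging rank step matters; with no ties the result reduces to the familiar formula 1 - 6Σd²/(n(n² - 1)).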
Figure 5. Pictures of sets of pairs. Goodman complexity values range from 0 to 4.
Krueger and Osherson (1978) describe this research in some detail and attempt to characterize the class of drawing methods that suits Goodman's theory, that is, the drawing methods that pair complex sets with complex pictures and simple sets with simple pictures. Krueger (forthcoming) presents additional experimental data on the psychological reality of Goodman's theory; these new experiments employ standard memory paradigms and nonpictorial representations of sets.

3.6 The second condition, evaluated and revised
The experimental work just reported gives reason to believe that Goodman's account of extensional simplicity is not far from the psychological facts. Now let us return to our second condition on conceptual naturalness. Consider the familiar two place relation smaller than, and the unfamiliar two place relation smagger than. A pair of objects (x, y) stands in the smagger than relation just in case x is smaller than y, or everything is at least as big as x. It is easy to prove within Goodman's theory (see Krueger and Osherson, 1978, Appendix) that in every domain the extension of smaller than is at least as simple as that of smagger than;5 moreover, there are certain domains in which smagger than's extension is more complex than smaller than's. Hence, the concept smaller than is extensionally simpler than the concept smagger than, according to the definitions of Section 3.2. By the proposed second condition on conceptual naturalness, smaller than is predicted to be more natural than smagger than, which is true.

5 Naturally, the proof relies on the meaning of these concepts, invoking, for example, the asymmetry of smaller than.

Figure 6. Pictures of sets of triples. This is a different drawing method than in Fig. 5.

Although the second condition on conceptual naturalness is supported by innumerable examples like these, it is threatened by odd cases that turn on the multiplicity of concepts that can share the same extension. The difficulty is brought out by considering the concept of horse laugh identity. An object x is horse laugh identical just in case either or both (a) x is identical to itself, or (b) horses laugh. It is easily demonstrated that horse laugh identity is extensionally simpler than the concept red; this, of course, spells trouble for our second condition. The difficulty is resolved by noting that horse laugh identity has the same extension as ordinary identity in every domain; that is, the two kinds of identity are logically coextensive, as is each of them with the concept exists. Accordingly, we can avoid this kind of counterexample by partitioning logically coextensive concepts into equivalence classes and, on the same basis as before, attempting to order for naturalness only the most natural member of each class.6 Since exists is more natural than the logically coextensive horse laugh identity, only the former is compared to red for naturalness, not the latter. Is exists more natural (even if only slightly more natural) than red? So the theory predicts. Our intuitions are not clear about the matter, probably because both concepts lie at one extreme of the naturalness continuum. In light of the theory's success in judging more obvious cases, I am content to let it decide this case, judging exists the more natural. It should be kept in mind that the second condition is a claim about conceptual competence, and intellectual competencies in general are seen only through the mist of irrelevant performance factors. Someone unsatisfied by these considerations can weaken the second condition still further by making differences in extensional simplicity predictive of weak ("less than or equal") differences in conceptual naturalness rather than the present strict differences. Modified either way, the second condition is immune from any clear counterexample that I have yet been able to construct.

4. The third condition: formalizability as a minimal Fitch logic
The first condition linked conceptual naturalness to category sense versus nonsense. The second condition linked it to extensional simplicity versus complexity. Our last project is to link conceptual naturalness to the validity versus invalidity of deductive arguments. Specifically, we shall attempt to characterize the class of natural connectives. To begin, what kind of concept is a connective?

4.1 Connectives
As discussed in Section 2.1, concepts differ in the nature of the things to which they can apply. Let us isolate for study concepts like certainly and implies that apply to propositions. Riding roughshod over some technical matters, we can think of such propositional modifiers as combining a given number of propositions into a new proposition; thus, the concept implies operates on two propositions p and q to form a new proposition, p implies q. Propositional modifiers thus behave as statement connectives, and we shall call them connectives for short. So implies is a two place connective, whereas the concept necessarily is a one place connective, taking a single proposition p into a new proposition, necessarily p. Natural connectives include all those shown in Table 2.7 To define an unnatural connective, W, let θ be a given proposition of unknown truth value, say, this one: The first archer missed his first target. W may then be defined as follows.

Table 2. Some natural connectives
Necessarily _
Possibly _
It is impossible that _
_ implies _
It is not the case that _
_ and _
_ only if _
_ or _, but not both
_ or _
Nothing is such that _
Everything is such that _
Something is such that _
It ought to be that _
It is permitted that _
It is forbidden that _
At all times _
Sometimes _
Never _
It is sensible to believe that _
It is foolish to believe that _

6 Such equivalence classes are infinite; how is the most natural member of a given class to be determined? One pre-theoretical criterion of conceptual naturalness is succinct expressibility in natural language (see Section 1.2). The most natural member of a given equivalence class, therefore, ought to turn up in the finite set of reasonably succinct expressions of English (say) that pick out concepts in the class.

7 Note that I count some of the elementary truth functions, like conjunction and disjunction, as natural. Although English locutions like "or" and "if ... then ..." may not express these concepts (there is room for doubt about this; see Grice, 1975), simple truth functions have alternative renditions in English that are reasonably succinct. Moreover, the simple truth functions seem to meet other pre-theoretical criteria of naturalness germane to them.
Definition. Wp if and only if p has the same truth value as θ.
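Because W's truth value depends on that of the proposition θ (the first archer missed his first target), W behaves as a truth function only once θ's truth value is fixed. The small Python sketch below models θ as a boolean parameter, an assumption made purely for illustration (make_W is a hypothetical name), and shows that the two possible truth values of θ give W two different truth tables.

```python
from typing import Callable


def make_W(theta: bool) -> Callable[[bool], bool]:
    """Return the one place connective W fixed by the truth value of theta.

    theta stands for a proposition of unknown truth value, e.g.
    'The first archer missed his first target'.
    """
    def W(p: bool) -> bool:
        # Wp is true exactly when p and theta agree in truth value.
        return p == theta
    return W


# Each possible truth value of theta yields a different truth table for W.
for theta in (True, False):
    W = make_W(theta)
    print(theta, [(p, W(p)) for p in (True, False)])
```

If θ is true, Wp agrees with p; if θ is false, Wp agrees with not-p. Without knowing θ, a reasoner cannot tell which of these two behaviors W has.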
So, W that p (i.e., Wp) just in case p is false and the first archer hit his first target, or p is true and he missed it. (Different choices for θ yield different, but formally analogous, connectives.)

4.2 Reasoning, deduction, and logics
In the next few sections a theory will be described that is designed to formally distinguish at least some of the unnatural connectives like W from the natural ones like necessarily. Intuitively, the theory says that a connective is natural only if reasoning that involves it can be carried out in a natural manner. The content of the theory thus consists in spelling out the meaning of reasoning in a natural manner. This pivotal notion of natural reasoning is spelled out by invoking a formally characterized class of logics, called minimal Fitch logics; they are called "Fitch" logics because they are a subset of logics whose main features are due to the logician F. B. Fitch at Yale (Fitch, 1952, 1966). The theory comes to this: every natural connective can be adequately formalized by at least one minimal Fitch logic. Fitch logics were chosen for study because of their reputation for clarity and intuitiveness; logicians generally find that Fitch style deductions enjoy a lucidity not shared by the multitude of equivalent logical systems that have been investigated. The minimal Fitch logics are a subclass of the Fitch logics, a subclass that seems to manifest even more simplicity than the average Fitch logic. The upshot is that the minimal Fitch logics provide a plausible reconstruction of the idea of natural reasoning, namely, reasoning that conforms to some minimal Fitch logic (cf. Section 4.6). Before stating the third condition, and describing the class of minimal Fitch logics it invokes, the terms logic and adequate formalization ought to be clarified. By a logic is meant a device that examines finite arrays of sentences for proofhood.
These arrays are thought to begin with a possibly null list of sentences called premises and end with a sentence called the conclusion. The premises and conclusion by themselves constitute an argument. An argument is intuitively valid if it seems impossible, after reflection, for the premises to be true but the conclusion false. Suppose we have a connective or set of connectives in hand. A given logic is considered to be an adequate formalization of a connective just in case the logic accepts proofs corresponding to all but only the intuitively valid arguments that hinge on occurrences of that connective.
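For connectives that are truth-functional, such as and, or, and it is not the case that, the notion of intuitive validity above has a mechanical counterpart: an argument is valid just in case no assignment of truth values makes every premise true and the conclusion false. The Python sketch below checks this by brute force over truth tables. It is an illustration under that truth-functional assumption only: it is a semantic test rather than a logic in the sense just defined (a device that examines arrays of sentences for proofhood), and it does not apply to modal connectives such as necessarily. The nested-tuple encoding of formulas is hypothetical.

```python
from itertools import product
from typing import Dict, List, Set, Union

# Hypothetical encoding: an atom is a string; compound formulas are tuples
# ("not", A), ("and", A, B), ("or", A, B), ("implies", A, B).
Formula = Union[str, tuple]


def atoms(f: Formula) -> Set[str]:
    """Collect the atomic sentences occurring in a formula."""
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))


def evaluate(f: Formula, v: Dict[str, bool]) -> bool:
    """Truth value of f under the assignment v."""
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not evaluate(f[1], v)
    a, b = evaluate(f[1], v), evaluate(f[2], v)
    if f[0] == "and":
        return a and b
    if f[0] == "or":
        return a or b
    if f[0] == "implies":
        return (not a) or b
    raise ValueError(f"unknown connective {f[0]!r}")


def is_valid(premises: List[Formula], conclusion: Formula) -> bool:
    """True iff no assignment makes all premises true and the conclusion false."""
    names = sorted(set().union(*(atoms(f) for f in premises + [conclusion])))
    for values in product([True, False], repeat=len(names)):
        v = dict(zip(names, values))
        if all(evaluate(p, v) for p in premises) and not evaluate(conclusion, v):
            return False  # counterexample found
    return True


print(is_valid([("implies", "p", "q"), "p"], "q"))  # modus ponens: True
print(is_valid([("implies", "p", "q"), "q"], "p"))  # affirming the consequent: False
```

An adequate formalization of a truth-functional connective, in the sense defined above, would accept proofs for exactly the arguments this test classifies as valid.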
4.3 The third condition
The third condition on conceptual naturalness may be stated this way:
Third condition on conceptual naturalness: A connective is natural only if it can be adequately formalized by a minimal Fitch logic (i.e., the class of minimal Fitch logics contains adequate formalizations of all the natural connectives). If the condition is true, then natural connectives like possibly and and have adequate formalizations as minimal Fitch logics. This is all the condition claims. Natural connectives like and could have adequate formalizations within some other kind of logic without compromising the truth of the condition, which simply says that and can at least be formalized "minimally Fitchly". The truth of the theory also would not rule out the possibility that an unnatural connective has an adequate formalization as a minimal Fitch logic. Being so formalizable is intended only as a necessary, not a sufficient, condition on naturalness. On the other hand, the interest of the condition is compromised by every unnatural concept that is formalizable as a minimal Fitch logic. If every connective whatsoever, both natural and unnatural, had minimal Fitch representations, then the proposed necessary condition would be true but vacuous, like proposing that every natural connective has a formalization that can be written in black ink. Ultimately one hopes to find a class of logics just broad enough to allow formalization of the natural connectives and no more; but that goal will be reached gradually, if at all.8 In summary, the condition offered above constitutes an empirical theory within psychology. The truth of the theory depends on which connectives humans find natural, in the pre-theoretical sense of "natural" discussed in Section 1. To prove the theory false requires exhibiting a natural connective that is not adequately formalized by any minimal Fitch logic; to show the theory interesting requires exhibiting classes of unnatural connectives that similarly escape formalization by any minimal Fitch logic.
4.4 Minimal Fitch logics
Brief remarks will be sufficient to indicate how the class of minimal Fitch logics is characterized (see Osherson, 1977, for a complete characterization). The heart of a Fitch logic is its set of inference rules, such as these:
(i)   { | A   }
      { |---  }   ,  ¬A
      { | B   }
      { | ¬B  }

(ii)  { A → B,  A } ,  B

8 The reader familiar with the early Chomskyan approach to explanatory adequacy in linguistic theory (e.g., Chomsky, 1964) will recognize Chomsky's methodological insights about linguistic theory transplanted into the conceptual realm.

Figure 7. (a) Sample proof within a Fitch logic that includes inference rules (i) and (ii) of Section 4.4. (b) A subproof occurring within proof (a). Formulas (1) and (2) of (a) are the premises of the proof; (7) is its conclusion. Formula (3) is the premise of subproof (b) of proof (a), and its presence in (a) is legitimated thereby. Formulas (4) and (6) are legitimated by being copies of formulas occurring above and to the left of them (viz., (1) and (2), respectively). Formula (5) is legitimated by rule (ii), applied within the subproof to formulas (3) and (4); (p & q) → r is parsed as A → B in this application of (ii). Formula (7) rests on an application of rule (i), where subproof (b) of (a) corresponds to the column within braces in rule (i); A is taken to be (p & q), B is taken to be r. Rule (i) may apply despite the absence of any schema within it corresponding to formula (4); application of a Fitch inference rule requires only that none of its schemata be without counterparts in the proof to which the rule applies. Note that the numbers in (a) are technically not part of the proof.

(a)
(1)  | (p & q) → r
(2)  | ¬r
     |----------------
(3)  |  | p & q            (b)
     |  |-------------
(4)  |  | (p & q) → r
(5)  |  | r
(6)  |  | ¬r
(7)  | ¬(p & q)
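The legitimation pattern described in the Figure 7 caption can be checked mechanically. The Python sketch below is a toy checker for just the fragment illustrated there: premises, copies of formulas from enclosing proofs ("above and to the left"), rule (ii), and rule (i). It is not Osherson's definition of the minimal Fitch logics; the class names Step and Subproof, the rule labels, and the tuple encoding of formulas are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# Hypothetical encoding: atoms are strings; ("not", A), ("and", A, B), ("imp", A, B).
Formula = Union[str, tuple]


@dataclass
class Step:
    formula: Formula
    rule: str  # "copy" (from an enclosing proof), "rule_ii", or "rule_i"


@dataclass
class Subproof:
    premises: List[Formula]
    steps: List[Union[Step, "Subproof"]] = field(default_factory=list)


def has_contradiction(sp: Subproof) -> bool:
    """Does the subproof contain some B and not-B among its nonpremises?"""
    derived = [s.formula for s in sp.steps if isinstance(s, Step)]
    return any(("not", b) in derived for b in derived)


def check(proof: Subproof, outer: Tuple[Formula, ...] = ()) -> bool:
    """Return True if every nonpremise of proof (and its subproofs) is legitimated."""
    lines: List[Formula] = list(proof.premises)  # formulas in this proof so far
    finished: List[Subproof] = []                # subproofs completed in this proof
    for item in proof.steps:
        if isinstance(item, Subproof):
            # Formulas above and to the left are visible inside the subproof.
            if not check(item, outer + tuple(lines)):
                return False
            finished.append(item)
            continue
        f = item.formula
        if item.rule == "copy":
            ok = f in outer
        elif item.rule == "rule_ii":  # {A -> B, A}, B
            ok = any(("imp", a, f) in lines for a in lines)
        elif item.rule == "rule_i":   # {subproof: A | ... B ... not B}, not A
            ok = (isinstance(f, tuple) and f[0] == "not"
                  and any(sp.premises == [f[1]] and has_contradiction(sp)
                          for sp in finished))
        else:
            ok = False
        if not ok:
            return False
        lines.append(f)
    return True


# The proof of Figure 7: premises (1) and (2), subproof (3)-(6), conclusion (7).
cond = ("imp", ("and", "p", "q"), "r")
figure7 = Subproof(
    premises=[cond, ("not", "r")],
    steps=[
        Subproof(premises=[("and", "p", "q")],
                 steps=[Step(cond, "copy"), Step("r", "rule_ii"), Step(("not", "r"), "copy")]),
        Step(("not", ("and", "p", "q")), "rule_i"),
    ],
)
print(check(figure7))  # True
```

As in the caption, the checker lets rule (i) apply even though the subproof also contains formula (4), which has no counterpart in the rule's schema.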
Such inference rules legitimate the presence of a given formula in a proof on the basis of formulas that have occurred earlier; the material within braces on the left of a given rule portrays a pattern of formulas which, if instantiated in some initial segment of a proof, licenses the extension of that proof to the formula portrayed on the right of the rule. Thus, rule (ii) allows a formula, B, to be added to the bottom of a proof if the conditional (A → B) and the formula A have occurred earlier therein; rule (i) licenses the appendage of a negated formula, ¬A, on the basis of an earlier subproof having A as premise and B and ¬B as nonpremises (vertical lines signify subproofs; horizontal lines separate premises from nonpremises). The application of rules (i) and (ii) is illustrated in Fig. 7, which presents a sample proof within a Fitch logic. In addition to this functional point of view, we can consider inference rules from the structural point of view, that is, in terms of their own internal geometry. Considering the geometry of a Fitch inference rule allows us to assess its simplicity. For example, inference rules (i) and (ii) share the structural property that the schemata on the left (which do the "legitimating" of a formula) either contain a single column of formulas and no formulas outside the column, or else contain only formulas outside of a column and no columns. That is, neither of these inference rules contains, within braces, formulas arranged both vertically and horizontally; rather, the formulas are arranged only one way or the other. In this respect these inference rules enjoy a kind of simplicity that distinguishes them from inference rule (iii), for a hypothetical connective, *:
(iii){I-&).B], A

The minimal Fitch logics are partially characterized by stipulating that no inference rule of a minimal Fitch logic may be of this complex sort, illustrated by the * inference rule. That is, to qualify as an inference rule in a minimal Fitch logic, the legitimating schema may consist of formulas organized as a single column, or of formulas outside of columns, but not a mixture of the two. The minimal Fitch logics are fully characterized by stating five further conditions on the geometry of their inference rules. Each condition lends a certain simplicity to the inferential mechanism of the minimal Fitch logics.9

9 Returning to the analogy with Chomskyan linguistics, placing these six conditions on inference rules so as to characterize a class of logics is like placing restrictions on the form of transformational rules so as to characterize a class of grammars. The minimal Fitch logics are defined more precisely in Osherson (1977). An improved version of the theory may be found in Osherson (1978), which serves, as well, to repair a defect in the original paper.

4.5 The third condition, evaluated
How successful are the minimal Fitch logics in providing a necessary condition on connective naturalness? First of all, do they allow adequate formalization of every natural connective? In Osherson (1977) it is shown that all the natural connectives in Table 2 are adequately formalized by one or another minimal Fitch logic; or, more cautiously, for each natural connective in Table 2 there is a minimal Fitch logic available that yields precisely the same theorems as standard axiomatizations of these notions, axiomatizations esteemed by logicians who work in these areas. These results provide empirical support for the theory; they do not prove it, of course. It is not yet known whether other natural connectives, such as it is to be hoped that, the dyadic connectives of moral commitment, or connectives involving knowledge and not just belief, find formalizations within the class of minimal Fitch logics. If they do not, then the theory is simply wrong. At present, the fact that a number of natural connectives are formalizable in the relevant way provides a modicum of support. Turning now to the complementary question: is the theory interesting? Perhaps every conceivable connective has a minimal Fitch logic formalization. In that case the third condition on conceptual naturalness would be no condition at all. In fact, the theory is not vacuous. Recall the unnatural connective W defined in Section 4.1. I have proven that W and several infinite classes of similar connectives cannot be formalized by any minimal Fitch logic whatsoever. This result is by no means deep, and it leaves open the possibility that countless other unnatural connectives do find minimal Fitch formalizations.10 But the fact that W is not formalizable by any minimal Fitch logic does show that the proposed necessary condition on connective naturalness is not completely trivial.

4.6 Natural reasoning, learnability
Contemplation of the minimal Fitch logics brings several additional problems into focus. Two of them can be briefly characterized. The first problem arises in attempting to cash in the intuitive motivation offered in Section 4.2 for the theory of minimal Fitch logics, namely, the suggestion that every natural concept can be reasoned with in a natural way. What grounds are there for believing that the class of minimal Fitch logics is an adequate reconstruction of the notion of natural reasoning?
Indeed, what kinds of considerations are even relevant to assessing the adequacy of a representation of human reasoning? The information processing psychologist offers real time fidelity as the fundamental criterion of adequacy in this situation: a logic will be considered an adequate reconstruction of human reasoning if and only if there is a sufficiently simple relationship between the formalism of the logic and the sequence of information exchanges that transpire in the nervous system when someone actually reasons deductively. Such criteria figure prominently in earlier work of my own on logical abilities and their development (Osherson, 1976). A logician, in contrast, may be much less devoted to the minutiae of ordinary reasoning, and interested instead in abstract properties of the class of proofs available to a given logic or class of logics. For example, logicians like Dag Prawitz (1965) have provided normal form theorems for several logics of the "proof tree" variety. These theorems demonstrate the existence of "well behaved" proofs for every argument that has a proof at all in the logics that Prawitz studies. Intuitively, such proofs are well behaved because, among other things, they avoid a kind of wasteful detour that Prawitz formally characterizes. Logics about which interesting normal form theorems can be proved are faithful to ordinary reasoning in a different way than process models are. Elsewhere (Osherson, 1978), I give some normal form results of an elementary nature for the class of minimal Fitch logics, and attempt to clarify the sense in which normal form theorems are relevant to the psychology of ordinary reasoning.

10 Such cases are already known to me. As explained in Section 4.2, the existence of unnatural connectives finding minimal Fitch formalizations does not falsify the theory.

A second issue raised by the theory of minimal Fitch logics concerns the "identifiability" of a connective. Consider a child faced with deductive inferences that hinge on the meaning of a connective denoted by an unfamiliar name. To be able to distinguish new valid arguments from new invalid ones, the child must identify the mystery connective, that is, determine whether the unfamiliar name denotes conjunction, negation, some modal operator, or whatever. The problem for psychologists is to specify a device that will (a) infer the identity of such a mystery connective on the basis of some finite set of arguments in which it figures, provided that the mystery connective is natural; and (b) fail to infer the identity of the mystery connective on the basis of any finite set of arguments, provided that the mystery connective is unnatural.
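The shape of the identification task in (a) can be illustrated, in a much simplified setting, by identification by elimination. In the Python sketch below, the hypothesis space is assumed to be the sixteen two place truth functions, negation is assumed to be already known, and each datum is an argument involving the mystery connective M together with a judgment of whether it is valid. The learner keeps every candidate consistent with the data so far. This is only a toy: it does not model the natural/unnatural contrast of (a) and (b), nor the identifiability results of the paper cited, and the names identify, valid_under, and M are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Set, Tuple, Union

Formula = Union[str, tuple]  # atoms are strings; ("not", A) or ("M", A, B)
Table = Tuple[bool, ...]     # outputs of M on (T,T), (T,F), (F,T), (F,F)
Datum = Tuple[List[Formula], Formula, bool]  # premises, conclusion, judged valid?

CANDIDATES: List[Table] = list(product([True, False], repeat=4))


def atoms(f: Formula) -> Set[str]:
    if isinstance(f, str):
        return {f}
    return set().union(*(atoms(sub) for sub in f[1:]))


def evaluate(f: Formula, v: Dict[str, bool], table: Table) -> bool:
    if isinstance(f, str):
        return v[f]
    if f[0] == "not":
        return not evaluate(f[1], v, table)
    if f[0] == "M":
        a, b = evaluate(f[1], v, table), evaluate(f[2], v, table)
        return table[(0 if a else 2) + (0 if b else 1)]
    raise ValueError(f"unknown operator {f[0]!r}")


def valid_under(table: Table, premises: List[Formula], conclusion: Formula) -> bool:
    """Is the argument valid if M is interpreted by table?"""
    names = sorted(set().union(*(atoms(f) for f in premises + [conclusion])))
    for values in product([True, False], repeat=len(names)):
        v = dict(zip(names, values))
        if all(evaluate(p, v, table) for p in premises) and not evaluate(conclusion, v, table):
            return False
    return True


def identify(data: Sequence[Datum]) -> List[Table]:
    """Candidates whose validity verdicts agree with every datum."""
    return [t for t in CANDIDATES
            if all(valid_under(t, prem, concl) == judged for prem, concl, judged in data)]


# Hypothetical judgments that a learner might receive if M were conjunction.
m = ("M", "p", "q")
conjunction_data: List[Datum] = [
    (["p", "q"], m, True),
    (["p", ("not", "q")], m, False),
    ([("not", "p"), "q"], m, False),
    ([("not", "p"), ("not", "q")], m, False),
]
for n in range(len(conjunction_data) + 1):
    print(n, len(identify(conjunction_data[:n])))  # candidates left after n data
```

In this toy space every candidate is pinned down by finitely many data; the interest of the problem posed above lies in a device that succeeds for natural connectives but not for unnatural ones, which the sketch does not attempt.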
Such a device would not only mark the distinction between natural and unnatural connectives, but also serve as an idealized model of logical development in children. In the paper cited above, some simple facts are presented about the identifiability of the class of connectives with minimal Fitch formalizations.11

11 The identifiability problem for connectives is greatly illuminated by the work of Wexler, Culicover, and Hamburger (1975; Hamburger and Wexler, 1975) on the homologous problem of identifying transformational languages. Chomsky (1965, 1975) provides a clear formulation of the psychological issues bound up in this approach to intellectual development. See also Moravcsik and Osherson (forthcoming).

5. Concluding remarks
I have attempted to connect conceptual naturalness to three other intellectual phenomena: category sense versus nonsense, structural simplicity versus complexity, and deductive validity versus invalidity. Such connections suggest themselves in a variety of contexts, not only in the three areas discussed above. For example, J. D. McCawley (described in J. D. Fodor, 1977, Section 4.2) attempts to provide conditions on the meanings that may be encoded as single lexical items in a natural language; this is to be achieved by reference to the kinds of syntactic transformations available in a "generative semantics" type grammar. One function of transformations in a generative semantics grammar is to collect semantic primitives under a single node in a phrase marker, prior to lexical insertion for that node. Thus, according to McCawley, the semantic primitives for the word "thork", meaning "to give to one's uncle and", cannot be so collected in view of an independently motivated constraint on transformations known as the "Coordinate Structure Constraint" (Ross, 1967). (That is why there can be no sentence "John thorked Harry fifty dollars" meaning "John gave to his uncle and Harry fifty dollars".) "Thork" is thereby ruled an impossible lexical item. And it seems straightforward to convert constraints on lexical items into associated conditions on conceptual naturalness. McCawley's proposal runs into several difficulties (see Wasow, 1976, Section 7, and J. D. Fodor, 1977, Section 4.2 for discussion), and its interest is no greater than the interest of generative semantics as a grammatical theory generally.
But McCawley has, at the least, highlighted the possibility of (a) using syntactic phenomena to constrain the class of grammars that are sufficient to describe natural languages, and then (b) invoking formal properties of those (independently motivated) grammars to constrain the class of easily expressed, hence natural, concepts.12 The philosopher Elliott Sober (1975) provides another example of a connection between conceptual naturalness and a phenomenon not explored here, namely, inductive confirmation of scientific hypotheses. A set of concepts is natural, Sober reminds us, only if it permits rationalization of our intuitions about the relative support provided by given evidence for different hypotheses; the crux of the matter is that the same formal system of inductive logic will make different claims about relative support depending on the concepts used to characterize the evidence (Goodman, 1966). A well-motivated inductive logic can thus be used to further condition conceptual naturalness. Concepts are natural, on this story, only if they allow a plausible inductive logic to accurately characterize relative support. Sober presents just such a plausible scheme for choosing among competing scientific hypotheses, thereby setting the stage for the formulation and test of yet another proposal about naturalness.13 Additional examples of this kind could be presented, but the general point is by now clear. The problem of natural concepts rears its head in almost every serious investigation of the human intellectual faculties. Consequently, the way is open to trade on each of these connections so as to develop varied theories that converge on this all-important dimension of naturalness. Even limited progress in such an enterprise would throw fresh light on some old questions about human nature.

12 Katz (1972, pp. 351-2, and elsewhere) formulates a similar program.

13 Sober's proposed logic suffers from internal defects, however, that require repair before the system can be used to condition naturalness; see Hills (1977) for a critical review.
References
Berlin, B. and Kay, P. (1969) Basic Color Terms: Their Universality and Evolution, Berkeley, University of California Press.
Chomsky, N. (1964) Current Issues in Linguistic Theory, The Hague, Mouton.
Chomsky, N. (1965) Aspects of the Theory of Syntax, Cambridge, M.I.T. Press.
Chomsky, N. (1975) Reflections on Language, New York, Random House.
Fitch, F. B. (1952) Symbolic Logic, New York, Random House.
Fitch, F. B. (1966) Natural deduction rules for obligation, Amer. Philos. Q., 3 (1), 27-38.
Fodor, Janet D. (1977) Semantics: Theories of Meaning in Generative Grammar, New York, Crowell.
Goodman, Nelson (1965) Fact, Fiction and Forecast (2nd edition), Indianapolis, Bobbs-Merrill.
Goodman, Nelson (1966) The Structure of Appearance (2nd edition), Indianapolis, Bobbs-Merrill.
Grice, H. P. (1975) Logic and conversation, in Peter Cole and Jerry Morgan (eds.) Syntax and Semantics: Speech Acts, New York, Academic Press, pp. 83-106.
Hamburger, H. and Wexler, K. (1975) A mathematical theory of learning transformational grammar, J. math. Psychol., 12 (2), 137-177.
Hills, David (1977) Review of Simplicity by Elliott Sober, Philos. Rev., 86 (4), 595-603.
Katz, Jerrold (1972) Semantic Theory, New York, Harper & Row.
Keil, Francis (1977) The role of ontological categories in a theory of semantic and conceptual development, Doctoral dissertation, University of Pennsylvania. Revised manuscript to be published by Harvard University Press.
Krueger, J. and Osherson, D. (1978) On the psychology of structural simplicity, to appear in Jusczyk, P. (ed.) On the Nature of Thought, Hillsdale, Lawrence Erlbaum Associates.
Moravcsik, J. and Osherson, D. (eds.) Theories of Cognitive Competence (tentative title), forthcoming.
Osherson, D. (1976) Reasoning and Concepts, Hillsdale, Lawrence Erlbaum Associates.
Osherson, D. (1977) Natural connectives: a Chomskyan approach, J. math. Psychol., 19 (1), 1-29.
Osherson, D. (1978) Logical faculty, to appear in Moravcsik and Osherson (forthcoming).
Prawitz, D. (1965) Natural Deduction: A Proof-Theoretical Study, Stockholm, Almqvist and Wiksell.
Putnam, Hilary (1973) Meaning and reference, J. Philos., LXX, 699-711.
Rosch, Eleanor (1973) On the internal structure of perceptual and semantic categories, in Timothy Moore (ed.), Cognitive Development and the Acquisition of Language, New York, Academic Press, 111-114.
Ross, J. R. (1974) Constraints on variables in syntax (excerpts), in Gilbert Harman (ed.) On Noam Chomsky: Critical Essays, New York, Anchor Books, 165-200.
Sober, Elliott (1975) Simplicity, London, Oxford University Press.
Sommers, Fred (1959) The ordinary language tree, Mind, 68, 160-185.
Sommers, Fred (1963) Types and ontology, Philos. Rev., 72, 327-363.
Wasow, Thomas (1976) McCawley on generative semantics: review of "Grammar and Meaning" by James McCawley, Ling. Anal., 2 (3), 279-301.
Wexler, K., Culicover, P. and Hamburger, H. (1975) Learning theoretic foundations of linguistic universals, Theoret. Ling., 2 (3), 215-252.
Zusne, Leonard (1970) Visual Perception of Form, New York, Academic Press.
Résumé
Infants are predisposed to organize their experience in terms of certain concepts and not in terms of others. The favored concepts are called natural; the others are called unnatural. A major problem in psychology is to find a principled distinction between these two kinds of concepts. To this end, three formal necessary conditions are proposed that allow one to rule on the "naturalness" of concepts. These conditions link the problem of "naturalness" to the distinctions between sense versus nonsense, simplicity versus complexity, and validity versus invalidity.
Cognition, 6 (1978) 291-325 ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
The sausage machine: A new two-stage parsing model*
LYN FRAZIER and JANET DEAN FODOR
University of Connecticut
Abstract
It is proposed that the human sentence parsing device assigns phrase structure to word strings in two steps. The first stage parser assigns lexical and phrasal nodes to substrings of roughly six words. The second stage parser then adds higher nodes to link these phrasal packages together into a complete phrase marker. This model of the parser is compared with ATN models, and with the two-stage models of Kimball (1973) and Fodor, Bever and Garrett (1974). Our assumption that the units which are shunted from the first stage to the second stage are defined by their length, rather than by their syntactic type, explains the effects of constituent length on perceptual complexity in center embedded sentences and in sentences of the kind that fall under Kimball's principle of Right Association. The particular division of labor between the two parsing units allows us to explain, without appeal to any ad hoc parsing strategies, why the parser makes certain 'shortsighted' errors even though, in general, it is able to make intelligent use of all the information that is available to it.
I. Introduction
We will argue that the syntactic analysis of sentences by hearers or readers is performed in two steps. The first step is to assign lexical and phrasal nodes to groups of words within the lexical string that is received; this is the work of what we will call the Preliminary Phrase Packager, affectionately known as the Sausage Machine. The second step is to combine these structured phrases into a complete phrase marker for the sentence by adding higher nonterminal nodes; the device which performs this we call the Sentence Structure Supervisor. These two parts of the sentence parsing mechanism have very different characteristics, and this provides an explanation for the relative processing complexity of certain types of English sentence.
The Preliminary Phrase Packager (PPP) is a ‘shortsighted’ device, which peers at the incoming sentence through a narrow window which subtends only a few words at a time. It is also insensitive in some respects to the well-formedness rules of the language. The Sentence Structure Supervisor (SSS) can survey the whole phrase marker for the sentence as it is computed, and it can keep track of dependencies between items that are widely separated in the sentence and of long-term structural commitments which are acquired as the analysis proceeds.
The significant properties of this model can be brought out by comparing it with other two-stage parsing models that have been proposed. (By “two-stage model” we mean one in which the syntactic analysis of a sentence is established in two steps, one temporally prior to the other, regardless of when and how semantic properties of the sentence are determined.) Kimball (1973) proposed a model in which the first stage parser connects each lexical item as it is encountered into a phrase marker for the whole sentence. As nodes and branches are added to this phrase marker on the right, phrasal units which have been completed are snipped off from the left and shunted to the second stage parser where they are reassembled for further processing. The second stage parser may perform some semantic interpretation but it also apparently has the task of associating transformationally moved constituents with their deep structure positions in the phrase marker. Fodor, Bever and Garrett (1974, and references therein) have made a number of proposals which, as far as we can determine, add up to the following model.
*We dedicate this paper to the memory of John Kimball, whose proposals about sentence parsing, as will become clear, have had a considerable influence on our own. Requests for reprints should be addressed to Janet Dean Fodor, University of Connecticut U-145, Storrs, Conn. 06268.
The first stage parser scans the input string of lexical items for cues to the location of clause boundaries, and divides the string at these points. For each clausal unit that has been isolated, it determines the within-clause constituent structure, and then shunts it to the second stage parser whose task is to establish the configuration of these clausal units in the sentence as a whole. These two models differ with respect to the nature of the units which are shunted between the two parsing devices, and also with respect to whether these units are attached together before they are shunted. The Fodor, Bever and Garrett parser shunts only clauses, while the Kimball parser shunts all phrasal units regardless of their syntactic type or size. In the Fodor, Bever and Garrett system, only the second stage parser is concerned with how the clausal units fit together, while in the Kimball system it is the first stage parser which determines the arrangement of all phrasal and clausal units within the phrase marker and they are shunted to the second stage parser with pointers which specify exactly how they should be reassembled. We will
argue that both of these earlier models are incorrect about the type of constituent that is shunted, and that in fact the shunting unit is determined by its size rather than by its syntactic status. Our first stage parser, the PPP or Sausage Machine, will analyze a string of several (seven plus or minus two?) words at a time,1 and these may constitute a clause (e.g., After we drove to Calais) or even two clauses (e.g., I hope you’re sorry) but often only a subclausal phrase (e.g., the man in the green raincoat). We will also argue that the first stage parser is responsible only for forming these phrasal/clausal packages, and that the second stage parser has to decide how to connect them together, as in the Fodor, Bever and Garrett model. Before turning to the specific evidence for these claims, we note that there is some general motivation for a two-stage model of the human parsing mechanism, viz., that there appears to be a rather severe limit on the capacity of working memory. In a single-stage parser, which constructs and retains the phrase marker for the whole sentence, the available computation space will inevitably decrease as more and more of the lexical string is processed. But it is well known that sentence length is not a good predictor of sentence complexity for the human parser. In a two-stage parser, the demand on working memory can be kept within reasonable limits without putting the system under excessive strain towards the end of a long sentence, for partial analyses can be cleared from the first stage parsing unit as they are established. Of course, it is assumed that the second stage parser has the capacity to represent the complete sentence. But this does not undermine the argument from memory limitations, for it is a well-attested (if unexplained) fact about human memory that the more structured the material to be stored, the smaller the demand it makes on storage space.
If it is assumed that the first stage parser assigns a certain amount of low level structure to a previously unstructured word string, and that the second stage parser groups the resulting units into a full phrase marker, then each stage will be handling roughly the same number of units at the relevant level of structure. The appearance of a difference in storage capacity between them would simply be due to the fact that n structural units at the first stage would subtend only a small fragment of the sentence, while n structural units at the second stage could accommodate the whole sentence.
1The capacity of the PPP may be defined not in terms of words but in terms of syllables or morphemes or conceivably in terms of time. Its proper definition is a very interesting question, but we have not attempted to disentangle all of these alternatives. Later, we will suggest that the amount of material that can be simultaneously viewed by the PPP may be influenced by the complexity of the syntactic computations that have to be performed over it.
This general argument from capacity limitations rests in part on the assumption that the phrasal structure assigned to a lexical string is not stored separately from the decision making component of the parser, as it is in the ATN (augmented transition network) models of the parser proposed by Woods (1970) and Kaplan (1972, and references therein). The representation of a sentence which these ATN parsers compute is stored in special registers; however large it grows, it does not hamper the processing of subsequent words in the string. It is also, however, completely inaccessible to the decision making unit which is responsible for processing subsequent words,2 and there is evidence that this aspect of the model is incorrect. One of Kimball’s (1973) seven parsing principles states that, where there is a choice, the human sentence parsing mechanism favors analyses in which an incoming word is attached into the phrase marker as a right sister to existing constituents and as low in the tree structure as possible. We will discuss this claim in detail in section II below, and argue that it is basically correct (though we will propose some important modifications). Its significance is that it attributes to the parser a general preference which is defined over the geometry of the phrase marker, regardless of which particular phrase types are involved. An ATN parser could certainly be designed so that it would make exactly the same decisions at choice points as the Kimball parser. But because its decisions are determined by the ranking of arcs for specific word and phrase types, rather than in terms of concepts like ‘lowest rightmost node in the phrase marker’, the parser’s structural preferences would have to be built in separately for each type of phrase and each sentence context in which it can appear.
2Our remarks are addressed to the particular ATN models which have been proposed to date, in which information flows from the decision unit to the registers but not vice versa. ATN theory provides a very general framework for the formulation of a variety of different parsing models. It would no doubt be possible to devise an ATN parser which does access the representations stored in its registers and uses this information to guide its decisions about the analysis of incoming words. (In the most extreme case, the network would be reduced to a single arc, which would simply direct the parser to process the next lexical item. The network would no longer encapsulate the well-formedness rules of the language, as it does in current ATN models. Such a model would appear to differ very little from the kind that we are proposing. See, in particular, footnote 19 on page 322.)
Evidence that the human sentence parser exhibits general preferences based on the geometric arrangement of nodes in the phrase marker indicates that its executive component does have access to the results of its prior computations. Its input at each choice point must consist of both the incoming lexical string and the phrase marker (or some portion thereof) which it has already assigned to previous lexical items. Two quite different explanations of relative sentence complexity can be based on the assumption that a restriction on working memory plays a significant role in the operations of the parser. One explanation relates sentence complexity to how many nodes of the phrase marker must be stored simultaneously. The other relates sentence complexity to how many mistakes the parser makes in computing the phrase marker because it cannot store many nodes simultaneously. The model outlined in Kimball (1973) is compatible with either type of explanation. But in his 1975 paper, Kimball opted for the former, which we believe to have been the wrong choice. Kimball noted, for example, that sentence (1) is more easily parsed with the adverb yesterday attached within the lower clause rather than beneath the S node of the main clause.
(1) Tom said that Bill had taken the cleaning out yesterday.
The higher clause attachment would require the first stage parser to retain the top S node of the phrase marker until the end of the sentence. For the preferred lower clause attachment, this S node can be shunted to the second stage parser at a much earlier point in the processing. Sentence complexity thus correlates with the load on memory in the first stage unit. The problem with this explanation is that it predicts either that the first stage parser will be totally unable to accommodate enough nodes to allow it to compute the less preferred reading of a sentence like (1), or else that both interpretations of (1) should be equally complex. Until it reaches the end of the sentence, the first stage parser has no way of knowing whether or not it will need to retain the highest S node for further attachments. If it decides to do so, then it will subsequently be in a position to make either attachment of the adverb. But whichever attachment it makes, the demands on memory will have been the same, and so the two interpretations of the sentence should be equally easy or difficult to compute. If on the other hand the parser decides not to retain the top S node, then it will subsequently only be able to attach the adverb within the lower clause. This predicts the preference for lower attachment in (1), and also the difficulty of computing the only coherent interpretation of an unambiguous example like (2).
(2) Tom said that Bill will take the cleaning out yesterday.
But notice that the explanation has now completely changed its emphasis. A sentence like (2) is difficult not because the memory of the first stage parser is overloaded with too many nodes, but precisely because it has avoided overloading itself by relinquishing the top S node. In general, ‘garden path’ explanations of processing difficulty account for asymmetries between sentences where pure memory load explanations do not. Our own model is of the garden path variety. The parser chooses to do whatever costs it the least effort; if this choice turns out to have been correct, the sentence will be relatively simple to parse, but if it should turn
out to have been wrong, the sentence will need to be reparsed to arrive at the correct analysis. The fact that hearers are not always conscious of having made a mistake in the analysis of such sentences (as they are for notorious garden path sentences like The horse raced past the barn fell) is not, we submit, a good argument against this kind of account of perceptual complexity (see Marcus, 1978). To summarize: the assumption of a limit on working memory enters into our model only indirectly. It is the motivation for the division of the syntactic parser into two substages. But this division then determines restrictions on the information that is available to each stage. In particular, the Preliminary Phrase Packager must make its decisions in ignorance not only of what will come later in the sentence but also of what came before, since some of the nodes that have already been established in the phrase marker will have been shunted to the Sentence Structure Supervisor. The PPP will therefore fail to recognize certain legitimate attachment possibilities for the lexical items it is processing. This, we will argue, is the source of Kimball’s principle of Right Association. To establish this connection, however, it is necessary to make the shift that we have described in the division of labor between the two parsing units: the first stage parser is not responsible for constructing the whole phrase marker but only determines its lower lexical and phrasal nodes.3 Many of the parsing strategies that have been proposed in the literature are ad hoc in the sense that, though they do account for the relative processing complexity of certain classes of sentences, there is no explanation for why the human parsing mechanism should employ these strategies rather than some quite different ones (e.g., their inverses). What we learn from studying these strategies, therefore, is simply the values of certain parameters in the system (see Kaplan, 1972). 
Since a variety of quite different parsing mechanisms can apparently incorporate equivalent strategies, little light is shed on the basic structure of the system: the nature of its subcomponents, the amount and type of information transmitted between them, temporal relations between their activities, and so on. Kimball’s interest was in ‘strategies’ which are not merely arbitrary rankings of alternative analyses but which follow from (and hence reveal) the fundamental organizing principles of the parser. Our modifications of his model permit this kind of explanation to be extended still further. That is, the parser’s decision preferences can be seen as automatic consequences of its structure.
3In this respect our model is similar to other ‘chunking’ models that have been proposed (for example, by Thorne, Bratley and Dewar, 1968, and Limber, 1970, as well as Fodor, Bever and Garrett, 1974). But we should emphasize that we follow Kimball in assuming that the boundaries of the phrasal chunks are established in the course of determining the details of the within-phrase structure, rather than as a prior and independent step.
II. The Shortsightedness of the Parser
The principle of Right Association states that “terminal symbols optimally associate to the lowest nonterminal node”. This predicts the preferred interpretation of sentence (1) above, the difficulty of sentence (2), and covers a wide range of other examples. The verbal particle up in (3) is more naturally associated with smashed in the lower clause than with called in the higher clause.
(3) Joe called the friend who had smashed his new car up.
The prepositional phrase to Mary in (4) tends to be associated with the letter rather than attached to the higher NP node which dominates the whole of the note, the memo and the letter, or to the VP node which is higher still.
(4) John read the note, the memo and the letter to Mary.
The possessive ’s in (5) most naturally associates with just the lower noun phrase Mary rather than with the higher noun phrase the boy whom Sam introduced to Mary.
(5) I met the boy whom Sam introduced to Mary’s friend.
The relative clause in (6) is attached beneath the NP node that dominates the job rather than higher up at the level of VP or S, where it would be interpreted as having been extraposed from the subject noun phrase.
(6) The girl took the job that was attractive.
The preference for low attachments is so strong that it persists even in unambiguous sentences where only a higher attachment would lead to a syntactically and semantically coherent phrase marker. Sentences (7)-(10) all tend to be misanalyzed on a first pass.
(7) Joe looked the friend who had smashed his new car up.
(8) John read the note, the memo and the newspaper to Mary.
(9) I met the boy whom Sam took to the park’s friend.
(10) The girl applied for the jobs that was attractive.
This structural generalization over a variety of otherwise disparate sentence types is impressive, and the preferences that it accounts for are sufficiently robust to be accessible to intuition (though experimental confirmation of
them would not be amiss). We therefore accept that something like the principle of Right Association is operative within the human parsing mechanism, and turn to the question of why this should be so. Kimball considered, but then rejected, the idea that Right Association is simply an automatic consequence of the shunting procedure which removes parts of the phrase marker from the first stage parsing unit. His principle of Closure states that “a phrase is closed as soon as possible, i.e., unless the next node parsed is an immediate constituent of that phrase”;4 and the principle called Processing requires that “when a phrase is closed, it is pushed down into a syntactic (possibly semantic) processing stage [the Processing Unit, or PU] and cleared from short-term memory”. Kimball considered an abstract tree structure of the form (11) to which the incoming item k must be attached.
It follows from Closure that by the time item k is encountered, the only part of (11) which will still be represented in the first stage parsing unit will be the node E and the items it dominates; the attachment of B beneath E would not have resulted in an immediate constituent being added beneath A, and therefore A and all of its dependent structure would have been snipped off and shunted to the PU before k was received. Therefore the only possible attachment of k would be as a constituent within E; a higher attachment under A would require A to be called back from the PU and would run foul of the principle of Fixed Structure, which states that “when the last immediate constituent of a phrase has been formed, and the phrase closed, it is costly ever to have to go back to reorganize the constituents of that phrase”.5
4This formulation of Closure could be interpreted in two different ways, depending on whether the “unless” clause means that the next node parsed must be analyzed as an immediate constituent of the phrase, or whether it means that the parser will in fact analyze the next node as an immediate constituent of the phrase. On the first interpretation, Closure incorporates an attachment strategy, i.e. it requires that, where there is a choice, an item should be attached into the phrase marker in such a way that the current phrase can be closed. On the second interpretation, attachment decisions in cases of temporary ambiguity would be left to the guidance of other strategies; depending on where an incoming item was attached, Closure would merely indicate which phrases should be considered closed. We are assuming this weaker interpretation here. (See also footnote 6, page 299.)
5In what follows we will argue that the point at which shunting occurs is not governed by Closure but simply by the limited capacity of the first stage parser. However, the general explanation still holds: certain attachments of items are ruled out because the relevant nodes have already been shunted to the second stage parser.
Kimball decided not to identify Right Association with Closure because he considered that these two principles are opposed to each other in a phrase like (12).
(12) old men who have small annual pensions and gardeners with thirty years of service
Both Kimball’s argument and the demonstration that it is incorrect are extremely intricate,6 so we will finesse this objection and concentrate on the advantages to be gained from attributing Right Association to the narrow window that the first stage parser has on the sentence. First, the tendency towards low right association of an incoming constituent sets in only when the word is at some distance from the other daughter constituents of the higher node to which it might have been attached. As Kimball noted, there is little or no pressure in a sentence like (13) to attach the prepositional phrase for Susan as a modifier within the object noun phrase. The higher attachment (14a) even seems to be preferred to the lower attachment (14b). (We propose an explanation for this preference in section IV below.)
(13) Joe bought the book for Susan.
(14) a. [S [NP Joe] [VP [V bought] [NP [Det the] [N book]] [PP [P for] [NP Susan]]]]
     b. [S [NP Joe] [VP [V bought] [NP [Det the] [N book] [PP [P for] [NP Susan]]]]]
6Kimball’s argument depends heavily on the details of the phrase structure rules which he assumed for English, which are highly debatable. It also depends on construing Closure as an attachment strategy. We know of no unequivocal evidence for the attachment strategy interpretation of Closure. In fact, there is considerable evidence (see Frazier, 1978) that the parser prefers to keep phrases open as long as possible, rather than closing them as soon as possible. And this preference does not conflict with, but actually follows from, the parser’s preference for low right attachments.
Sentence (13) contrasts in this respect with (4) and (8) above, and also with sentence (15).
(15) Joe bought the book that I had been trying to obtain for Susan.
This dependence of right association on constituent length is exactly what would be predicted if right association is due to the shunting of previously processed material. Let us suppose for the sake of argument that the first stage parser has the capacity to retain six words of the sentence, together with whatever lexical and phrasal nodes it has assigned to them. Then in processing (13), it will still be able to ‘see’ the verb when it encounters for Susan. It will know that there is a verb phrase node to which the prepositional phrase could be attached, and also that this particular verb is one which permits a for-phrase. But in sentence (15), where a long noun phrase follows the verb bought, the first stage parser will have lost access to bought by the time for Susan must be entered into the structure; the only possible attachment will be within the long noun phrase, as a modifier to trying to obtain.
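The shunting account of (13) and (15) can be made concrete with a toy calculation. It is an illustration under stated assumptions, not the authors' implementation. It treats the PPP's capacity as exactly six plain words preceding the incoming word, following the "for the sake of argument" figure above. It ignores the nodes attached to those words and tokenizes on whitespace. The helper names `visible_words` and `can_see` are hypothetical.

```python
def visible_words(words, incoming_index, window=6):
    """Words a hypothetical six-word PPP still holds when words[incoming_index]
    arrives. Everything earlier is assumed to have been shunted to the SSS."""
    start = max(0, incoming_index - window)
    return words[start:incoming_index]


def can_see(words, target, incoming, window=6):
    """True if `target` is still inside the PPP's window when `incoming` arrives."""
    i = words.index(incoming)  # first occurrence; adequate for these examples
    return target in visible_words(words, i, window)


s13 = "Joe bought the book for Susan".split()
s15 = "Joe bought the book that I had been trying to obtain for Susan".split()

print(can_see(s13, "bought", "for"))  # True: the VP of bought is still available
print(can_see(s15, "bought", "for"))  # False: bought has already been shunted
print(visible_words(s15, s15.index("for")))
# ['I', 'had', 'been', 'trying', 'to', 'obtain']
```

Under these assumptions, the only hosts left for for Susan in (15) lie inside the long object noun phrase. A capacity measured in syllables or morphemes, which the authors leave open, would change only how the window is counted.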
The alternative to this explanation of right association is the assumption that the first stage parser does have access to both lower and higher nodes, and that it simply chooses to make the lower attachment for some independent reason. If this were so it would be natural (though not obligatory) to predict that the difficulty of a sentence is a function of how high an incoming constituent must be attached. Kimball suggested that this was so, but all of the evidence seems to be against it. In sentence (16) there are in principle three different positions in which yesterday might be attached, but the lowest position is ruled out on semantic grounds.
(16) Joe said that Martha claimed that 1984 will be blissful yesterday.
Our informants disagree as to which of the two remaining analyses of (16) is preferable. Some favor the highest attachment for the adverb, but they tend to read the sentence with a large pause before the final word, making it an afterthought which modifies the whole sentence but is perhaps not fully integrated into it. In any case, it is generally agreed that both higher attachments are awkward, and more or less equally so; neither of them is at all comparable in acceptability to a lowest-clause attachment when this is available. Once again, this is exactly what we would expect if right association is the result of nodes being shunted out of the first stage parsing unit. Once a node has been shunted, it is unavailable to the first stage parser for further attachments regardless of what its position in the phrase marker is or how long ago the shunting occurred.
Up to this point, the data are compatible with the kind of snipping and shunting process which relates the two stages of Kimball’s parser. But there are other sentences, in which right association is not manifest, that demand a redefinition of the first stage parser’s responsibilities. Sentence (17) has been constructed so that there are in principle three possible attachments of the word yesterday, two as right daughters of clauses that have already been parsed, and one as left daughter of a new clause. The lowest attachment as right daughter, which is favored by right association, is excluded by the tense; the competition is therefore between a high attachment as right daughter and an attachment as left daughter.
(17) Though Martha claimed that she will be the first woman president yesterday she announced that she’d rather be an astronaut.
As we would expect from the violation of right association, this sentence is not fully natural (especially when presented without internal punctuation or read without a major intonation break, as it must be to test the preference between competing interpretations). But it seems clear that the grouping yesterday she announced is preferred over the grouping Martha claimed ... yesterday. This preference cannot be due to a limit on the height of an attachment, because yesterday is attached even higher in the preferred phrase marker (18a) than in the alternative (18b). This observation suggests that the important parameter may not be the height of the attachment in the phrase marker, but rather how local it is relative to other words in the lexical string. Since yesterday in (17) cannot coherently be grouped with the words immediately on its left, it is grouped with those immediately on its right rather than with more distant words on its left.
(18) a. [Tree diagram: yesterday attached as the left daughter of the main clause yesterday she announced that she’d rather be an astronaut, with the though-clause adjoined above it]
     b. [Tree diagram: yesterday attached within the though-clause, as the right daughter of Martha claimed that she will be the first woman president]
A local attachment principle also explains the preferred interpretations of (19)-(22).
(19) John read the letter to Mary.
(20) John read the memo and the letter to Mary.
(21) John read the note, the memo and the letter to Mary.
(22) John read the postcard, the note, the memo and the letter to Mary.
Because a conjunction of noun phrases is a ‘flat’ structure, rather than right branching, the attachment of to Mary as a modifier of the noun phrase would be at the same height in the phrase marker in all of (20)-(22). But as the noun phrase grows longer, the inclination to attach to Mary to just the closest sub-constituent (the letter) becomes stronger. A preference for local association of a word with other nearby words is unexplained if the first stage parser is building up a complete phrase marker for the sentence. It is easily accommodated, however, if we assume that the first stage parser has the properties of the Sausage Machine, i.e., that its task is only to assign structure to groups of adjacent words in the lexical string, and to transmit these as separate phrasal packages to a second stage unit which will link them together with higher nodes into a complete phrase marker. A word at an ambiguous phrase boundary could be incorporated into a package with the words on its left, or it could become the first word of a new package including the words on its right. An important prediction of this model is that the preferred attachment of a constituent may differ depending on whether it is effected by the Preliminary Phrase Packager (Sausage Machine) or by the Sentence Structure Supervisor. It is the PPP which has a limited window on the sentence and which will therefore be able to make only local attachments. The SSS, which can view the whole of the phrase marker, will not be so constrained. This predicts different effects for constituents of different length. A long constituent which is packaged up as a separate unit by the PPP will be attached to other phrases by the SSS; it should therefore show no special tendency towards local attachment. Short constituents, which will be grouped together with others in a package by the PPP, are those for which a local attachment should be strongly favored. This is exactly what the data show.
The examples with which Kimball motivated his Right Association principle involve the attachment of verbal particles, possessive ’s, single word adverbs, and short prepositional phrases.7 But there are innumerable sentences in which whole clauses are attached high in the phrase marker without any apparent ill effect on perceptual complexity. In (23), for example, the but-clause has to be Chomsky-adjoined to the highest S node over the preceding clause; the same is true of the adverbial clause in (24).
(23) We went to the lake to swim but the weather was too cold.
(24) Grandfather took a long hot bath after he had finished his chores.
7Sentence (6) above is the only one of the examples which Kimball cited as instances of Right Association in which the constituent to be attached consists of several words. It is not clear, however, that the difficulty of the extraposed-relative analysis of (6) does, after all, have anything to do with Right Association. It seems more plausible to attribute it to the fact that this analysis, unlike the preferred analysis, presupposes that the sentence has been transformed. Indeed, it has often been claimed that there is a transderivational constraint in English which excludes the extraposed-relative analysis of such sentences as ungrammatical.
Our explanation for these cost-free violations of the Right Association principle is that the PPP takes but and after to be the first word of a new phrasal package and leaves it to the SSS to attach this package into the phrase marker. The same applies to (17) above; if the PPP discovers that it cannot coherently connect yesterday into its current package (which will be the complement clause within the though-clause), it will then make yesterday the first word of a new package, which the SSS can attach without difficulty at the highest level of the phrase marker, as in (18a).
What we have shown is that right (or local) association is sensitive not only to the length of the intervening constituent but also to the length of the constituent to be attached. Notice the striking contrast between (25) and (26).
(25) John threw the apple that Mary had discovered was rotten out.
(26) John threw the apple that Mary had discovered was rotten out of the window and into the rosebush.
The one-word locative in (25) shows a strong inclination to group with the preceding words had discovered was rotten even though this makes no sense; the longer locative phrase in (26) is much more easily analysed as modifying the higher and more distant verb threw. It might be objected that these sentences differ in their syntax, since out is a particle in (25) but a preposition in (26). However, the length effect can also be observed in (27) and (28), where this difference is controlled for.
(27) a. Ellen brought the pies that she had spent the whole morning baking to the party.
     b. Ellen brought the pies that she had spent the whole morning baking to the potluck supper at the church hall.
(28) a. Joe got the idea that the coastguard was going to send a liferaft across.
     b. Joe got the idea that the coastguard was going to send a liferaft across to the two men clinging to the sinking canoe.
(28a) and (28b) are both in principle ambiguous between getting the idea across and sending the liferaft across, but the former interpretation is easier to recognize in (28b).
The sausage machine
It has often been observed that sentences are stylistically awkward to the extent that they have long constituents preceding short constituents. A long direct object noun phrase followed by a verbal particle is just one instance of this generalization; the particle tends to be absorbed into the noun phrase rather than attached as its right sister within the verb phrase. In our model this follows automatically from the limitations of the PPP. Because the noun phrase is long, the PPP will no longer have access to the verb and its dominating VP node when the particle is encountered. Because the particle is short, it will be grouped by the PPP with other neighboring words, and the only candidates are the last few words of the noun phrase. Previous attempts to explain the data have concentrated on the length of the first constituent and have ignored the compounding effect of the shortness of the second constituent. Bever (1970) proposed that the parser abides by the principle “Save the hardest for last”, and suggested that a long constituent which comes first will exhaust immediate memory before a following short constituent can be processed. But this will not explain the difference in perceptual complexity between (25) and (26), for the noun phrase the apple that Mary had discovered was rotten should occupy exactly the same amount of immediate memory in both cases, and should interfere equally with the processing of the following constituent regardless of its length. It might be suggested instead that the parser benefits from knowing, as soon as possible after a constituent has been opened, how many daughter constituents it will have and of what type. If the daughter constituents arrive in the sequence long-short, recognition of the structure of the VP will be delayed longer than if they are ordered short-long. But once again, there is no reason why the sequence long-short should be any more difficult than long-long.
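The PPP account of the long-short asymmetry can be caricatured in a few lines of Python. This is our own toy illustration, not the authors' model: it assumes a fixed six-word window (the "half a dozen words" suggested in Section III) and a hypothetical three-word threshold below which a phrase is too short to be sent to the SSS as a package of its own. Both numbers are placeholders.

```python
WINDOW = 6       # assumed PPP span, in words
MIN_PACKAGE = 3  # hypothetical: shortest phrase worth sending on as its own package


def ppp_attachment(preceding, head_index, phrase):
    """Toy prediction of how the PPP handles a phrase arriving after `preceding`.

    preceding  -- tokens already received, e.g. the verb and its object NP
    head_index -- position in `preceding` of the distant head (e.g. the verb threw)
    phrase     -- tokens of the constituent still to be attached
    """
    # When the phrase's first word arrives, the window holds that word plus
    # the WINDOW - 1 tokens immediately before it.
    window_start = max(0, len(preceding) - (WINDOW - 1))
    if head_index >= window_start:
        # The head is still visible, so the phrase can be attached as its sister.
        return "attached to head"
    if len(phrase) >= MIN_PACKAGE:
        # Long enough to start a new package, which the SSS can attach high.
        return "new package for SSS"
    # Short and out of reach of the head: absorbed into the nearest words.
    return "absorbed low"


sentence = "John threw the apple that Mary had discovered was rotten".split()
print(ppp_attachment(sentence, 1, ["out"]))  # absorbed low, as in (25)
print(ppp_attachment(sentence, 1, "out of the window and into the rosebush".split()))  # new package for SSS, as in (26)
print(ppp_attachment("John threw the apple".split(), 1, ["out"]))  # attached to head
```

With the long object noun phrase of (25) and (26), the verb threw has left the window by the time the locative arrives, so only the length of the locative decides the outcome; with a short object such as the apple, the verb stays visible and the particle can be attached as its sister.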
III. The Sausage Machine Model
The human parsing mechanism is generally very efficient, as long as the sentence provides it with sufficient information to govern its decisions at the time they must be made. But we have argued that it exhibits one specific kind of ‘stupid’ behavior, which is explicable if the major phrases of a sentence are packaged up by a first stage parser with very limited access to structure that has already been established. Let us consider in detail how this preliminary phrase packager will operate. We assume that the PPP’s ‘viewing window’ shifts continuously through the sentence,8 and accommodates perhaps half a dozen words. The degree of structure which has been assigned to this string of words will be greatest on the left and will decline towards the right, where items may not yet even have been lexically categorized. We will concentrate on an item X, roughly in the middle of the window, which has had a lexical node assigned to it but has not yet been linked to other words by means of nonterminal nodes.9 We assume that the task of the PPP is to group as many items as it can into a single phrasal package. If it were to form only very small packages, there would be more of them relative to the number of words in the sentence; the SSS would therefore be left with more attachment decisions to make, and it would have to make them at a greater rate. Efficient functioning of the system demands that each of the two stages should do as much of the work as it is able to within the limits on capacity at its level of analysis. The item X is therefore to be grouped, if possible, with other nearby items. Grouping it with items on its left should be optimal, for these items have already been assigned some structure into which X could be incorporated. If this is not possible, the PPP’s second choice would be to terminate the package it has been constructing and to start forming a new one, in which X will be the leftmost daughter. What the PPP cannot do, of course, is to attach X as a sister to items on the left which have already passed out of its viewing window. As we have noted, this characterization of the first stage parsing unit explains a number of related observations: the preference for low right attachments; the sensitivity of this preference to the length of the prior constituent; its sensitivity to the shortness of the constituent to be attached; and the preference for local attachment as a left daughter rather than distant attachment as a right daughter. We now show that this model offers a new explanation for the extreme processing difficulty of center embedded sentences.
The correct analysis of a sentence like (29) goes against the grain of the PPP, because adjacent phrases must not be grouped together into the same phrasal package.
(29) The woman the man the girl loved met died.
If the PPP did package the first six words of (29) together as a phrase, they could only be interpreted as constituting a conjoined noun phrase (missing its and). This, of course, is exactly the kind of misanalysis which Blumenthal (1966) showed that people tend to impose on such sentences.10 Notice also that even if this sentence-initial garden path is avoided in an example like (30) because its noun phrases are not conjoinable, one is still inclined to analyse the sequence of verb phrases as a list or conjunction.
(30) The woman someone I met loved died.
8The phrases formed by the PPP do not have to be snipped off from any larger structure in order to be transmitted to the SSS. They might be shunted to the SSS, as in Kimball’s model and in the Fodor, Bever and Garrett model. However, a picture that we find more natural is that of the PPP shuttling its narrow viewing window through a sentence, forming its phrasal packages, and depositing these in the path of the SSS which is sweeping through the sentence behind it. This would have the advantage of increasing the decision lag of the SSS, by permitting the SSS to ‘look over the shoulder’ of the PPP, while the latter is forming its phrasal packages, and start considering their possible attachments before they are complete. It also leaves open the question of whether the PPP’s viewing window shifts in a truly continuous fashion through the sentence, or whether it jumps from one phrasal package to the next. We have not attempted to distinguish these two possibilities here, though they may prove to make different predictions about the details of the PPP’s sensitivity to constituent length.
9We leave open here the question of whether lexical category decisions are based entirely on information retrieved from the mental lexicon or whether they can be influenced by syntactic information. However, in assuming that lexical categorization typically occurs quite early relative to the postulation of higher nonterminal nodes, we are rejecting the idea that category decisions are made solely on the basis of the syntactic structure that has been assigned to prior words. (See footnote 16 on page 317.)
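The PPP's order of preference for an incoming item X can be written as a short greedy loop: extend the current package if X fits, otherwise close the package and let X begin a new one, and never reach back to material already handed on. This is a sketch under stated assumptions: `max_size` stands in for the PPP's window, and `can_join` is a hypothetical placeholder for whatever grammatical knowledge decides whether X can be coherently connected to the package under construction.

```python
from typing import Callable, List

Package = List[str]


def ppp_packages(words: List[str],
                 can_join: Callable[[Package, str], bool],
                 max_size: int = 6) -> List[Package]:
    """Greedy phrasal packaging in the spirit of the PPP (toy version)."""
    packages: List[Package] = []
    current: Package = []
    for word in words:
        # First choice: incorporate the word into the package under construction.
        if current and len(current) < max_size and can_join(current, word):
            current.append(word)
            continue
        # Second choice: close the current package; the word starts a new one
        # as its leftmost daughter. Closed packages are never revisited.
        if current:
            packages.append(current)
        current = [word]
    if current:
        packages.append(current)
    return packages


print(ppp_packages("the weather was cold but we swam".split(), lambda pkg, w: w != "but"))
# [['the', 'weather', 'was', 'cold'], ['but', 'we', 'swam']]
print(ppp_packages("John threw the apple that Mary had discovered was rotten".split(),
                   lambda pkg, w: True))
# [['John', 'threw', 'the', 'apple', 'that', 'Mary'], ['had', 'discovered', 'was', 'rotten']]
```

The first call treats but as unable to join the preceding clause, so it begins a new package, much as the text describes for (23). The second call shows that a bare size cap can cut a word string at a point that is not a natural phrasal break, a risk the authors return to when discussing how the PPP decides where to chunk the lexical string.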
If the whole phrase marker were computed by a single parsing device, this would be difficult to understand. For, having correctly imposed right branching structure on the noun phrases, the parser should be able to predict the subsequent appearance of a tier of three verb phrases. But if it is the PPP which is packaging the verb phrases, we would expect its analysis to be quite local and unaffected by the structure of the preceding noun phrase sequence. The SSS will be aware of the nested structure, of course, but it will be helpless because it will inherit an incorrectly structured VP sequence from the PPP. The only correct phrasal packages which the PPP could usefully transmit to the SSS, in the case of the center embedded sentence (29), are those shown in (31).
10The conjunction analysis (i) will also be favored over the (correct) relative clause analysis (ii) by the Minimal Attachment principle discussed in Section IV below.
[Diagrams (i) and (ii): tree structures for the conjunction analysis and the relative clause analysis.]
(31) [Figure: tree diagrams of the candidate phrasal packages a-e for (29), among them package a (the girl loved) and package b (the man the girl loved).]
A safe but extremely uneconomical solution would be for the PPP simply to group each determiner with its following noun, and send six separate packages to the SSS: three noun phrases and three verb phrases. For the PPP to form package a (the girl loved) would be a minimal improvement; it would still be under-using its capacity, and there would be five separate packages in this nine-word sentence for the SSS to cope with. Package b (the man the girl loved) would therefore be preferable; it would combine five words, and further reduce the decision density for the SSS by chopping the nine words into only four packages. Packages c, d and e would obviously be better yet, but only if the PPP could achieve them without error. These packages are clearly in danger of exceeding the capacity of the PPP,11 and if they did, the results would be disastrous. That is, if the PPP were to begin forming one of these packages and then discovered that it could not squeeze in the last word or two, the result would be a non-constituent and the SSS would not inherit the correct units of the sentence to work with.
Let us consider what would be required for the PPP to form package b. First, it would have to avoid grouping NP1 with NP2 and NP3. But it would refrain from making this grouping only if NP1 were long enough to qualify as a separate package of its own. This predicts that a center embedded sentence with a long NP1 should be easier to parse, other things being equal, than sentence (29). Similar arguments apply to the verb phrases. VP1 is to be packaged with the preceding two noun phrases, but VP2 and VP3 are not. Also, VP2 and VP3 must not be packaged together. The optimal situation, therefore, should be one in which both VP2 and VP3 are long enough to be formed into separate packages. Apart from this, all the other constituents (viz., NP2, NP3 and VP1) should ideally be short, to facilitate their being packaged together. These predictions are confirmed by a comparison of (29) with (32), which is considerably easier to parse.
(32) The very beautiful young woman the man the girl loved met on a cruise ship in Maine died of cholera in 1962.
Two alternative explanations of this effect of constituent length can be rejected. The first is that lengthening a constituent inevitably increases its semantic content, and thus provides semantic constraints which can facilitate the analysis. Note, however, that the extra material in (32) in no way restricts the possible pairings of noun phrases with verb phrases. Semantic facilitation may be what is at work in (33), but it is not in (32).
(33) The snow the match the girl lit heated melted.
11We will argue shortly that package b is the largest package the PPP can form, even though package c also falls within the six word limit that we have tentatively been assuming for the PPP. We believe this limit to be approximately correct. But, as observed in footnote 1 above, the amount of lexical material the PPP can hold may be variable. There are at least two factors which might affect package size. There may be a trade-off between the number of words the PPP can hold and the number of nonterminal nodes which dominate those words. The number of nonterminal nodes is relatively high in relative clause constructions such as are involved in an example like (31). This observation is reminiscent of the hypothesis (Chomsky, 1963; Fodor and Garrett, 1967) that the perceptual complexity of a sentence depends on the ratio of its nonterminal nodes to its lexical items. However, it should be noted that this ratio plays only a tangential role in our own explanation of center embedded sentences. Furthermore, it looks as if there may be a conflict between assuming that complexity increases as this ratio increases, and the fundamental finding that more verbal material can be stored the more structured it is. The second possible factor affecting package size (which is not open to this objection) is the complexity of arriving at the correct structure for a package. This would be greater, for example, if the PPP were to adopt a faulty hypothesis which it then had to revise. This will be the case in sentence (31) (even in packages like b and c which contain only two of the three noun phrases), because the Minimal Attachment principle (as noted in footnote 10 above) will tend temporarily to garden path the PPP in relative clause constructions. Assuming, therefore, that the parser’s memory capacity and computational capacity are not entirely independent of each other, it seems plausible to suppose that the PPP’s package size is not constant. And if so, then the center package in a sentence like (31) may be limited to only four or five words even though packages may be perhaps seven or eight words long in other contexts.
A second possibility is that lengthening the sentence (without increasing its degree of nesting) simply gives the parser more time to consider possible analyses and select the correct one. But it is easy to show that lengthening constituents is not a general palliative. Sentence (34) has as many long constituents as (32), but they are just the wrong ones from the point of view of the PPP. As predicted, this sentence is more difficult to process than (32), and perhaps even more difficult than (29).12
(34) The woman the sad and lonely old man the pretty little schoolgirl loved with all her heart met died.
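The package counts given earlier for (29) can be checked mechanically. The packagings below are transcribed by hand from the discussion of (31); treating "decision density" as packages per word is our own simplifying assumption, not a measure the authors define.

```python
from fractions import Fraction

SENTENCE_29 = "the woman the man the girl loved met died".split()

# Candidate packagings of (29), transcribed from the discussion of (31).
PACKAGINGS = {
    "each NP and verb separately": [["the", "woman"], ["the", "man"], ["the", "girl"],
                                    ["loved"], ["met"], ["died"]],
    "with package a": [["the", "woman"], ["the", "man"], ["the", "girl", "loved"],
                       ["met"], ["died"]],
    "with package b": [["the", "woman"], ["the", "man", "the", "girl", "loved"],
                       ["met"], ["died"]],
}


def decision_density(packages, sentence):
    """Packages per word (assumed proxy); checks the packaging reproduces the sentence."""
    flat = [word for package in packages for word in package]
    if flat != sentence:
        raise ValueError("packaging does not reproduce the sentence in order")
    return Fraction(len(packages), len(sentence))


for name, packages in PACKAGINGS.items():
    print(f"{name}: {len(packages)} packages, density {decision_density(packages, SENTENCE_29)}")
# each NP and verb separately: 6 packages, density 2/3
# with package a: 5 packages, density 5/9
# with package b: 4 packages, density 4/9
```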
It should be noted that there is no need, in this account of center embedded sentences, to set an arbitrary top limit of two on the number of S nodes that the parser can store simultaneously, or on the number of simultaneous applications of the clause parsing subroutine. The fact that processing difficulty rises so sharply with the degree of embedding falls out automatically. If only the inner two noun phrases and the first verb phrase can be packaged together, there will have to be two packages for a two-clause sentence, four packages for a three-clause sentence, six packages for a four-clause sentence, and so on. As the number of packages increases, so does the problem of establishing the correct package boundaries, the decision pressure on the SSS, and the chance that a package will be attached incorrectly (e.g. as a conjunct). Previous proposals about the source of the difficulty in center embedded sentences have emphasized memory limitations. Miller and Isard (1960) suggest that there is a limit on memory for re-entry addresses for the clause processing subroutine; others (such as Yngve, 1960, and Kimball, 1975) propose that memory is overloaded by retention of words and nodes in the early clause fragments while they await completion. We cannot review all such proposals in detail here, but we will illustrate, in the framework of a phrase shunting model, what we believe to be a general defect that they all share.
12We may also compare (32) with a sentence like (i), which seems to be harder to process though it differs only in the length of VP2.
(i) The very beautiful young woman the man the girl loved met died of cholera in 1962.
In fact, the relative constituent lengths represented by (32) appear to be the optimal ones. Since VP2 is long in (32), this suggests that it is better for the PPP to package up VP2 separately rather than to try to incorporate it into a package together with NP2, NP3 and VP1. In footnote 11 we suggested a possible explanation for this. It may be, also, that the parser favors certain types of packages over others. Package c, which includes VP2, is a relative clause. It may be that a relative clause, separated from its head noun phrase, is not an optimal type of package (quite possibly because the head noun phrase constitutes the ‘filler’ for the ‘gap’ in the relative clause; see Fodor, 1978).
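The package counts given above for two-, three- and four-clause sentences follow from a simple rule, sketched below under the same assumption the text makes: only the innermost two noun phrases and the first verb phrase share a package, and every other noun phrase and verb phrase goes to the SSS on its own. Numbering follows the text (NP1 is the outermost subject, VP1 the innermost verb phrase), so n clauses yield 2n - 2 packages.

```python
from typing import List


def center_embedding_packages(n_clauses: int) -> List[List[str]]:
    """Best-case packaging of an n-clause center embedded sentence (toy model).

    Surface order is NP1 ... NPn VP1 ... VPn. Assumption: only NP(n-1), NPn
    and VP1 fit into one package; everything else is sent to the SSS alone.
    """
    if n_clauses < 2:
        raise ValueError("center embedding needs at least two clauses")
    nps = [f"NP{i}" for i in range(1, n_clauses + 1)]
    vps = [f"VP{i}" for i in range(1, n_clauses + 1)]
    inner = nps[-2:] + vps[:1]
    return [[np] for np in nps[:-2]] + [inner] + [[vp] for vp in vps[1:]]


print(center_embedding_packages(3))
# [['NP1'], ['NP2', 'NP3', 'VP1'], ['VP2'], ['VP3']]
for n in range(2, 6):
    print(n, len(center_embedding_packages(n)))  # package count is 2n - 2
```

For the three-clause sentence (29), the middle package is package b (the man the girl loved), and the count of four packages matches the text.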
Right branching constructions, unlike center embedded constructions, are easy to parse. In a right branching structure, by definition, a higher clause is complete when the clause embedded in it begins. But the parser does not always have evidence that the higher clause is complete. For all the parser can tell, a right branching sentence such as (35) might have continued as in the center embedded sentence (36).
(35) I saw a boy who dropped the delicate model airplane he had so carefully been making at school.
(36) I saw a boy who dropped the delicate model airplane he had so carefully been making at school into a puddle cry.
An obvious point, which any explanation of center embedded sentences should accommodate, is that, up to the word school, sentence (36) is just as easy to parse as sentence (35); it seems clear that both sentences are processed in exactly the same way. If it is assumed that clause fragments are retained in memory until it is certain that they are complete, then the need to retain clause fragments cannot be the source of difficulty in (36), for just the same fragments will be retained in (35). If it is assumed instead that incomplete clauses can be shunted to the second stage parser (either generally, or at least when they could be complete clauses), then the memory explanation for (36) also fails. The only special characteristic of (36) would be that it requires extra phrases to be attached into incomplete clauses after they have been shunted. It might be claimed, of course, that this is what makes (36) difficult (cf. Kimball’s Fixed Structure principle), but there is really no evidence that this is so. As we argued in connection with sentences (23) and (24) above, extra material can be attached into an already shunted constituent without any noticeable difficulty, as long as the phrase lengths are not such as to trick the PPP into forming incorrect phrasal packages. (In sentence (36) the last two constituents are too short for comfort; parsing is easier if they are lengthened, as in I saw a boy who dropped the delicate model airplane he had so carefully been making at school into the puddle of mud beside the back door reach down and pick it up by its broken tailfin.)
Our explanation of center embedded sentences does not face this problem. It accepts that both right branching and center embedded sentences can be parsed and shunted phrase by phrase, and it attributes the difficulty of center embedded sentences not to memory overload but to the problem of establishing the correct phrasal units. This has the additional advantage that it allows for the fact that center embedded constructions of different types differ considerably in perceptual complexity. Just how difficult a particular example is will depend on how many opportunities it offers for misassignment of phrase boundaries, how persistently its structure garden paths the
parser, how many nonterminal nodes have to be postulated at each step, how many constituents have been moved or deleted by transformations, and so on. The classic examples, like (29) above, have a structure which would be difficult for the parser to determine on all of these counts. One aspect of our model that we have not yet touched on is the important matter of how the PPP decides where to chunk the lexical string into phrases. It must make these decisions very fast. And it must make them with very little knowledge of what follows the package boundary; its capacity is so limited that it would be implausible to attribute to it a look ahead span of more than one or two words at most. The real crime, as we have seen, is for the PPP to package together words which do not belong together in the phrase marker. This does noticeably increase processing difficulty. Somewhat less serious mistakes would be to close a package too soon or to keep it open too long. Premature closing of a package will make more work for the SSS but may otherwise do no harm, because additional daughters can be added to a node by the SSS. For example, a verb phrase might be packaged up without the inclusion of a late arriving prepositional phrase; but if the prepositional phrase were sent over to the SSS as the next package it could still be attached beneath the VP node where it belongs. This could lead to trouble, however, if the prepositional phrase were short and were packaged together with other following words which did not belong in the verb phrase. The SSS could make the correct attachment only by undoing the work of the PPP. Trying to squeeze extra words into the current package could also be counterproductive, for it might happen that the limits of the PPP’s capacity are reached at a point which is not a natural phrasal break in the sentence. 
In such circumstances it would have been better for the PPP to terminate the current package a word or two sooner, and start afresh with a new phrase as a new package. The data we have presented suggest that the PPP does aim for packages which are as long as possible. But to operate efficiently it must presumably know when it is soon going to have to impose a break, and it must have some way of optimizing its chances of making a legitimate break. It is possible that this is what lies behind Kimball’s New Nodes principle. New Nodes states that “the construction of a new node is signalled by the occurrence of a grammatical function word”. There are numerous problems with this principle as it stands. The definition of a “new node” is quite unclear; Kimball’s examples are so varied that it may simply be any nonterminal node (other than a lexical category node) whose introduction into the phrase marker is occasioned by some lexical item in the sentence. If so, we need to know why a natural language should contain special signals for these nodes, and why these signals should be grammatical words rather than content words like nouns and verbs and adjectives.
In some cases the value of a specific lexical signal for a nonterminal node is clear. A sentence such as He knew the girl in the bakeshop was hungry contains a temporary ambiguity. Because know (like many other verbs in English) can take either a simple direct object or a complement clause, the parser will not know whether to attach the girl in the bakeshop directly beneath the VP node, or beneath an S node subordinate to the VP. If the parser tends not to insert the S node in these circumstances (see the discussion of Minimal Attachment in Section IV below), then the phrase marker will have to be corrected subsequently when the verb phrase was hungry is encountered. The complementizer that before the noun phrase would have resolved the temporary ambiguity and permitted the parser to avoid this error. Hakes’ (1972) study of sentences with and without complementizers confirmed that their presence does indeed facilitate parsing. It is much less clear for many other grammatical words what special role they might play in parsing (over and above their obvious linguistic role of distinguishing singular from plural, conjunction from disjunction, the various spatial relations from each other, and so on). A determiner, for example, requires an NP node to be introduced into the phrase marker, but this only rarely resolves a temporary ambiguity, and in any case the need for an NP node is just as clear if a noun phrase begins with a noun as if it begins with a determiner. Prepositions undeniably signal PP nodes, but they do not resolve the very common uncertainty about whether a prepositional phrase should be attached within a noun phrase or directly under the VP. A conjunction such as and or or demands a superordinate node over both conjuncts, but it arrives after the first conjunct and will often be too late to prevent that conjunct from being incorrectly attached into the phrase marker without the superordinate node above it. 
Similarly, as Kimball observed, grammatical words typically follow the phrases to which they are adjoined in SOV languages, which suggests, somewhat surprisingly, that natural languages do not try very hard to save the parser from having to go back and insert nodes over constituents that have already been processed. Despite these problems, some part of New Nodes can be salvaged in the sausage machine model. It is not implausible that grammatical words should have a special signalling function that content words lack. Grammatical words are members of small closed lexical classes. A parser which is able to distinguish these words in the lexical string13 therefore has access to a very superficial source of information about phrasal structure, which it could make use of before any extensive syntactic analysis has been performed.
13Bradley (1978) provides evidence that “closed class” lexical items are contacted through a special lexical retrieval system.
This could be of value to a ‘detective’ style parser, which gathers up all the words in a clause and processes them together as a sort of structural anagram. Determiners, prepositions, conjunctions and the like would provide immediate clues as to what sorts of phrases the clause contains. These superficial signals would also be useful to the PPP of our model, which needs to be able to look at the next word or two in the sentence and make a rapid decision about whether or not to try to squeeze it into the current phrasal package. As an injunction to the PPP, New Nodes could be reformulated as: if the phrasal package under construction is approaching the limit of (for example) six words, close it at (i.e., to the left of, in English, but to the right of, in Japanese or Turkish) the next grammatical word, and group subsequent words into a new package. Our suggestion, then, is that whether or not grammatical words signal the existence of higher nodes, they signal likely points at which to chunk the lexical string into packages. This reinterpretation of New Nodes does not imply that every grammatical word triggers the closure of a phrasal package. Rather, the signal will be made use of just in case closure is independently going to be necessary quite soon. Thus phrases like the man in the top hat or that Sam may scream could be packaged up as units by the PPP despite the presence within them of determiners, prepositions, and auxiliary verbs. In fact, we see no reason to restrict the PPP to introducing only certain types of nodes (e.g., only lexical and non-clausal phrase nodes). It will very often happen in practice that higher nodes such as S are introduced only by the SSS. But as long as a sentence or clause is short enough to fall within the scope of the PPP, we assume that the PPP can supply its S node. 
The distinction between the two parsing units is therefore not a matter of what kinds of operation each can perform, but only of how much computation each is able to perform on any given sentence. As noted earlier, the sausage machine model differs in this respect both from Kimball’s, in which the first stage parser assigns all nodes in the phrase marker, and from Fodor, Bever and Garrett’s, in which the first stage parser assigns all and only within-clause phrasal structure. The effects of constituent length on the right association phenomenon suggest that the first stage parser’s operations are indeed governed by constituent length rather than by constituent type. And the experiments which have been taken to show the special significance of clausal boundaries (Caplan, 1972, and others reported in Fodor, Bever and Garrett, 1974) must at least be interpreted in the light of Carroll and Tanenhaus’s (1975) demonstration of similar effects for long non-clausal noun phrases.14
14There is also, of course, the possibility that clausal units appear to play a special role in syntactic parsing only because propositional units are of special importance in the semantic analysis of a sentence.
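The reformulated New Nodes injunction given earlier in this section can be stated as a chunking rule. The sketch below covers only the English case (close the package to the left of the grammatical word); the word list, the six-word limit and the four-word "approaching the limit" threshold are all illustrative assumptions, not values taken from the paper.

```python
from typing import FrozenSet, List

# Hypothetical closed-class list; a realistic one would be much larger.
GRAMMATICAL_WORDS: FrozenSet[str] = frozenset(
    {"the", "a", "an", "that", "to", "of", "in", "at", "on", "and", "or", "but", "after", "who"}
)


def package_with_new_nodes(words: List[str], limit: int = 6, warn_at: int = 4,
                           grammatical: FrozenSet[str] = GRAMMATICAL_WORDS) -> List[List[str]]:
    """Chunk a word string with the reformulated New Nodes rule (English only).

    Once the current package holds `warn_at` or more words, close it to the
    left of the next grammatical word; close it unconditionally at `limit`.
    """
    packages: List[List[str]] = []
    current: List[str] = []
    for word in words:
        near_limit = len(current) >= warn_at
        if current and (len(current) >= limit
                        or (near_limit and word.lower() in grammatical)):
            packages.append(current)
            current = []
        current.append(word)
    if current:
        packages.append(current)
    return packages


print(package_with_new_nodes("we saw the mare at the fair".split()))
# [['we', 'saw', 'the', 'mare'], ['at', 'the', 'fair']]
print(package_with_new_nodes("the man in the top hat sang to the girl".split()))
# [['the', 'man', 'in', 'the', 'top', 'hat'], ['sang', 'to', 'the', 'girl']]
```

In the second call the determiner and the preposition inside the man in the top hat do not trigger closure, because the package is not yet near its limit when they arrive; this mirrors the point that the signal is used only when closure is independently going to be necessary soon. For Japanese or Turkish the rule would close the package to the right of the grammatical word instead, which this sketch does not implement.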
IV. The Farsightedness of the Parser
In this section we will do our best to defend two very strong claims about the human parsing mechanism: (a) that the division of the parser into the PPP and the SSS is the only source of constraint on the structural hypotheses it considers; (b) that this division of the parser is the only source of constraint on the sequence of attachment operations it performs in building up a phrase marker. (a) amounts to the claim that there are no special strategies that the parser uses to select a hypothesis to pursue at a choice point in the sentence; (b) amounts to the claim that there is no special schedule that requires the parser to enter some nodes in a phrase marker before others. Later, we will consider some apparent counterevidence to both of these claims, but we will argue that it is simply an automatic consequence of the time and memory pressures that the parser is subject to. Both (a) and (b) are intended to apply to both the PPP and the SSS. The PPP has its weaknesses, as we have argued, but these can all be attributed to its limited capacity. So far we have had little to say about the SSS, except that it does not make the kinds of shortsighted decisions characteristic of the PPP. The assumption that the SSS has virtually unlimited capacity is motivated by the fact that some part of the human parsing mechanism is extremely good at keeping track of syntactic dependencies which span longer stretches of the sentence than the PPP can accommodate. For example, a question beginning with a WH-phrase must have a corresponding ‘gap’ in the deep structure position of that phrase. If there is no gap, as in (39), the ungrammaticality of the sentence is easy to detect.
(39) *Which student did John take the new instructor to meet the dean?
People are also quick to notice the ungrammaticality of a sentence like (40) which contains a gap where it should not, i.e., in which an obligatory constituent is absent, and its absence is unaccounted for by any transformational rule.
(40) *John took the new instructor to meet.
A parser which did no more than attach incoming words into the phrase marker in accord with the rules of the grammar would serenely analyze a non-sentence like (40) as if it were well-formed. The receipt of each lexical item would occasion the accessing of grammatical rules to determine what nodes may legitimately appear above it, and any item which could not be integrated into the phrase marker in accord with these rules would be readily detected. But the absence of a lexical item would not occasion any rule accessing, and so the parser would be unaware that a constituent required by
the rules was missing. The fact that people are not misled by sentences like (40) shows that any such model is incorrect. The human parsing mechanism not only processes what it does receive but also makes predictions concerning what it is about to receive. We note in passing that ATN parsing models have structural predictions built into their network of arcs. The network simply cannot be traversed if an obligatory constituent is missing, because the arc corresponding to that constituent will be the only ‘bridge’ across part of the network. ATN parsers are therefore efficient at detecting ungrammaticalities of omission as well as those of other kinds. However, in Section I we argued that current ATN models are deficient insofar as the phrase marker that is being constructed is not available as input for decisions about subsequent words in the sentence. It also seems likely that an ATN parser, with its rigidly sequenced arcs, will have difficulty in recovering from ungrammaticalities and making sense of the sorts of partially scrambled sentences that are so common in everyday conversation. The human parsing mechanism can detect ungrammaticalities but is not devastated by them. We believe it to be one of the assets of our model that it can reconstruct this resilience. The PPP will continue to form its phrasal packages, and the SSS will continue to find appropriate locations for them in the overall predictable structure of the phrase marker, even if some aspects of the structure are indeterminate because earlier packages were misordered or absent or of the wrong type. We propose to permit both the PPP and the SSS to postulate obligatory nodes in the phrase marker as soon as they become predictable, even if their lexical realizations have not yet been received. The PPP, for example, upon receiving a preposition or an obligatorily transitive verb can enter an NP node as its right sister before any elements of the noun phrase have been located in the lexical string.
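Node prediction, and the way a dangling predicted node exposes an ungrammaticality of omission like (40), can be illustrated with a toy checker. Everything here is a simplifying assumption rather than the authors' proposal: words arrive already tagged with hypothetical category labels, only three categories predict an obligatory right sister, and a word may realize the most recent open prediction that its category can begin.

```python
from typing import List, Tuple

# Hypothetical toy categories: Vt = obligatorily transitive verb, P = preposition,
# Inf = infinitival "to". Each predicts an obligatory right sister node.
PREDICTS = {"Vt": "NP", "P": "NP", "Inf": "VP"}
# Categories that may begin (and so realize) each predicted node.
BEGINS = {"NP": {"Det", "N", "Name"}, "VP": {"Vt", "Vi"}}


def dangling_nodes(tagged: List[Tuple[str, str]]) -> List[str]:
    """Enter predicted nodes early; report those never realized by any word."""
    pending: List[Tuple[str, str]] = []  # (predicted node, word that predicted it)
    for word, cat in tagged:
        # Simplification: the word realizes the most recent open prediction it can begin.
        for i in range(len(pending) - 1, -1, -1):
            if cat in BEGINS[pending[i][0]]:
                del pending[i]
                break
        # Enter the node this word makes predictable, before its words arrive.
        if cat in PREDICTS:
            pending.append((PREDICTS[cat], word))
    return [f"{node} predicted by '{predictor}'" for node, predictor in pending]


sentence_40 = [("John", "Name"), ("took", "Vt"), ("the", "Det"), ("new", "Adj"),
               ("instructor", "N"), ("to", "Inf"), ("meet", "Vt")]
print(dangling_nodes(sentence_40))  # ["NP predicted by 'meet'"]
print(dangling_nodes(sentence_40 + [("the", "Det"), ("dean", "N")]))  # []
```

The NP node entered when meet arrives is left dangling in (40), which is exactly the signal of omission the text describes; adding the dean realizes it and nothing is reported.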
A prediction which is more likely in practice to be made by the SSS is that a sentence beginning Either John is ... has a coordinate structure and must therefore contain a second S node as sister to the S node over the first clause. If these predicted nodes should continue to dangle for lack of any corresponding lexical items in the sentence, they will signal ungrammaticalities of omission.15 They will also sometimes serve to resolve what would otherwise be temporary ambiguities in sentences. The role of the word was in the sentence fragment That the youngest of the children was proved to ... is unambiguous.
15 The absence of an obligatory constituent does not always result in ungrammaticality, for it may constitute a ‘gap’ from which a constituent was moved or deleted by a transformational rule. In this paper we have not discussed the special parsing routines required for transformed sentences. These are examined in detail in Fodor (1978, in press), where it is argued that the parser uses its ability to predict constituents in order to detect these gaps on-line.
The sausage machine
The complement clause must contain a verb phrase, and its position in the phrase marker requires that the lexical realization of this verb phrase precede the verb phrase of the main clause. Therefore was can be attached only to the subordinate clause, not to the main clause. But either attachment would appear to be legitimate to a parser which did not enter the predictable subordinate VP node before attempting to connect was into the phrase marker. This is only one of innumerable examples in which node prediction can save a parser from the danger of being garden-pathed by potential attachment ambiguities. And informal evidence suggests that people do not respond to such sentences as if they were temporarily ambiguous; the impossible attachment seems never to be contemplated. (Note that in the example above this cannot be accounted for in terms of semantic constraints, and the constituent lengths are such that right association by the PPP is an unlikely explanation.) It seems, therefore, that the human parsing mechanism does anticipate predictable aspects of the phrase marker. The prediction of nodes is an accomplishment usually associated with top down parsers. We have proposed a model in which phrase marker construction proceeds bottom up, in the very special sense that the PPP supplies the nodes immediately above lexical items before the SSS connects the resulting phrasal packages together with others by means of higher nodes. But claim (b) states that, within each stage of the parser, processing is not necessarily bottom up, nor is it necessarily top down or governed by any externally imposed schedule. Pure bottom up parsing is governed by what we will call a No Incomplete Nodes principle, which stipulates that a node may not be entered into the phrase marker until all of its daughter nodes have been established. This concentrates the work of building the phrase marker rather late in the processing of the sentence, and it also precludes all sorts of potentially useful predictive activity.
Since higher nodes cannot be entered until after all the words they dominate have been received, these nodes cannot be made use of for the ‘forwards’ prediction of words or nodes within that portion of the lexical string. Pure top down parsing is governed by what we might call a No Orphaned Nodes principle, which stipulates that a node may not be entered into the phrase marker unless all of the higher nodes which connect it to the top S node have also been postulated.16 This concentrates structural decisions towards the beginning of the parsing process. Since it demands ‘upwards’ prediction of higher nodes, it does permit considerable ‘forwards’ prediction of their forthcoming daughter nodes.17 But it has the disadvantage that errors of analysis may result from the parser’s being forced to make decisions about higher nodes before the lexical string makes clear what the right decision is. For example, the parser must decide how to attach the word that in the sentence fragment John told the girl that silly old-fashioned ... before processing subsequent words (such as ... joke yesterday, or ... joke had offended a more amusing story, or ... jokes had become the latest craze) which could indicate whether that was a demonstrative determiner, a complementizer introducing a relative clause, or a complementizer introducing a complement clause. Kimball (1975) rejected both pure bottom up and pure top down schedules. He observed that top down parsing cannot be error-free for natural languages since these contain left-recursive structures, such as noun phrases with other noun phrases as left daughters. A noun phrase may therefore be recognizable as a noun phrase, but carry no indication about how many nodes intervene between it and the top S node. Kimball then considered two intermediate schedules. A Predictive Analyzer (somewhat misleadingly named) enters higher nodes above a lexical item only up to some specified height (e.g., first node above, first two nodes above, all nodes up to and including the first S node above). An Over the Top parser makes forwards as well as upwards predictions; it enters not only a dominating node but also that node’s next daughter node. But neither of these schedules gets to the heart of the deficiencies of pure bottom up or top down parsing schedules. They retain some of the disadvantages of each, postulating fewer nodes than they could in some contexts, and more nodes than is safe in others.
16 In an extreme form, top down parsing is governed by a constraint on hypothesis formation rather than just a constraint on the entry of nodes into the phrase marker. That is, it requires that the parser should predict the next lexical item and all of the nodes which link it into the phrase marker, on the basis of the nodes which are already present. The only role of the lexical item which actually occurs next in the lexical string is to confirm or disconfirm this prediction. Top down hypothesis formation, in this sense, would be extremely inefficient for natural languages, in which there is a considerable degree of both syntactic and lexical choice at almost every point in a sentence. The No Orphaned Nodes principle, by contrast, permits the parser to formulate its structural hypotheses on the basis of incoming lexical information as well as information about the partial phrase marker that has already been constructed. For example, if the current partial phrase marker is compatible with either an NP node or a PP node as sister to the verb in the verb phrase, the parser need not consider the PP possibility if the next lexical item is the, which can only be dominated by Det and NP. Lexical category ambiguities certainly exist in natural languages. But typically the nodes which are most easily determined are those which immediately dominate an incoming lexical item, and those which are immediately dominated by nodes already present in the phrase marker. The greatest uncertainty attends the linking nodes which mediate between these. (Note that it is the parser’s decisions about these unpredictable linking nodes which are governed by the Minimal Attachment principle that we propose below.)
17 There are variants of top down parsing in which forwards predictions are prohibited by what might be called a No Dangling Nodes principle: a node may not be entered into the phrase marker until a lexical item has been received for that node to dominate (either directly, or indirectly via intermediate nonterminal nodes). Kimball (1973, 1975) appears to have been using the term “top down” in this restricted sense. But the No Dangling Nodes principle would seem to deprive a parser of the main benefit of top down parsing, which might otherwise have compensated for the inevitable risks.
The characteristic property of natural language sentences is the variable predictability of parts of their phrase markers. There are no fixed parameters n and m such that exactly n upward nodes and exactly m forward nodes are predictable for each lexical item in each sentence. It is illuminating to consider the sentence fragment (41).
(41) John put the mustard in the ...
A parser making full use of the rules of English grammar could establish with certainty the partial phrase marker (42).
(42) [Tree diagram of the partial phrase marker for (41); only the node labels VP, Det and N and the words John, put, the, mustard, in, the survive extraction.]
That is, the parser knows that the lexical string must contain another noun as daughter to the NP within the PP, but it does not know how many more words will precede this noun. It knows that the verb phrase must contain an NP and some sort of locative phrase, but it does not know whether these predictable nodes can be identified with the nodes over the mustard and in the ... . (The PP node might instead turn out to be a sister to the NP node, with another NP node above them both; this higher NP node might be conjoined with another, under yet another higher NP node; and so on.) Thus the parser can establish three nodes above the word put but only two nodes above the first the; it can establish the next sister node to put but not the next sister to the second the; it can establish the existence of a second sister to put but not its identity except within certain limits; it can establish the identity of a sister to the second the but not its serial position. It is hard to imagine any sort of fixed schedule which could permit a parser to represent all the secure facts here without simultaneously forcing it to make dangerous guesses about other unpredictable aspects of the phrase marker. To cope with natural languages with the maximal blend of reliability and efficiency, a parser must be purely information-driven, permitted to build up the phrase marker in any order at all and to enter a given node no sooner and no later than it can confidently do so on the basis of the lexical string. Efficiency and reliability are notable characteristics of the human parsing mechanism, but they are not the only relevant considerations. The information-paced parser will not attach a node until it is certain how to attach it. But there is no guarantee that a temporary attachment ambiguity ever will be resolved by the words in the sentence. And even when it is, the parser has no way of knowing how long it must wait for disambiguating evidence. (This is especially true for attachment ambiguities at higher levels of the phrase marker, because the higher a node is, the longer the stretch of the lexical string it can span.) Given that unstructured verbal material is more costly to store in memory than structured material, the waiting strategy may turn out to be even more dangerous than a guessing strategy. Unless nodes are attached together in some fashion, they may be lost to the parser altogether. And there is abundant evidence in the psycholinguistic literature that the human parsing mechanism does make structural decisions in advance of the evidence.
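The cost of any fixed schedule can be made concrete with a small sketch, which is an illustration under our own assumptions and not the authors' formalism. Using a hypothetical completion of (41), John put the mustard in the jar, it computes for each node the word position (counting from 0) before which a No Incomplete Nodes parser cannot enter it, and the position by which a No Orphaned Nodes parser must already have entered it. The names `SENTENCE`, `spans`, and `fixed_schedules` are hypothetical.

```python
# Hypothetical tree for the completed sentence "John put the mustard in the jar".
# A node is (label, daughters); a daughter is either another node or a word.
SENTENCE = (
    "S", [
        ("NP", ["John"]),
        ("VP", [
            ("V", ["put"]),
            ("NP", [("Det", ["the"]), ("N", ["mustard"])]),
            ("PP", [("P", ["in"]), ("NP", [("Det", ["the"]), ("N", ["jar"])])]),
        ]),
    ],
)


def spans(tree, start=0):
    """Return ([(label, first_word, last_word), ...] in preorder, next word index)."""
    label, daughters = tree
    result = [None]  # slot for this node, filled once its span is known
    i = start
    for d in daughters:
        if isinstance(d, str):
            i += 1  # a word occupies one position in the lexical string
        else:
            sub, i = spans(d, i)
            result.extend(sub)
    result[0] = (label, start, i - 1)
    return result, i


def fixed_schedules(tree):
    """For each node: (label, earliest entry bottom up, latest entry top down).

    No Incomplete Nodes: a node must wait until its last word has arrived.
    No Orphaned Nodes: a node must already be present when its first word is
    attached, since that word cannot be linked up to S without it."""
    table, _ = spans(tree)
    return [(label, last, first) for label, first, last in table]


for row in fixed_schedules(SENTENCE):
    print(row)
```

For S the two schedules differ by the whole sentence: a bottom up parser cannot enter it before the last word (position 6), while a top down parser must have posited it at position 0, before any evidence about its internal structure.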
The guesses that it favors are usually attributed to specific strategies, or rankings of alternative hypotheses, which guide the parser’s activities at choice points in the sentence. In fact, most or all of the familiar examples can be accounted for by one very general strategy, and this strategy can then be explained away in terms of the demand characteristics of the parser’s task. The general strategy is what we will call the Minimal Attachment principle. This stipulates that each lexical item (or other node) is to be attached into the phrase marker with the fewest possible number of nonterminal nodes linking it with the nodes which are already present.18 Minimal Attachment accounts for the preference, noted in Section II, for the attachment of for Susan directly beneath the VP node in a sentence like John bought the book for Susan, where the verb phrase is short enough so that the PPP’s tendency towards local attachment into the object noun phrase is not operative. Minimal Attachment accounts for the preference, noted in the discussion of New Nodes, for the direct object analysis of the noun phrase in a sentence fragment like We knew the girl ... even though this phrase might equally well be the subject of a complement clause. It accounts for the preference, observed in center-embedded sentences, for a conjunctive analysis of an NP NP sequence, rather than an analysis in which the second NP begins a relative clause modifying the first one. It accounts for the preference (noted by Bever, 1970; Bever and Langendoen, 1971; Chomsky and Lasnik, 1977) for analyzing a clause as main rather than subordinate wherever possible. It accounts for the preference (noted by Wanner, Kaplan and Shiner, 1975) for the analysis of a that-clause which follows a noun phrase as a complement clause rather than a relative clause. It even predicts certain lexical category decisions (though perhaps these are also influenced by the nature of lexical access operations). For example, it predicts that the word that is more easily interpreted as a determiner than as a complementizer in a context like That silly old-fashioned joke/jokes ... . And it predicts that the verb raced in The horse raced past the ... is more readily interpreted as an active intransitive verb in the main clause than as the passive participle of a transitive verb in a reduced relative clause modifying the horse. Frazier (1978) provides experimental evidence for the operation of Minimal Attachment in a variety of different constructions. These preferences of the parser are less extreme, perhaps, than those due to the PPP’s packaging routines; they can be swayed to some extent by the content of individual sentences. But their overall direction is very clear.
18 A similar, though weaker, principle is proposed in Kimball (1975, p. 164): “In formal terms, the machine is seeking a path (from some postulated node) to the root [S node], where certain symbols may be repeated. Let us define an equivalence relation between paths, so that two paths are equivalent if they differ only in repetition of a given symbol. Thus, the paths Det NP S, Det NP NP S, and Det NP NP NP S S S are all equivalent. We can then pick a canonical representative of such an equivalence class to be the most collapsed string, in this case the first.” It is this most collapsed string of nodes that the parser is assumed to postulate when there is a choice. Note that, unlike Minimal Attachment, this principle would not favor the string Det NP S over the longer string Det NP PP S, and it is unclear whether it is intended to favor Det NP VP S over Det NP S VP S.
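The difference drawn in footnote 18 between Kimball's collapsing principle and Minimal Attachment can be checked mechanically. The sketch below is ours, with hypothetical function names; a path is a list of node labels from the attached node up to S, and "repetition" is read, as an assumption, as adjacent repetition, which is what Kimball's examples show.

```python
from itertools import groupby


def collapse(path):
    """Kimball's canonical representative: merge each run of a repeated symbol.

    Assumption: 'repetition' means adjacent repetition, as in his examples."""
    return [symbol for symbol, _run in groupby(path)]


def kimball_equivalent(p, q):
    """Two paths are equivalent if they differ only in repeated symbols."""
    return collapse(p) == collapse(q)


def minimal_attachment(paths):
    """Prefer the path with the fewest nodes, i.e. the fewest linking nonterminals."""
    return min(paths, key=len)


print(collapse(["Det", "NP", "NP", "NP", "S", "S", "S"]))
print(kimball_equivalent(["Det", "NP", "S"], ["Det", "NP", "PP", "S"]))
print(minimal_attachment([["Det", "NP", "PP", "S"], ["Det", "NP", "S"]]))
```

Kimball's relation puts Det NP S and Det NP PP S in different classes and so expresses no preference between them, whereas the length-based choice favors Det NP S, as Minimal Attachment requires.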
Regardless of what sort of constituent is to be attached, or what the alternative attachments are, the simplest attachment is always the one that is favored. For a parser which is obliged by memory limitations to make structural decisions in the absence of sufficient evidence, Minimal Attachment would be a very rational strategy to adopt. For one thing, the minimal attachment analysis will make the least demand on memory; even if it does turn out to be wrong, it will have been less costly than some other wrong analysis. Furthermore, trying the simplest attachment first ensures that revisions will be uniform: they will all consist of adding extra nodes to the phrase marker. (A Maximal Attachment strategy would also permit orderly revision procedures, but maximal attachments are of course not well-defined in a grammar with recursion.) Minimal Attachment also presupposes minimal rule accessing. If the well-formedness conditions are mentally represented in the form of phrase structure rules,19 each node between a lexical item and the top S node of the phrase marker will require the accessing of another rule, to determine in what configurations it may properly appear. For example, if an NP node has been established at the beginning of a sentence, Minimal Attachment will require it to be entered as immediate daughter to the top S node, by reference to the rule S → NP VP. If additional intervening nodes were postulated instead, they would have to be checked against further rules such as NP → NP conj NP, or NP → Det N and Det → NP ’s. This last observation about Minimal Attachment is what permits it to be dispensed with as an independent strategy. We need only suppose that the structural hypothesis which the parser pursues is the first one that it recognizes. Establishing the legitimacy of the minimal attachment of a constituent will take less time than establishing the legitimacy of a long chain of linking nodes. In normal conversational contexts sentence parsing has to be performed very rapidly, with little leeway provided by the constant arrival of new words to be processed. It is therefore not at all ad hoc for the parser to pursue whichever structural hypothesis most rapidly becomes available to be pursued, quite apart from the fact that this will also be the easiest one to store and the easiest one to correct if wrong.
19 We have argued that, when making its subsequent decisions, the executive unit of the parser refers to the geometric arrangement of nodes in the partial phrase marker that it has already constructed. It then seems unavoidable that the well-formedness conditions on phrase markers are stored independently of the executive unit, and are accessed by it as needed.
That is, the range of syntactically legitimate attachments at each point in a sentence must be determined by a survey of the syntactic rules for the language, rather than being incorporated into a fixed ranking of the moves the parser should make at that particular point, as in an ATN parser. We have no direct argument to offer in favor of the further assumption that the well-formedness conditions for the language are stored in the form of phrase structure rules, though such rules have, of course, proved particularly suitable for the linguistic description of natural languages. They serve, however, to characterize only deep structure phrase markers, not the surface phrase markers which determine the actual sequence of lexical items in a sentence. Fodor (1978, in press) argues that these rules are nevertheless applied by the parser to the surface forms of sentences and that discrepancies between surface and deep structures are resolved by restoring constituents which have been deleted or moved from their original deep structure positions. It is worth noting that transformational dependencies often extend across more words in a sentence than the PPP can accommodate, and that the major burden of determining how to fill in transformationally induced gaps in sentences will therefore fall on the SSS. The prior operations of the PPP would therefore have to be guided by a superset of the phrase structure rules for well-formed deep structures. These rules would allow for transformationally moved and deleted constituents in the surface forms of sentences, but would also, inevitably, let through some similar structures which happen not to be legitimate. This may account for the fact that the PPP, at least when under pressure, is apparently capable of forming some very strange phrasal packages. 
For example, in the sentence He took the hat, the gloves, the coat and the vest off, there is a tendency to group the words the vest off together even though the lexical and syntactic constraints of English do not allow this as a possible phrase.
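The argument that Minimal Attachment need not be stated separately, because the first hypothesis recognized is the one needing the fewest rule lookups, can be sketched as a breadth-first search over the phrase structure rules cited in the text. This is an illustrative reconstruction, not the authors' implementation: `RULES`, `possible_mothers`, and `attachment_chains` are hypothetical names, conj and ’s are treated as terminal symbols, and each added node in a chain is assumed to cost one rule lookup. Because the rules are recursive, the search is capped at `max_len` nodes; without a cap there would be no longest (maximal) chain.

```python
from collections import deque

# Toy fragment containing only the rules mentioned in the text.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["NP", "conj", "NP"], ["Det", "N"]],
    "Det": [["NP", "'s"]],
}


def possible_mothers(category):
    """Categories that some rule allows to immediately dominate `category`."""
    return [lhs for lhs, expansions in RULES.items()
            if any(category in rhs for rhs in expansions)]


def attachment_chains(category, root="S", max_len=6):
    """Yield chains of nodes linking `category` up to `root`, shortest first.

    Breadth-first order means each chain is found only after every shorter
    chain, so the first chain produced is the minimal attachment."""
    queue = deque([[category]])
    while queue:
        chain = queue.popleft()
        if len(chain) > 1 and chain[-1] == root:
            yield chain
            continue
        if len(chain) >= max_len:
            continue  # recursion in the rules would otherwise never terminate
        for mother in possible_mothers(chain[-1]):
            queue.append(chain + [mother])


chains = attachment_chains("NP")
for _ in range(4):
    print(next(chains))
```

The fourth chain corresponds to the text's example of an NP reanalyzed as part of a possessive determiner (Det → NP ’s) inside a larger NP; the search reaches it only after every shorter alternative.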
It is particularly interesting that for this explanation to go through, it is not even necessary to suppose that the human parsing mechanism considers alternative hypotheses in serial rather than in parallel. Its goal might be to pursue all the legitimate hypotheses simultaneously. But because the alternatives are recognized at different speeds, its parallel processing of them would be staggered. Only the first of them might be available by the time the very next words in the sentence had to be processed. And if further computation is cut short by the arrival of new items, demanding yet more structural decisions, there is the possibility that the minimal attachment will be the only one which is ever recognized. (Without committing ourselves to this idea, we offer it as a possible explanation for the conflicting experimental data on the serial or parallel processing of ambiguous sentences.) We have shown that claims (a) and (b) are both false as they stand. The structural hypotheses that the parser pursues are systematically restricted by the Minimal Attachment principle as well as by the shortsightedness of the PPP. The sequence in which the parser postulates nodes in the phrase marker is governed by more than the bottom up relation between the two stages, since there is some pressure towards early postulation of higher nonterminal nodes as in a top down system. Nevertheless, we have argued that these further restrictions emerge naturally or inevitably from general limits on the memory and time available for sentence parsing under normal circumstances. Even with this modification, our claims for the explanatory value of the two stage model will no doubt turn out to be too strong, but it is remarkably difficult to come by any clear evidence for further constraints on what the parser may do and when. 
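The staggered-parallel idea can also be sketched. In this toy model (our assumption, not a claim made in the text), every candidate attachment starts being checked at the same moment, checking costs one abstract time unit per linking rule, and only hypotheses completed before the next word arrives are available to the parser. `ready_before_next_word` and `CANDIDATES` are hypothetical names, and the time units are arbitrary.

```python
def ready_before_next_word(chains, cost_per_rule, time_to_next_word):
    """Return (chain, finish_time) for hypotheses recognized before the next word.

    All hypotheses start together (parallel), but each linking rule that must
    be checked adds `cost_per_rule`, so shorter chains finish first (staggered)."""
    finished = []
    for chain in sorted(chains, key=len):
        finish_time = (len(chain) - 1) * cost_per_rule  # one rule per link
        if finish_time <= time_to_next_word:
            finished.append((chain, finish_time))
    return finished


# Candidate ways of linking a sentence-initial NP up to S (hypothetical input).
CANDIDATES = [["NP", "Det", "NP", "S"], ["NP", "S"], ["NP", "NP", "S"]]

print(ready_before_next_word(CANDIDATES, cost_per_rule=1, time_to_next_word=1))
print(ready_before_next_word(CANDIDATES, cost_per_rule=1, time_to_next_word=2))
```

With one unit available only the minimal attachment is ever recognized; with two units the next-shortest alternative also becomes available, which is one way serial-looking behavior could arise from a parallel mechanism.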
The model that we have proposed falls somewhere between the ‘detective’ model of Fodor, Bever and Garrett, and the more rigidly constrained models inspired by the development of parsing systems for computer languages. Detective models are also driven primarily by the availability of information in the sentence rather than by an externally imposed schedule. But the Fodor, Bever and Garrett parser must have a very considerable decision lag if the internal structure of a clause is not decided until all the words of a clause are available to be juggled into a best-fit structure. Detective models also appear to presuppose some kind of internal attention-shifting mechanism, which is governed by strategies of its own, and determines which clues to the structure of the sentence will be attended to first or should be given more weight in cases of conflict. The greater flexibility of such models makes it considerably more difficult to predict exactly what moves the parser will make in response to a given lexical string. These richer models are therefore very difficult to put to a detailed empirical test. The sausage machine model, as we have tried to show, makes some rather precise predictions, and accounts, with the fewest ad hoc assumptions, for the peculiar mix of blindness and intelligence that is observed in the human parsing mechanism.
References
Bever, T. G. (1970) The cognitive basis for linguistic structures. In J. R. Hayes (ed.) Cognition and the Development of Language, John Wiley, New York.
Bever, T. G., and D. T. Langendoen (1971) A dynamic model of the evolution of language. Ling. Inq., 2, 433-463.
Blumenthal, A. L. (1966) Observations with self-embedded sentences. Psychon. Sci., 6, 453-454.
Bradley, D. C. (1978) Computational distinctions of vocabulary type. Unpublished doctoral dissertation, MIT.
Caplan, D. (1972) Clause boundaries and recognition latencies for words in sentences. Percep. Psychophys., 12, 73-76.
Carroll, J. M., and M. K. Tanenhaus (1975) Functional clauses are the primary units of sentence segmentation. Unpublished paper distributed by Indiana University Linguistics Club.
Chomsky, N., and H. Lasnik (1977) Filters and control. Ling. Inq., 8, 425-504.
Fodor, J. A., T. G. Bever and M. F. Garrett (1974) The Psychology of Language, McGraw-Hill, New York.
Fodor, J. A., and M. F. Garrett (1967) Some syntactic determinants of sentential complexity. Percep. Psychophys., 2, 289-296.
Fodor, J. D. (1978) Parsing strategies and constraints on transformations. Ling. Inq., 9, 427-473.
Fodor, J. D. (in press) Superstrategy. In W. E. Cooper and E. C. T. Walker (eds.) Sentence Processing: Psycholinguistic Studies Presented to Merrill Garrett, Lawrence Erlbaum Associates, Hillsdale, N.J.
Frazier, L. (1978) On comprehending sentences: syntactic parsing strategies. Unpublished doctoral dissertation, University of Connecticut.
Hakes, D. T. (1972) Effects of reducing complement constructions on sentence comprehension. J. verb. Learn. verb. Behav., 11, 278-286.
Kaplan, R. (1972) Augmented transition networks as psychological models of sentence comprehension. Artif. Intell., 3, 77-100.
Kimball, J. (1973) Seven principles of surface structure parsing in natural language. Cog., 2, 15-47.
Kimball, J. (1975) Predictive analysis and over-the-top parsing. In J. Kimball (ed.) Syntax and Semantics, Volume 4, Academic Press, New York.
Limber, J. (1970) Toward a theory of sentence interpretation. Quarterly Progress Report of the Research Laboratory of Electronics, MIT, January, No. 96.
Marcus, M. (1977) A theory of syntactic recognition for natural language. Unpublished Ph.D. dissertation, MIT.
Miller, G. A., and S. Isard (1964) Free recall of self-embedded English sentences. Inform. Control, 7, 292-303.
Thorne, J., P. Bratley and H. Dewar (1968) The syntactic analysis of English by machine. In D. Michie (ed.) Machine Intelligence, American Elsevier, New York.
Wanner, E., R. Kaplan and S. Shiner (1975) Garden paths in relative clauses. Unpublished paper, Harvard University.
Wanner, E., and M. Maratsos (1974) An ATN approach to comprehension. Unpublished paper, Harvard University.
Woods, W. (1970) Transition network grammars for natural language analysis. Comm. ACM, 13, 591-602.
Yngve, V. H. (1960) A model and an hypothesis for language structure. Proc. Am. Phil. Soc., 104, 444-466.
Summary. This article proposes a sentence parsing mechanism that assigns phrase structure to word strings in two stages. The first parsing stage assigns lexical and phrasal nodes to substrings of about six words. The second stage adds higher nodes to link these phrasal packages together and so obtain a complete phrase marker. This parsing model is compared with ATN models on the one hand, and on the other with the two-stage models of Kimball (1973) and of Fodor, Bever and Garrett (1974). We argue that the units passed from the first to the second stage are characterized by their length rather than by their syntactic form. This would explain the effects of constituent length on the perceptual complexity of center-embedded sentences and of sentences of the kind that fall under Kimball's principle of Right Association. The specific division of labor between the two parsing units makes it possible to explain certain parsing errors without invoking ad hoc strategies, even though, in general, the parser can make intelligent use of all the available information.
Cognition, 6 (1978) 327-351 ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
A review* of John R. Anderson’s Language, Memory, and Thought
KENNETH WEXLER
University of California, Irvine
Until 15 or 20 years ago, if a linguist were to look to psychology for a discussion of psychological issues related to language, the most likely place to look would be the field of “learning theory”. More recently, however, there has grown up a field called “cognitive psychology” which concerns itself with “language, memory and thought”, among other concepts. Whereas learning theory, for the most part, concerned itself with fairly simple and relatively unorganized instances of behavior, cognitive psychology typically acknowledges that structure and organization are an important, crucial part of the human being’s competence. A dominant approach within cognitive psychology is modeling based on a conception of how a computer works. We can call this field “information processing psychology”. Anderson’s book is ambitious in that it “... presents a theory about human cognitive functioning, (and) a set of experiments testing that theory ...”. Anderson’s work has the virtue that he attempts to be precise and even formal about a variety of matters. Compared to much other work in information processing psychology, it is possible to understand much of what is presented without resorting to metaphor and guesses about what is intended. There is probably no other work in the field of information processing psychology of comparable scope which specifies its ideas so clearly. To my mind, these are important considerations. It is also my impression that many (of course not all) information processing psychologists consider Anderson’s work to be at the cutting edge of the field, to embody what they are most proud of. Language, Memory and Thought (henceforth, LMT) may perhaps be looked on as the state-of-the-art book. In addition, LMT discusses a number of methodological and analytical tools from a variety of areas which might be useful in the background and technical arsenal of a cognitive psychologist or linguist.
*The preparation of this review was facilitated by National Science Foundation Grant NSF SOC 74-23469. I would like to thank Peter Culicover, William Batchelder, Reuven Brooks, Noam Chomsky, W. K. Estes, Tom Nelson, and W. C. Watt for comments leading to an improvement in the presentation of the material in this review. None of these are responsible for my views. Requests for reprints should be addressed to Kenneth Wexler, Programme in Cognitive Science, School of Social Sciences, University of California, Irvine, Calif., U.S.A.
Given this assumption of the status of the book within the field, it is perhaps surprising that the conclusion that one reaches after reading the book is: (1) Remarkably little is known about the range of processing issues discussed in LMT. (2) There is not much prospect of adding to scientific knowledge by pursuing the methods represented in LMT. (3) At least one of the ablest practitioners in the field (Anderson himself) has considerable (principled) doubts about the possibility of doing what he and others are trying to do. (4) There is remarkably little that a linguist (or even a psychologist) could learn by reading LMT.
1. ACT as a Theory of Cognition
1.1. Summary of ACT
The purpose of LMT is to
... present a model in the cognitive psychology tradition .... This model is an attempt to provide an integrated account of certain linguistic activities - acquisition, comprehension, and generation of language, reasoning with linguistic material, and memory for linguistic material .... This book is greatly concerned with the concept of memory, which has undergone drastic change in the last decade in Cognitive Psychology. It used to refer to the facility for retaining information over time. Now it more and more refers to the complex structure that organizes all our knowledge. (2)
After a few chapters discussing related work, Anderson describes his theory (called ACT) in some detail. Then follow chapters which are intended to apply ACT to various human abilities, including “the activation of memory”, “inferential processes”, “learning and retention”, “language comprehension and generation”, and “induction of procedures”. In this review I will be mostly concerned with the implications of LMT for the study of language. The length of the work, together with space constraints here, dictates that a number of topics will not be covered. Mostly I hope to provide a discussion of the significance and possibilities of the kind of work represented by LMT. The ACT model is intended to be an improvement on the HAM model of Anderson and Bower (1973). An important element of ACT is that it “involves the integration of a memory network with a production system. The memory network is intended to embody one’s propositional knowledge about the world. The production system is intended to embody one’s procedural knowledge about how to perform various cognitive tasks” (p. 3).
Review of Language, Memory and Thought
329
The memory networks which represent propositional knowledge are finite labeled graphs. For example, ACT would represent the “propositional content” of “John hit Mary” as in (1). (This is Anderson’s Fig. 5.2, p. 148).
(1)
u --S--> x --W--> John
u --P--> v
v --R--> y --W--> hit
v --A--> z --W--> Mary
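As an illustration only, the network in (1) can be written down as a list of labeled links, and the subset interpretation of the subject-predicate construction (described in the paragraph that follows) can be checked against a toy model. The triple encoding, the function names `follow`, `word` and `is_true`, and the toy extension assigned to “hit” are all assumptions of this sketch, not Anderson’s notation or a transcription of his Chapter 7 semantics.

```python
# Sketch only: network (1) as (source, label, target) triples. The encoding,
# the function names and the toy extension of "hit" are assumptions.
LINKS = [
    ("u", "S", "x"), ("u", "P", "v"),    # proposition: subject, predicate
    ("v", "R", "y"), ("v", "A", "z"),    # predicate: relation, argument
    ("x", "W", "John"), ("y", "W", "hit"), ("z", "W", "Mary"),  # word links
]

def follow(node, label, links=LINKS):
    """Return the node reached from `node` along its link labeled `label`."""
    for src, lab, dst in links:
        if src == node and lab == label:
            return dst
    raise KeyError(f"no {label} link from {node}")

def word(node):
    """Lexical item attached to a concept node by a W link."""
    return follow(node, "W")

# Hypothetical toy model: "hit" denotes a set of (agent, patient) pairs.
RELATIONS = {"hit": {("John", "Mary")}}

def is_true(prop):
    """Subset reading: the subject set must be contained in the predicate set."""
    subject = {word(follow(prop, "S"))}
    pred = follow(prop, "P")
    relation = RELATIONS[word(follow(pred, "R"))]
    argument = word(follow(pred, "A"))
    predicate = {a for (a, b) in relation if b == argument}  # things that hit Mary
    return subject <= predicate

print(word(follow("u", "S")), is_true("u"))  # John True
```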
In (1), u, v, x, y and z are nodes, while S, P, R, A and W are labels on the links of the network. There is a fixed number of link labels, which are used in constructing all networks. S is to be interpreted as “subject”, P as “predicate”, R as “relation”, A as “argument”, and W as “word”. The label W indicates that the lexical item on one end of the arrow corresponds to the object represented by the node on the other end of the arrow. Clearly, all of this is familiar territory to the linguist or philosopher concerned with semantics, even if they haven’t thought of semantic representations as “networks”. Anderson discusses a number of details which need not concern us; in general they are not surprising. Anderson even gives a formal semantics of ACT representations (Chapter 7). For instance, a subject-predicate construction is interpreted as asserting that the subject is a subset of the predicate, and it is the only construction in ACT that “bears a truth value” (p. 155).

The second part of Anderson’s theory is a “production system”. According to Anderson, these derive their name, and some of their properties, from the production systems of Post. These are sometimes known in mathematical linguistics as “unrestricted rewriting systems”. All generative grammatical descriptions stated in terms of rewrite rules are examples of such Post systems; phrase-structure grammars, for example, are formally production systems. But ACT’s “production system” is actually a special, restricted kind of Post system. The system is to be thought of as a specification of how changes are to be carried out in “memory” as processing proceeds. “All productions are condition-action pairs. The condition specifies a conjunction of features that must be true of memory. The action specifies a sequence of
330 Kenneth Wexler
changes to be made to memory”. The “conditions” perform operations like checking whether a variable has a particular value, whereas the “actions” perform such operations as creating nodes and changing the values of variables. A particular “sub-system” will typically contain a number of (ordered) productions. For example, the production system for parsing noun phrases given in Table 4.3 (p. 145) contains five productions. Thus, ACT’s production systems have something of the flavor of grammatical transformations (in terms of formalism, not in terms of theoretical status, since transformational grammar is not a processing theory). But there are a number of specific technical differences between transformations and productions, so the analogy is not particularly useful.

Anderson provides a grammar for productions (Table 6.1, p. 184) which specifies exactly what objects productions can be. This is certainly a marked advance in precision over much work in artificial intelligence (a related, though different, field), where often a particular program intended to simulate or actually to represent an ability is simply given, without specification of the class of constructs allowed by the theory. The grammar of productions is to be compared to a theory of language, whereas a particular production system (allowed by the grammar of productions) is to be compared to the grammar for a particular language.

An important aspect of the notion of “production” is that it is supposed to be a theoretical construct underlying all kinds of (“non-propositional”) processing, not only linguistic processing. For example, Anderson demonstrates a production system which is supposed to simulate the behavior of a subject in the “Sternberg paradigm”, perhaps the best-known experiment in cognitive psychology. In this experiment, a subject hears a list of digits.
Then he hears a “probe” digit and has to respond whether or not the probe was on the original list. The crucial dependent variable is response time. A number of further features of ACT are delineated in LMT; we will not discuss most of them in any detail. These features include assumptions about the strength of productions, the selection of productions, the application of productions, and the strengthening of productions.

1.2. The Goals of Cognitive Psychology

The most difficult question to answer about LMT is: what has been accomplished? On the surface there are a number of “results” in LMT. These include mathematical and conceptual results and experimental results, including the fitting of models to data, sometimes relatively successful in the goodness-of-fit sense. Nevertheless one is left with the feeling that nothing of importance has been decided, or even analyzed. This is true
despite what I consider to be an important methodological goal, the use of precise and formal methods in the analysis of human cognition. The problem is that the formalism does not bear on crucial problems. There has sometimes been an unfortunate tendency in some areas of psychology to substitute the pursuit of particular methodologies for the pursuit of results. In my experience many information processing psychologists don’t even think that it is possible to achieve results or discoveries in this field. Anderson states a version of this belief himself. He writes:

The goal of a cognitive theory might be stated as the understanding of the nature of human intelligence. One way to achieve an understanding would be to identify, at some level of abstraction, the structures and processes inside the human head. Unique identification of mental structures and processes was once my goal and it seems that it is also the goal of other cognitive psychologists. However, I have since come to realize that unique identification is not possible. There undoubtedly exists a very diverse set of models, but all equivalent in that they predict the behavior of humans at cognitive tasks. Realization of this fact has had a sobering influence on my research effort, and caused me to reassess the goals of a cognitive theory and the role of computer simulation. (4)

After proposing a number of arguments for the position that “unique identification is not possible”, Anderson offers as a substitute goal the criterion that a theory have “practical application”. He writes:

I am less interested in defending the exact assumptions of the theory and am more interested in evolving some theory that can account for important empirical phenomena. By a theory that accounts for ‘important empirical phenomena’ I mean one that addresses real world issues as well as laboratory phenomena.
Such real world issues for ACT would include how to improve people’s ability to learn and use language, to learn and remember text, to reason, and to solve problems. This reflects my belief that the final arbiter of a cognitive theory is going to be its utility in practical application. Thus, I am proposing a change in our interpretation of what it means to understand the nature of human intelligence. I once thought it could mean unique identification of the structures and processes underlying cognitive behavior. Since that is not possible, I propose that we take ‘understanding the nature of human intelligence’ to mean possession of a theory that will enable us to improve human intelligence. (15-16)

Thus there are two aspects of Anderson’s position on the goals of cognitive psychology that we must discuss. First, what are his reasons for believing that “unique identification is not possible”? Second, what are the grounds for believing that “practical application” should become the central criterion of cognitive psychology? Anderson gives three arguments against the possibility of unique identification. The second and third are essentially instantiations in psychological contexts (serial versus parallel processing and imaginal versus propositional
representation) of the more abstract first argument. Therefore, for our purposes we can concentrate on the first argument, though there are interesting issues in the other arguments that we will have to ignore. First, Anderson notes (p. 5) that, ignoring physiological data,

. . . our data base consists of recording the stimuli humans encounter and the responses they emit. Nothing, not even introspective evidence, escapes this characterization.

Anderson then claims that although the set of possible cognitive theories is not well defined it presumably can include all the types of machines that are studied in formal automata theory. . . . For any well-specified behavior (presumably including cognitive behavior) there exist many different automata which can reproduce that behavior. Any well-specified behavior can be modeled by many kinds of Turing Machines, register machines, post-production systems, etc. These formal systems are clearly different, and in no way can they be considered to be slight variations of one another. The machines are sufficiently different that it would be very hard to judge whether a theory specified in one was more or less parsimonious than a theory specified in another. Moreover, within a class of machines like the Turing Machines there are infinitely many models that will predict any behavior. . . . Within such a class, parsimony could probably serve to select some model. However, suppose our goal is not to find the simplest model but rather the ‘true’ model of the structures and processes in the human brain. How much credibility can we give to
the claim that nature picked the simplest form?

Anderson goes on to say that “the preceding argument is the basic justification for the claim that it is not possible to obtain unique identifiability of mental structures and processes”. As noted above, Anderson then illustrates his point with two more substantive issues in cognitive psychology. “These examples, however, do not establish the general thesis of this section. That follows simply from the fact that for any well-specified behavior one can produce many models that will reproduce it” (p. 6).

The first point to note about Anderson’s argument is that it is true of all fields of science, as has been pointed out many times, in different contexts. Namely, in Anderson’s “general thesis” we can simply replace “behavior” by “set of data” and preserve truth, yielding: for any well-specified set of data one can produce many models that will reproduce it. Consider any finite set of observations of a variable. It is obvious that an infinite number of curves can be drawn through the set of data points. Observations of this nature are commonplace in the philosophy of science. Anderson draws the conclusion from such observations that “unique identification” is not possible, and therefore that the goal of cognitive psychology cannot be understanding, in the traditional sense, but rather should be “practical application”. But notice that it follows from the fact that Anderson’s general thesis
applies to all science that his conclusion must also apply to all science. Namely, since unique identification is not possible, the goals of science should change. Anderson’s conclusion, then, must be that we should give up the attempt to discover true theories. But surely this is the wrong conclusion to draw. The fact that many models will predict any behavior must be taken together with the empirical fact that science has succeeded in a number of cases, despite the existence of many models that predict the data. Thus Anderson’s arguments must be taken as posing a problem (for the philosophy of science): how is it possible that science can attain good theories despite the existence of numerous models for any set of data? How this problem is to be solved is an active area of interest in the philosophy of science. Perhaps the most generally suggested answer involves the notion of “simplicity” in one form or another. The question still remains: what is simplicity? Anderson of course recognizes the possibility of such an answer, as can be seen in the quote above. He suggests that across classes of machines it would be difficult to obtain a simplicity metric, and that even if a metric could be found, “how much credibility can we give to the claim that nature picked the simplest form?” This is not the place to review the literature on such questions. To give one possible answer, it may be argued that simplicity is a property of the human mind. There is no reason such a notion of simplicity couldn’t apply to machines from different classes, perhaps even disallowing certain classes of machines as possible theories. The question of nature picking the “simplest form” remains somewhat obscure, but it is quite conceivable that there are aspects of the world that cannot be understood scientifically, and this may be because the theories which explain these aspects are not “simple” in the sense of the human mind.
It is also conceivable that different beings could discover correct theories for these aspects of the world. Notice that under this conception of “simplicity” (probably under other conceptions also) particular kinds of data might be such that the human mind will not discover significant theory concerning them. Anderson does not give any argument against the possibility of discovering theory underlying cognitive phenomena that could not be given against any scientific enterprise. But the fact that he makes the “non-unique” diagnosis may be due to a failure in a particular enterprise, information-processing psychology. It is consistent with the understanding of science hinted at above that the sets of data that information processing psychology allows may not be such that the human mind can discover the true theory underlying them. When Anderson says that he is “less interested in defending the exact assumptions
of the theory”, presumably we are to infer that the exact assumptions of the theory cannot be defended. It appears that there are no exact assumptions in LMT that Anderson would want to defend, nor, given his beliefs quoted above, does he believe that such defensible assumptions can be found. It seems obvious that this state of affairs, if general, represents a failure of information-processing psychology. It is not easy to see exactly what the reasons for this failure are. Here I can only offer some observations. First, reading Anderson’s long and notationally complex book, one is struck by how loose the relationship is between empirical material and theory building. A number of experiments are discussed, but in general these only test small particular points of particular models of aspects of the theory. If the data are not fit, the changes (if any) that the non-fit suggests do not touch deeply on the theory. In short, the theory is a rather broad scheme covering ostensibly a huge array of human abilities. It is so broad that it is almost untestable. To make the theory testable (that is, to make it precise enough that it can generate empirical predictions), a large number of rather particular assumptions have to be added to the theory. In no way are these added assumptions particularly natural or forced by the theory. Nothing in the theory constrains the nature of the new assumptions. In short, the notion of explanation is missing from LMT.

Science does not have to be this way. In particular, psychology does not have to be this way. Psychophysics (or the broader field of perception) is a good example. In this field a large number of empirical phenomena guide rich and restrictive theory building (the phenomena are mostly reports of introspective judgments, very much as in linguistics; see Batchelder and Wexler (1977) for a discussion of the similarity of linguistic and psychophysical data).
The study of these phenomena combined with intricate analytic theory has led to important successes in the understanding of sensory systems (see Granit (1977) for a review of these). In psychophysics there are non-trivial assumptions that can be defended. Another good example is linguistic theory. In many linguistic articles, hundreds of pieces of data are presented and a theory is constructed which is consistent with all of the data. Of course, there are difficulties, and the theory is amended in the face of the data. But the output is a set of principles which can be defended (on the basis of the presented data). The theory is not so loose that the data don’t matter. Note that it is not the listing of data that is important here (in many cases the data are well-known). Rather, it is the construction of a restrictive theory.

Turning now from Anderson’s criticism of “unique identification” to his proposal that “practical application” be taken as a criterion for scientific theories, we can simply note that Anderson gives no argument to suggest
that such a criterion will work. Secondly, in LMT Anderson (as he observes) gives no illustrations of how such a criterion might apply; that is, there are no practical applications discussed in LMT. Our conclusion, then, is that Anderson has neither succeeded in showing that traditional scientific theory-building criteria are inapplicable in cognitive psychology, nor has he proposed a reasonable alternative.

1.3. Modularity

What then is the source of the lack of progress evidenced in the kind of program that is pursued in LMT? One might be tempted to speculate that it is Anderson’s insistence on practical application that doesn’t allow the pursuit of significant results. However, as noted above, Anderson doesn’t actually attempt to obtain practical applications. Anderson even argues that the practical application criterion can be used to justify more traditional criteria mentioned in the philosophy of science (pp. 19-20). It is difficult to give a single answer to the question of the cause of the lack of progress. However, I would suggest one hypothesis as to the nature of the problems with a theory like ACT. LMT takes as its domain the entirety of human cognition, phenomena ranging from memory and problem-solving to language. Anderson’s goal is to build a system that will mimic all aspects of a human’s cognitive abilities. It is not only that understanding of these abilities is to be the goal of a general cognitive science (which I would argue is a reasonable definition of such a science). But, much more strongly, one theory is to explain all the phenomena. (Anderson does exclude “perception or other nonsymbolic aspects of cognition” from his subject matter, but he leaves to later research the possibility of dealing with these in ACT terms.) There are chapters dealing with retrieval from long-term memory, inference making, learning and retention of prose, language understanding and generation, induction, and language acquisition.
An attempt is made in these chapters to provide ACT
models for these tasks (p. ix). Anderson does write of a “need for abstraction” (p. 21), but what he seems to mean by this is that certain aspects of the data available to a scientist might be ignored. He gives as an example the fact that theories of memory might only predict probabilities and patterns of recall, rather than exactly which items are recalled. I am concerned here, however, with a different kind of abstraction, with what has been called a “modular” approach to the problem (in a different sense from Anderson’s use of “modular”). In this approach entire areas of competence are idealized away from, while the scientist concentrates his theory on a sub-component of human cognitive abilities. If it is in fact the
case that different explanations underlie different abilities, then this approach might yield results whereas Anderson’s won’t. Anderson is attempting to build a theory which will simulate all of a human’s cognitive abilities. In general the goal of research in artificial intelligence is to construct programs which will simulate a human in a “real” task. At the same time Anderson desires practical results. These considerations, it seems to me, taken together might predict the course of the research. To compare this strategy to a case from the history of science, we might suppose that Newton, instead of attempting to discover the laws of motion, had attempted (at least schematically) to build a rocket which could fly to the moon. This supposed case has two features in common with Anderson’s attempts. First, it is a “practical” application. Second, it is concerned with “simulating” a “real” task. After all, the laws of motion are only a small part of what it takes to get a rocket to the moon, and they are “idealized” so that they would have to be modified when they apply, anyway. What would have happened if Newton had made this attempt? He would surely have gotten bogged down in any one or more of a host of difficult problems: for example, the construction of the appropriate materials, the search for the principles of engine design, the attempt to discover a proper fuel, problems in the theory of heat and so on. Had this consistently been Newton’s goal, we might expect that he would never have discovered the laws of motion and, in fact, might never have achieved scientific insight. Yet it seems to me that Anderson’s goals (and those of much of artificial intelligence research) are comparable to those of the imagined Newton. The results obtained and obtainable might also be similar, on this kind of analogy. Or consider Mendel’s Principles, which ultimately led to some of the greatest scientific advances of this century (molecular biology).
Mendel’s Principles by no means explained (or simulated) the entire development of plants. For example, they do not describe exactly how nutrients in the soil affect the plants, or how the process of photosynthesis takes place. Nevertheless the principles, for all their abstraction from the entirety of the process of development, are powerful scientific principles.

1.4. Restrictiveness of the Theory
A major criticism of LMT is that there are simply no principles of any explanatory power that emerge from the work, nor is there any reason to believe that such principles will be found if more effort along the lines of LMT is made. The lack of principles may be attributed to defects in both empirical methods and theory. On the one hand, particular assumptions are introduced with essentially no evidence. On the other hand, the theory of LMT is constructed so as to allow for as wide a range of models as possible, instead of the opposite approach, which seems necessary to rational theory building. These two issues are not unrelated. Since almost any model is compatible with the theory of LMT, empirical evidence cannot be relevant.

Anderson attempts to prove (p. 141) that ACT can mimic an arbitrary Turing Machine. (Recall that a Turing Machine, according to Church’s Thesis, characterizes the notion of what can be effectively computed.) Anderson writes (p. 144) that

we can say that ACT predicts that humans are capable of performing any task a TM can perform with the following qualifications of memory, speed, distraction, and random error. As far as it goes, this seems an accurate prediction about human abilities. What is important and yet undecided is what ACT predicts about the relative difficulty of various tasks in terms of processing times and error rates. Deriving a characterization of performance limitations is the traditional task of psychology and it will occupy much of the remainder of this book.

Actually, no characterization of “performance limitations” in the relevant sense is carried out in LMT. Also, notice how misleading it is to say that “humans are capable of performing any task a TM can perform”, subject to memory and time limitations. The only kind of evidence I can find in LMT that relates to this claim is the observation that, for a sufficiently simple Turing Machine, an intelligent and instructed human can look at the table definition of the machine and make computations according to it. Anderson reports (p. 101) that this is possible, using himself as subject and a simple 2-state, 2-input machine (Fig. 3.3) as an example. Anderson can mimic the machine “in his head” but “eventually I lose track of my position on the tape, the contents of the tape, or both”.
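The bookkeeping Anderson reports losing track of (the current state, the head position, and the tape contents, on top of the transition table itself) can be made explicit in a minimal Python sketch. The transition table below is the well-known 2-state, 2-symbol “busy beaver”, chosen only because it is small and halts; it is an assumption of this sketch and is not claimed to be the machine in Anderson’s Fig. 3.3.

```python
from collections import defaultdict

# Hypothetical 2-state, 2-symbol table (the standard 2-state "busy beaver"),
# used only as an example; it is NOT the machine in Anderson's Fig. 3.3.
# (state, symbol read) -> (symbol to write, head move, next state)
TABLE = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "HALT"),
}

def run(table, start="A", max_steps=1000):
    """Step the machine until HALT; return (steps taken, number of 1s on tape)."""
    tape = defaultdict(int)            # tape contents; blank cells read as 0
    head, state, steps = 0, start, 0   # head position and current state
    while state != "HALT":
        if steps >= max_steps:
            raise RuntimeError("step limit reached")
        write, move, state = table[(state, tape[head])]
        tape[head] = write
        head += move
        steps += 1
    return steps, sum(tape.values())

print(run(TABLE))  # (6, 4)
```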
Presumably for more complicated TM’s, even the original commitment of the transition table to memory would not be possible. In this sense a human can perform any task a TM can perform, subject to memory and time limitations. But notice how little insight this claim gives us into what humans actually are good at, what kinds of cognitive processes and behaviors come easily and naturally to them. Within this TM (or ACT) framework, the only aspect of a task that can make it difficult for a human is the fact that it will tax his memory (or perhaps, with added assumptions, go along too quickly for him). Thus within this framework, aspects of memory and the speed of a particular process will be the only kind of construct to study. However, suppose that a human actually is constructed so that he will do a particular task very well, and that he will not do this task according to a Turing Machine transition table. The way that the human is constructed will allow him to do the task without overtaxing his memory. But the task may
be complicated enough so that when a Turing Machine table is constructed for the task, a great deal of memory is needed. Since Anderson is operating in a framework in which only memory and time limitations distinguish between the possibility and impossibility of a human’s doing a task, he fails to characterize human abilities in any other terms. Consider one of Anderson’s own examples, the Sternberg paradigm. In Section 3.3 Anderson shows (following Newell) how a “Production System” can be written which will do the memory task that a human does, and in Section 4.3, Anderson shows how an ACT system can be written which will also do the memory task. But these demonstrations do not support either Newell’s Production System theory or Anderson’s ACT, because if a subject behaved quite differently (e.g., refused to co-operate, said the Gettysburg address, etc.) a production system or ACT system could be written which would do just that behavior. ACT can model not only the Sternberg result, but also its opposite, or anything else of the sort. There is no explanatory power in ACT because there are no restrictions on human abilities. Suppose, as an analogy, that somebody presented a theory of gene action and of inheritance in which he first claimed that his theory would allow humans to have children who were humans. We would immediately ask, however, if there were any favored status in the theory to the prediction that humans would have human children, or whether humans could, so far as the theory specified, have other animals as children. If there were no favored status to the former prediction, we would immediately reject the theory, or consider it vacuous. Such is the case with ACT. Since, to the extent that it is specified, any behavior is allowable (subject to memory limitations) we should have no confidence that it accurately reflects human abilities. My remarks should not be confused with an argument that we should study only models of particular experiments. 
Breadth of coverage is an important goal. Especially important is the construction of theories which explain major, difficult problems, such as the problem of language acquisition or language comprehension. These are broad problems. But theories must be restrictive if they are to explain these broad problems. ACT does not contribute to our understanding, say, of how it is possible that language can be learned.
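The point about the Sternberg paradigm can be illustrated with a minimal condition-action production system in Python. Everything in it is an assumption of the sketch: the memory layout, the two productions, and the use of cycle counts as a stand-in for response time; nothing here reproduces Newell’s or Anderson’s actual productions. The first system answers correctly, with a cycle count that grows linearly with the length of the memorized list. Flipping a single flag yields a system that gives the opposite answer to every probe, and the formalism accommodates it just as easily.

```python
# Minimal sketch of a condition-action production system. The productions are
# hypothetical illustrations; cycle counts stand in for response time.

def run_productions(productions, memory, max_cycles=100):
    """Each cycle, fire the first production whose condition holds; halt when none does."""
    cycles = 0
    while cycles < max_cycles:
        for condition, action in productions:
            if condition(memory):
                action(memory)
                cycles += 1
                break
        else:
            return memory, cycles
    raise RuntimeError("cycle limit reached")

def scan(m):
    # Compare the probe with the next memorized digit (serial, exhaustive scan).
    if m["items"][m["i"]] == m["probe"]:
        m["found"] = True
    m["i"] += 1

def make_system(say_yes_when_found=True):
    """Build an ordered list of (condition, action) productions."""
    def respond(m):
        m["response"] = "yes" if m["found"] == say_yes_when_found else "no"
    return [
        (lambda m: m["i"] < len(m["items"]), scan),
        (lambda m: m["i"] == len(m["items"]) and "response" not in m, respond),
    ]

def trial(system, items, probe):
    memory = {"items": list(items), "probe": probe, "i": 0, "found": False}
    memory, cycles = run_productions(system, memory)
    return memory["response"], cycles

print(trial(make_system(), [3, 8, 1], 8))        # ('yes', 4)
print(trial(make_system(), [3, 8, 1, 6, 2], 5))  # ('no', 6)
print(trial(make_system(False), [3, 8, 1], 8))   # ('no', 4)
```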
2. Evidence

So far as I can see, there are four kinds of evidence that LMT discusses that are relevant to any of its theoretical claims. These are (i) experimental evidence, (ii) the type of data used by linguistic theory, (iii) the possibility
of producing running programs that accomplish difficult cognitive tasks, and (iv) Anderson’s intuitions about theories. In general, the lack of restrictive power of ACT makes the evidence non-compelling.

2.1. Experimental Evidence
Anderson mentions a large number of experiments in LMT. Basically these do not contribute to explanation, and they cannot, because the theory is too loose to make any predictions whose failure would force the theory to change. There are too many experiments to allow discussion here of even representative ones. As an illustration, I will mention only one, selected because it is short and simple to explain. This experiment is an example of one in which a prediction from the theory is not realized empirically. Yet the theory stands. Given this kind of example, the cases where the predictions are met by the experiment become of less interest. The experiment (pp. 164-165) concerns the subject-predicate distinction, which Anderson claims is a fundamental assumption of ACT. Anderson first notes that in SVO sentences “the verb and object are closer to each other than either is to the subject”. He writes that

a prediction is that subjects should be faster retrieving a connection between verb and object than between verb and subject because the former are closer together. This can be tested by presenting the subject with pairs of words and asking them to decide whether the pair came from the same sentence. ACT would seem to predict that subjects should be faster making this judgment about verb and object than other pairs. In unpublished experiments I have failed to find evidence to support this.
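The proximity prediction can be made concrete by counting links in network (1), where x, y and z are the nodes for John (subject), hit (verb) and Mary (object). The sketch treats links as undirected and measures closeness as the fewest links between two nodes; that distance measure is an assumption for illustration, since the text does not say how ACT’s retrieval time depends on path length. The last line adds a hypothetical “elaboration” link of the kind Anderson appeals to in explaining the null result, which erases the predicted asymmetry.

```python
from collections import deque

# Links of network (1), treated as undirected; labels and word links dropped.
# Node roles follow (1): x = John (subject), y = hit (verb), z = Mary (object).
LINKS = [("u", "x"), ("u", "v"), ("v", "y"), ("v", "z")]

def distance(links, start, goal):
    """Breadth-first search: fewest links between two nodes, or None."""
    adj = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, d = queue.popleft()
        if node == goal:
            return d
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None

print(distance(LINKS, "y", "z"), distance(LINKS, "y", "x"))  # 2 3
# A hypothetical elaboration linking John directly to hit erases the asymmetry.
print(distance(LINKS + [("x", "y")], "y", "x"))  # 1
```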
Anderson goes on to say why “it seems naive to have expected such effects”. His reason is that subjects “set up” much more than a simple representation of the sentence, but expand and elaborate on the sentence “in many unspecified ways, introducing all sorts of extra and uncontrolled connections among elements”. For example, they create scenarios built around the sentence. Anderson continues:

Note that these extra sentences have introduced extra connections between each pair of elements in the sentence. Clearly a proximity metric based on supposed closeness of object and verb is going to be worthless if a subject goes through anything like this elaborative process. Since subjects do, I do not think one can use sentence recall data to directly decide issues of representation.
This last sentence, about the usefulness of certain forms of data in making particular theoretical decisions, seems reasonable, but in LMT this isn’t taken as a general lesson. What is important to notice here is that, despite the total
failure of the experiment, the theory (ACT) remains intact, without even new assumptions having to be added. The point is that the prediction was made only with the tacit assumption of a large number of other principles (e.g., a model of how the subject responded in the experiment given a certain representation of the sentence). The general point here is very important. The major experimental measure in LMT is response time. This is a measure which is so dependent on such a large variety of interacting processes in any complex cognitive task that it may be impossible to use it as the cornerstone for the building of a general cognitive theory. Yet this is precisely what Anderson advocates (p. 20). Of course, prior to the attempt, there is no way of knowing for sure what methods will work in attempting to discover a good theory. Thus it is important to point out that information processing theory, basing itself on response time studies, has not succeeded in discovering the stages, etc. of cognitive processing. This is especially true of the more complex cognitive tasks with which LMT is concerned. But even on simpler, less characteristic human tasks, the claim might be substantiated. For example, the Sternberg experiment, which Anderson points out is one of the best researched tasks in cognitive psychology, is the subject of a large amount of controversy and equivocation. The early results led to the hope (among some at least) that here was a clear example of interesting results for cognitive psychology, that is, non-intuitive results which could not be predicted on the basis of a simple model of an efficient processor. But this interpretation of the results has been extensively criticized within cognitive psychology (for methodological criticisms see Taylor, 1976). The experiments have led to further theorizing in, for example, memory retrieval (e.g., Ratcliff, 1978).
Whether reaction time studies will lead to a correct specification of stages for these simpler processes remains to be seen. Whatever the status of these results, I know of no response time results with respect to, say, linguistic tasks which have succeeded in specifying the stages of processing. It is interesting to note that when Anderson wants to build a theory that does some psychological work that makes contact with human linguistic abilities, he doesn’t build his theory on response time data (which, we have just argued, it might be impossible to do). Rather, he takes over structures from linguistic theory (modified, and with much of the content missing). To take one of a large number of examples, none of the 17 productions for analyzing declarative sentences in Table 11.2 has been proposed because of response time data, so far as I can find in LMT. Nor are there any response time data to test the productions.
Review of Language, Memory and Thought
2.2. Linguistic Evidence
Anderson often attempts to use non-experimental evidence, of the type used in linguistics. However, the results obtained from this kind of data are as unconvincing as those using experimental data, and for similar reasons. Basically, the notation is so universal that any particular fact can be accommodated in a large variety of ways. Also, there is no attempt to handle a variety of facts, or to push for a compelling explanation based on the facts. This problem could be illustrated with a very large number of instances from LMT. Once again, I choose an example only because of the ease with which it can be introduced. This is what Anderson calls “semantic checking” (pp. 468-469). Anderson notes that his grammar will accept many “semantically anomalous constructions”. The examples that he gives are “Mommy received Daddy from a red ball”, as one kind of “anomaly”, and “the two red ball” or “a large balls”, as another kind. The parser described in LMT will build a network structure for these phrases. Thus “to detect such anomalies it is necessary to have productions that will check the network structures built by the parsing routine”. Of course, it is straightforward to write ACT productions which will handle a particular fact. Thus the production that will detect a contradiction for “the two red ball” and “a large balls” is given as
(V1 # 1) & (V1 # plur) ⇒ (V1 * contradictory)
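The following minimal Python sketch shows how little the production adds beyond the fact it encodes. It is an illustration only. It does not use ACT's notation and does not reproduce Anderson's implementation. The feature sets, the `FEATURES` lexicon, and the helpers `build_node` and `semantic_check` are hypothetical. Following the text, "a" and "ball" contribute "singular", while "two" and "balls" contribute "plural". A node that carries both features is marked "contradictory".

```python
# Minimal sketch of the "semantic checking" production, NOT ACT syntax.
# Hypothetical lexicon: the number feature each word contributes to a node.
FEATURES: dict[str, set[str]] = {
    "a": {"singular"}, "ball": {"singular"},
    "two": {"plural"}, "balls": {"plural"},
    "the": set(), "red": set(), "large": set(),
}


def build_node(phrase: str) -> set[str]:
    """Conjoin the features of every word in the phrase onto one node."""
    node: set[str] = set()
    for word in phrase.split():
        node |= FEATURES.get(word, set())
    return node


def semantic_check(node: set[str]) -> set[str]:
    """If the node is both singular and plural, also mark it contradictory."""
    checked = set(node)
    if {"singular", "plural"} <= checked:
        checked.add("contradictory")
    return checked


for phrase in ["the two red ball", "a large balls", "the large balls"]:
    flagged = "contradictory" in semantic_check(build_node(phrase))
    print(f"{phrase!r}: contradictory={flagged}")
```

Like the production, the sketch simply restates the facts. It also accepts "large ball", so it says nothing about why "large balls are on the desk" is acceptable while "large ball is on the desk" is not. Each such fact would need another hand-written rule.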
Numeral, adjective and determiner modifiers of a noun become conjoined properties in the “semantic” network representation. If one of them is “singular” (ball or a) and the other is “plural” (two or balls), the production above will interpret the phrase as “contradictory”. But this is one fact, and ACT can of course represent any such fact. There is nothing surprising about this. The production above simply restates the fact in a particular notation. But there is no further development of semantic checking. There is not even an extension to phrases very similar to the ones that Anderson mentions. For example, what kind of modifier does “the” add? Why is “the large balls” possible, whereas “a large balls” is not? Since “the large ball” is also possible, perhaps we could take “the” as adding neither “singular” nor “plural”, but as being, say, “neutral”. But then what is the difference between “the large balls” and “large balls”, which can easily be shown not to be semantically identical? If “the” simply adds no “modifier”, why can we say “large balls are on the desk”, but not “large ball is on the desk”? All these well-known questions (and many more about these very phrases) are not even considered in LMT, nor is there any reason to think that the answers will be found within ACT. Of course, any particular linguistic fact can be stated in ACT. But there is no reason to accept the particular ACT formalisms for these facts. More generally, we can have no
confidence that all “anomalies” can be “detected” simply by adding ACT productions. Is there any reason to think that we won’t need an infinite number of such productions? (Clearly, there is an infinite number of “anomalies”.) All this goes undiscussed in LMT, and for good reason: there is nothing about ACT that provides answers. Thus there is no reason to think that the kind of use of linguistic evidence that we see in ACT will be helpful in supporting the theories presented in LMT. Anderson’s tendency to introduce quantities of theoretical entities with little care is perhaps most apparent when he discusses language processing, for here he has access to the linguistic tradition, with its rich set of concepts. Thus Noun Phrase becomes a theoretical entity, as do many other familiar categories. But the particular structures which Anderson introduces are, so far as I can tell, arbitrary. Exceedingly few example sentences are discussed, so that we have little idea of how the sets of productions could be expanded to handle a large number of sentences appropriately. And the structures that are introduced come without evidence or argumentation as to why they are appropriate. Why, for example, is there a Noun Topic node which includes Adjectives and the head Noun, with no other structures? To take another example out of a very large number that could be mentioned, the only illustration given of LMT’s “Production System for Generating Noun Phrases” (p. 484) is the sentence “Daddy received the red ball from Mommy who was in the room.” Now, as is well known, if the relative clause in this example is restrictive, the sentence is ungrammatical. Anderson doesn’t recognize this, or discuss whether the relative clause is restrictive, though presumably this is the desired interpretation.
He writes that “As in the production system for comprehension, the generation productions for the relative clauses have been completely omitted for the sake of brevity in this book”. If Mommy in the above sentence is replaced by a noun phrase which is not a proper name, the sentence becomes grammatical. I can find no place where this distinction becomes relevant in Anderson’s productions. The general point is that a good deal that is known and theorized about in linguistic theory is simply ignored in LMT, while many linguistic constructs are taken over in an arbitrary and ungrounded way. Semantic interpretation, of course, is a crucial part of any language processor. Anderson wants his propositional representations to capture semantic interpretation adequately, and he gives arguments, for example, for why aspects of ACT’s representations are better than those allowed in HAM. Anderson’s concern for semantics is certainly more sophisticated than what is usually found in the information processing literature in psychology. Nevertheless, it is clear that the semantic formalisms delineated in ACT are essentially taken over (sometimes with different notation) from the literature in philosophy and linguistics, with much lost in the translation. Many
well-known problems, and sometimes even solutions, are lost. Anderson (1977, p. 136) refers to “non-meaning-bearing morphemes like the and who” and (p. 143) he writes that “... noun phrases consist, optionally, of some initial non-meaning-bearing morphemes ...”. Anderson might have some notion in mind of how to define “meaning-bearing” so that these references were reasonable, but he doesn’t say what it is. And, as I pointed out earlier, there is no mention in LMT of how to distinguish, say, between “a” and “the”, nor is there any mention of the problem. Another example: at first in Anderson’s discussion, it looks as if only extensional adjectives are analyzed, so that an alleged criminal would have to be a criminal. To overcome these kinds of problems, Anderson says that “nodes” can also be “concepts”. He analyzes (p. 243) “John is looking for a unicorn” so that unicorn is a “classification”, but of course (following a standard philosophical criticism), John isn’t looking for a classification. He adds productions (somewhat similar to Montague’s meaning postulates) to make particular inferences. Anderson’s suggestions barely begin to scratch the surface of philosophical thinking on these problems. We should not expect Anderson to solve the difficult problem of semantics. My point is simply that much of what is known in semantic theory is lost in LMT, and many of the problems that are at least sharply delineated in semantic theory are not even mentioned in LMT. Anderson’s answer to all this could be that he’s not really interested in semantic interpretation, but in the processes of sentence comprehension. But if his processes do not yield correct interpretations (i.e., the interpretations that human comprehenders make), then what evidence do we have that his processes are correct? What is the point of developing a formalism that isn’t up to the standards of the field of semantics, unless something scientifically useful is done with the formalism?
2.3. Running Programs as Evidence
Anderson himself distinguishes cognitive psychology from artificial intelligence (p. 1) and argues that the creation of programs is not the goal of work in cognitive psychology. Nor are running programs and their features discussed very often in LMT. Anderson also gives reasons (pp. 124-125) why the creation of running programs which give adequate predictions of behavior is not possible, at least for the present. Thus, (iii) is not a relevant kind of evidence for evaluating ACT. Dresher and Hornstein (1976) suggest that the major problem with the output of artificial intelligence research as a cognitive theory is that general principles are not presented, as is necessary in scientific research. This criticism is not generally relevant to LMT, which does state principles (admittedly lacking “total rigor and precision”, p. 124). Anderson
quite cogently writes that “they are better than a listing of the program itself”. On the other hand, these principles cannot be used to derive precise predictions, as Anderson realizes (this point is also relevant to (i)). Anderson writes (pp. 174-175): “... to derive predictions one must make many ad hoc assumptions about the exact structure of the memory network and about the exact set of productions available, since the predictions depend on these details. Thus, before the computer simulation program could be a truly effective predictive device, one would have to develop a complete and explicit set of principles for specifying the initial structure of the program. Deriving such a set of principles would not be easy.”
2.4. Intuitions as Evidence for ACT
We are thus left with (iv): Anderson’s intuitions about how the human works as a cognitive processor. In general, Anderson lays great stress on his “biases”. LMT is to be credited with the honesty with which this fact is asserted. However, while these biases and intuitions may be the source of Anderson’s theories, they cannot provide evidence for them. These intuitions must be clearly distinguished from the kinds of introspective judgments on which the linguist bases theories. The latter judgments are taken as evidence (like psychophysical judgments, as I have already pointed out), for which abstract theories may be created. Anderson’s intuitions or “biases”, however, are about the elements of the abstract theory. Anderson’s use of introspection sometimes causes severe problems with his use of evidence. For example, as noted earlier, Anderson claims that when he himself tries to “mimic” a Turing Machine, he falters as the transitions take place (pp. 101-102). He then writes:
“This strange mental exercise serves as a clear refutation of the psychological validity of any PS [Production System] which can be shown to perfectly mimic a TM . . . a fundamental problem with the current PS models is that they offer no account of loss from long-term memory. A PS system augmented with the feature of a forgetful memory would pass the empirical test at hand. Such a PS would predict that a human could simulate an arbitrary TM, but with errors. From self-observation I am led to the conclusion that this prediction is correct.”
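Anderson's prediction can be made concrete with a toy simulation. The Python sketch below is only an illustration built on assumptions introduced here. It runs a three-rule Turing Machine for binary increment, and a hypothetical `forget_prob` parameter stands in for a "forgetful memory" by occasionally misremembering the scanned symbol. LMT specifies none of these: the machine, its rules, and the noise model are all invented for this example.

```python
import random

# Hypothetical toy Turing Machine: binary increment. Rules map
# (state, scanned symbol) -> (symbol to write, head move, next state).
RULES = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, carry moves left
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", 0, "halt"),    # ran off the left edge: new digit
}


def run(tape_str: str, forget_prob: float = 0.0, seed: int = 0) -> str:
    """Simulate the machine; forget_prob models a 'forgetful memory'."""
    rng = random.Random(seed)
    tape = dict(enumerate(tape_str))
    head, state = len(tape_str) - 1, "carry"  # start on the rightmost digit
    while state != "halt":
        symbol = tape.get(head, "_")
        # Assumed noise model: sometimes the scanned symbol is misremembered.
        if rng.random() < forget_prob:
            symbol = rng.choice(["0", "1"])
        write, move, state = RULES[(state, symbol)]
        tape[head] = write
        head += move
    return "".join(tape.get(i, "_") for i in range(min(tape), max(tape) + 1))


print(run("1011"))  # 1100
print(run("111"))   # 1000
print([run("1011", forget_prob=0.3, seed=s) for s in range(5)])
```

With `forget_prob=0.0` the simulation is exact. With a positive value the machine still completes its runs, but some of the outputs are wrong. This is the pattern Anderson reports from self-observation.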
But note that Anderson proposes a number of processing mechanisms which would also fail this “test”, perhaps in a far clearer manner. To take just one example, Anderson proposes (p. 464, Table 11.2) a set of 16 productions for “analyzing declarative sentences”. A glance at the table should convince anyone that it would be difficult to entice a human to “mimic” this table, in the manner prescribed by Anderson for this Turing Machine “test”. In fact, in this case, it would be difficult to even entice an ordinary human
to understand or memorize (in an understanding way) the set of productions, which have to perform such tasks as creating nodes. There is no reason to think that Anderson would not agree with my judgments. Yet he (and I) would not conclude that this was a “clear refutation” of this set of productions as a process model for analyzing declarative sentences. The point is that Anderson created this system because he knew that certain kinds of declarative sentences existed, and that humans could process these sentences. Furthermore, he had certain elements of ACT that he wanted to use in creating the model. The fact that humans can’t “mimic” this system does not prove that they don’t use such a system in understanding language. Rather, it suggests that the kind of “learning” that goes into such mimicking tasks may be quite different in kind from the kind of learning that creates representational and procedural models for natural language in the human. By this discussion I do not mean that Anderson’s set of productions is the correct model for analyzing declarative sentences. There are a large number of well-known kinds of sentences that can’t be analyzed by the productions, and the evidence for these particular productions is not compelling. Criticisms could be made of almost the entire discussion of these productions, but these will be clear to any reader and would take us too far afield here. To consider only what appears to be his strongest evidence, Anderson shows how his model predicts particular difficulties in comprehending embedded sentences. He writes (p. 470) that “According to ACT, subjects will experience processing breakdown because they must interrupt one subroutine to call the same subroutine”. To my knowledge, this explanation of the difficulty of these sentences was first suggested by Miller and Chomsky (1963). Anderson does add a reason why this constraint follows from ACT.
Namely: “In applying the embedded routine, the variables of the routine will be filled in with new values. The values of the first call to the subroutine will then be destroyed. It will not be possible to perform any subsequent operations in the first call to the routine if they depend on the value of these variables.”
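The quoted mechanism is easy to reproduce in an ordinary programming language, which supports the review's point that nothing about it is unique to ACT. The Python sketch below is a toy, not Anderson's productions, and its names are hypothetical. `clause_shared` stores its subject in one shared register, so an embedded call overwrites the value that the outer call still needs. `clause_local` keeps a separate binding for each call, as a stack-based processor would.

```python
# Toy illustration of the quoted subroutine problem (hypothetical names).
REGISTERS: dict[str, str] = {}  # a single, shared set of routine variables


def clause_shared(subject, embedded=None):
    """Clause routine whose variables live in one shared register set."""
    REGISTERS["subject"] = subject      # fill the variable on entry
    if embedded is not None:
        clause_shared(*embedded)        # the embedded call refills it
    return REGISTERS["subject"]         # the outer call now sees the inner value


def clause_local(subject, embedded=None):
    """Same routine, but each call keeps its own binding (a call stack)."""
    if embedded is not None:
        clause_local(*embedded)
    return subject                      # outer value survives the inner call


# "The rat [the cat chased] died": outer subject "rat", embedded subject "cat".
print(clause_shared("rat", ("cat",)))  # cat
print(clause_local("rat", ("cat",)))   # rat
```

The outer clause recovers the wrong subject only in the shared-register version. This is the general constraint on processors that the review traces to Miller and Chomsky (1963).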
But such a constraint might be part of any process theory, as Miller and Chomsky suggest. There is nothing in the structure of the particular productions that Anderson proposes that aids the explanation, or at any rate nothing unique to ACT. The claim that relative clauses and sentential complements to nouns involve different subroutines is part of the explanation, but this is standard in many versions of linguistic theory. In fact, as Anderson points out, his explanation runs into trouble on doubly embedded complement clauses, and his account of this is unsatisfactory. Anderson’s theory, in fact, is a retrogression from a well-known existing theory, since it simply
accepts (in a distorted form) one part of the existing theory (the subroutine idea), but leaves out the other part (having to do with short-term memory). Anderson thereby makes an error, accounting for less than is known. For a clear discussion of the better existing theory, see Dresher and Hornstein (1976, pp. 386-390), which is based on Miller and Isard (1964), which is in turn an elaboration of Miller and Chomsky (1963). The point here is not that ACT is wrong, but rather that none of the evidence depends on it. ACT has no power here, and there is no evidence for the particular set of productions that Anderson proposes. Summarizing the discussion of the use of intuition and introspection in LMT, it seems that these are not useful sources of evidence for the theories that are presented.
2.5. Summary of Evidence
To reiterate the general problem with ACT: it is simply so weak that there is no way to find evidence either for or against it. Any phenomenon can be represented in ACT, not only the phenomena that turn out to be empirically true, but also those that are false. Sometimes particular models which do constrain the data are stated. But these models constrain only small, tangential aspects of the theory, so that the theory ceases to deal with the complex cognitive phenomena which it is supposed to deal with. This is true, for example, of the experimental evidence for particular assumptions. In addition, the data are often so weak (even by current standards, as in the case of linguistic phenomena) that even if the theories were stated more stringently, the evidence could not be used to sustain them.
3. Representation and Processing
Given that the critique of LMT that we have discussed would also hold (and even more strongly in many cases) of much of the information processing approach to problems of cognition, and in particular, of language performance, a natural question is: what is to be done? What kind of change in direction would be more likely to produce significant results than present directions do? Of course, one cannot give definitive answers to a question like this, but can only try to understand the core of the problem and to propose lines of inquiry which reduce that core. In this regard, it seems to me that the core of the difficulties with LMT is the over-all weakness of the representational aspects of the theory, the question: how is human knowledge to be represented? As we have discussed, the representational theory of LMT has no content, and there is no methodology in LMT for providing
evidence which can lead toward making a richer and more adequate representational theory.
3.1. Weakness of ACT’s Representations
Anderson agrees that the network representations have no empirical force. He writes (p. 148): “Given this liberalized conception of a network it is hard to see what strong empirical claims it makes . . . . Network representations just amount to convenient notations for representing knowledge”. Further (p. 149): “I doubt that these representational assumptions can be shown to be empirically correct or wrong”. One might wonder why Anderson (and information processing psychologists in general) pay so little serious attention to representational assumptions. There are most likely several explanations for this fact. Here I would just like to discuss one interesting observation which seems germane to the present case. In studying the problem of language processing (including acquisition, comprehension and production), it seems almost obvious that the use of a strong representational theory would be very helpful. In particular, such a theory exists for syntax. Why isn’t it used? It seems to me that the answer to this question is simply that Anderson has a preconception that the syntax of a natural language cannot be directly constrained. This preconception, it seems to me, underlies the methodology of LMT. It is stated most explicitly in Anderson (1975, p. 345):
“It seems hard to specify any strong properties that directly constrain the syntactic form of a natural language. It seems these constraints only come indirectly by making reference to semantic information. These constraints and potential generalizations often seem obvious when pointed out. Therefore, I find myself attempting to discover and formalize the obvious. What is remarkable is how slow a task this is proving to be. It is unfortunately not the case that we are naturally aware of the powerful constraints that shape the language that we speak. It seems that it is only in attempting to simulate the language-learning process that I am coming to understand what is a natural language.”
Of course, there is a large literature (which, in the opinion of many, is relatively successful compared to other approaches to the study of language) which proposes, provides evidence for, and discusses “strong properties that directly constrain the syntactic form of a natural language”. Yet Anderson states, without giving a shred of evidence or argument for his position, that such syntactic constraints don’t exist. It is somewhat as if someone had said in 1975 (the date of the last quote from Anderson) that the structure of molecules could not have anything to do with constraining genetic inheritance. Unlike the constraints that Anderson claims he is discovering, these syntactic
constraints (in the literature) are not “obvious when pointed out”, but, rather, are non-trivial hypotheses which may be right or wrong. Is it any wonder that Anderson says of his own attempts, which ignore serious research in the field, that it “is remarkable . . . how slow a task this is proving to be”? In LMT, Anderson (p. 511) realizes that he will have to account for unacceptability judgments. He does this by assuming that a sentence is judged unacceptable if the theory cannot map the sentence into a “meaning”. Anderson thereby ignores the considerable body of data in the literature demonstrating that there are ungrammatical sentences which have a perfectly clear semantic reading. Of course, Anderson could carry out his program by adopting a theory with a syntactic part, say, and then stipulating that any sentence which has been designated ungrammatical by the syntax will not be interpreted semantically. But this move would be unacceptable to Anderson, since syntactic constraints would then be stated “directly”. I should also point out that none of this program for explaining unacceptability judgments is carried out, nor is there any reason to believe that it could be, for the above and other reasons.
3.2. Language Acquisition
With respect to language acquisition, Anderson has accepted one insight which most other researchers in information processing psychology have not: namely, that a useful strategy for research might be to discover a theory that can explain how language can be learned, concentrating on the fact that a child is successful. However, even here his results are vitiated by the lack of a contentful representational theory. Language acquisition is discussed in Chapter 12 of LMT, which includes a discussion of some known formal results in the theory of language learning (or “induction”), with some slight variations. More details on a particular theory of language acquisition may be found in Anderson (1975, 1977).
(Anderson conceives the focus of the theory in these papers to be somewhat more on the learning of a second, rather than a first, language.) I do not have the space here to discuss Anderson’s theory of language acquisition in detail, except to point out that Anderson reduces the problem mostly to the problem of learning word classes. Almost all of the difficult and complex constructions of language which are the subject of intensive research are ignored, and there is no way to see how Anderson’s theory can relate to them. Also, although Anderson (1977, p. 131) writes that “in describing a language learning program it is important to specify exactly what that program can learn”, and he conjectures that his system can learn any context-free language, he writes (1977, p. 155) that it is “impossible to
provide anything like a formal proof of the conjecture”. There is much more that should be discussed about this attempt at a language acquisition theory. Anderson’s understanding of the formal problem makes one want to take the attempt seriously. Yet there are so many points which seem either wrong or irrelevant that the theory’s usefulness, even as an attempt in this direction, cannot be rated very highly. Language acquisition is an incredibly difficult and important problem, yet many scholars tend to believe it simple, or even solved. Although Anderson’s theory (LAS) ignores so much of language, and does not have much of theoretical or empirical interest to commend it, he ends his paper (1977) by writing that “the weakness of LAS . . . is sufficiently minor that I am of the opinion that LAS-like learning mechanisms, with the addition of some correcting procedures, could serve as the basis for language learning”.
3.3. Toward a Theory of Performability
It is thus natural to suggest that progress might be made in processing theories if the representational assumptions were more serious, that is, if these assumptions were part of a contentful theory which rested on theoretical and empirical evidence. In this regard we might take note of a partial theory of language acquisition (or language learnability) which has been developed (Wexler, Culicover and Hamburger, 1975, and references given there; Hamburger and Wexler, 1975; Wexler, 1977, 1978; Culicover and Wexler, 1977; see Wexler and Culicover, forthcoming, for the most complete and up-to-date survey). In this theory the problem is taken to be how the representations which have been proposed by linguistic theory (on the basis of serious evidence) can be learned. The output of the theory provides insight into both representational and processing assumptions.
Now, such a theory probably cannot be developed at present for all the areas of interest of LMT, for most of these areas do not yet have a serious theory of representation (for example, memory for factual knowledge, or problem solving). But some of the topics of LMT might be so studied. For example, the problem of language comprehension, in some of its aspects, must be intimately tied to the theory of representation of linguistic knowledge. As an analogue to the theory of language learnability we might have a theory of language performability. Such a theory would ask: how can the structures uncovered by linguistic theory be processed by a processor with the kinds of constraints that it is reasonable to suppose humans have (limitations of memory, etc.)? There is evidence that humans can, in general, process these structures. Therefore the theory must allow for their processing (and subject to the kinds of constraints on memory and time, etc.
that are empirically correct). But it is important that the entire representational theory (or at least a major sub-segment of it) be investigated with the goal of ensuring that the performability theory could process any representational model allowed by the theory. There are at least two reasons for this. First, it is trivial and without interest to construct a processor which can process a particular sentence or structure. Second, the kinds of abstract and general properties of structures that humans are capable of processing only emerge when a sufficiently detailed and general theory has been articulated. Thus in the area of, say, syntactic processing, we are at a point in our research efforts where it may be possible to obtain significant results (e.g., Marcus, 1977). In other areas (e.g., memory for facts, problem solving), a useful strategy might be to pursue representational theories of these domains seriously. (For some efforts in this direction see Osherson’s (1977) work on deductive inference.) A prerequisite to this study would be the acceptance of the methodological principle that separate cognitive abilities can be studied separately. Once again, there is no way of being confident about such matters, but the results of information processing psychology to date lead us to look for new directions in the pursuit of insight into these important and difficult issues.
References
Anderson, John R. (1975). Computer simulation of a language acquisition system: A first report. Information Processing and Cognition: The Loyola Symposium, ed. by Robert Solso, Hillsdale, Erlbaum, pp. 295-349.
Anderson, John R. (1977). Induction of augmented transition networks. Cog. Sci., 1, 125-157.
Anderson, John R. and Gordon H. Bower (1973). Human Associative Memory. Washington, Winston & Sons.
Batchelder, William H., and Kenneth Wexler (1977). Suppes’ contributions to the foundation of psychology. Social Science Working Paper #131. University of California, Irvine. To appear in Radu Bogdan, ed., Patrick Suppes, Dordrecht, Reidel (forthcoming).
Culicover, Peter W., and Kenneth Wexler (1977). Some syntactic implications of a theory of language learnability. Formal Syntax, ed. by Peter Culicover, Thomas Wasow and Adrian Akmajian, New York, Academic Press, pp. 7-60.
Dresher, B. Elan and Norbert Hornstein (1976). On some supposed contributions of artificial intelligence to the scientific study of language. Cog., 4, 321-398.
Granit, Ragnar (1977). The Purposive Brain. Cambridge, MIT Press.
Hamburger, Henry and Kenneth Wexler (1975). A mathematical theory of learning transformational grammar. J. math. Psychol., 12, 137-177.
Marcus, Mitchell P. (1977). A Theory of Syntactic Recognition for Natural Language. Ph.D. Thesis, MIT, unpublished.
Miller, George A., and Noam Chomsky (1963). Finitary models of language users. Handbook of Mathematical Psychology, Vol. 2, ed. by R. D. Luce, R. Bush and E. Galanter. New York, John Wiley.
Miller, George A., and Steven Isard (1964). Free recall of self-embedded English sentences. Inform. Control, 7, 292-303.
Osherson, Dan (1977). Natural connectives: A Chomskyan approach. J. math. Psychol., 16, 1-29.
Ratcliff, Roger (1978). A theory of memory retrieval. Psych. Rev., 85, 59-108.
Taylor, David A. (1976). Stage analysis of reaction time. Psych. Bull., 83, 161-191.
Wexler, Kenneth (1977). Transformational grammars are learnable from data of degree ≤ 2. Social Science Working Paper #129. University of California, Irvine.
Wexler, Kenneth (1978). Empirical questions about developmental psycholinguistics raised by a theory of language acquisition. Recent Advances in the Psychology of Language, ed. by Robin N. Campbell and Philip T. Smith, New York, Plenum.
Wexler, Kenneth and Peter W. Culicover (forthcoming). Formal Principles of Language Acquisition. Cambridge, MIT Press.
Wexler, Kenneth, Peter W. Culicover and Henry Hamburger (1975). Learning-theoretic foundations of linguistic universals. Theoret. Ling., 2, 215-253.
Cognition, 6 (1978) 353-361 ©Elsevier Sequoia S.A., Lausanne - Printed in the Netherlands
Discussion
The linguistic interpretation of Broca’s aphasia: A reply to M.-L. Kean
HERMAN H. J. KOLK*
University of Nijmegen, The Netherlands
In a very thorough paper, Kean (1977) has recently presented a linguistic model which, she claims, explains all the features of the syndrome of Broca’s aphasia, especially its agrammatism. The basic assumption of this model is that at the phonological level, a sentence is described as a string of phonological and non-phonological “words”. A phonological word is defined as “the string of segments, marked by boundaries, which function in the assignment of stress to a word (in English)” (p. 22). More specifically, it is “the domain over which the assignment of stress takes place” (p. 24). As an example, Kean gives the word définite. In the compound definitive, définite is not a phonological word. The domain of stress assignment is the whole compound definitive, because the suffix -ive affects the stress pattern of definit(e). On the other hand, définite is a phonological word in définiteness, because the suffix -ness does not affect the stress pattern of the word it is added to. In Kean’s notational system, phonological words are flanked by [# and #]; no other #’s may occur between these boundaries. The process of segmentation by which these word boundaries (#) are put into the sentence is called “lexical construal”. As evidence that such a process actually takes place, Kean refers to the analysis of speech errors (cf. Garrett, 1975). For instance, when someone utters I’m in the dance for mooding instead of I’m in the mood for dancing, the fact that mood and dance exchange, while -ing is left behind, indicates a construal of dancing as [#[#dance#]ing#]. From her discussion of these errors it becomes clear that relevance to the assignment of stress is not the only criterion for this construal process. For instance, an error like my frozers are shoulden (intended: my shoulders are frozen) suggests a construal of shoulders as [#[#[#should#]er#]s#]. Such a construal is possible because the suffix -er is a real one (as in dealer).
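The bracketing convention can be sketched in Python under a deliberately crude assumption. Each suffix is treated as either stress-neutral or stress-affecting. A stress-neutral suffix attaches outside an existing phonological word and adds a new [# ... #] layer. A stress-affecting suffix joins the stem's stress domain. The suffix classes below are hypothetical. As the text notes, stress is not Kean's only criterion: -er is treated as separable here only because it is a real suffix, as in dealer. The sketch does not model how the criteria interact.

```python
# Crude sketch of lexical construal; the suffix classes are assumptions.
STRESS_NEUTRAL = {"ing", "ness", "s", "er"}  # attach outside a word boundary


def construe(stem: str, suffixes: list[str]) -> str:
    """Bracket stem + suffixes as nested phonological words."""
    core = stem
    neutral: list[str] = []
    for suffix in suffixes:
        if suffix in STRESS_NEUTRAL:
            neutral.append(suffix)
        elif neutral:
            raise ValueError("sketch only allows stress-affecting suffixes next to the stem")
        else:
            core += suffix  # joins the stem's stress domain, no new boundary
    form = f"[#{core}#]"
    for suffix in neutral:
        form = f"[#{form}{suffix}#]"  # each neutral suffix adds an outer layer
    return form


print(construe("dance", ["ing"]))       # [#[#dance#]ing#]
print(construe("should", ["er", "s"]))  # [#[#[#should#]er#]s#]
print(construe("definit", ["ive"]))     # [#definitive#]
```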
Her discussion of this hypothetical process is not very systematic. In particular, it is unclear how the different criteria work together in this construal process, that is, whether they stand in a conjunctive, disjunctive or compensatory relation to each other.

*This paper was prepared in part while the author was a Visiting Research Fellow at the Veterans Administration Hospital in Boston (U.S.A.), supported by the Netherlands Organization for the Advancement of Pure Research. Reprint requests should be sent to Herman H. J. Kolk, Psychologisch Laboratorium, Erasmuslaan 16, Nijmegen, The Netherlands.
354
Herman H. J. Kolk
Now, a Broca’s aphasic “tends to reduce the structure of a sentence to the minimal string of elements which can be lexically construed as phonological words in his language” (p. 25). So a Broca’s aphasic tends to omit non-phonological words. In the first place, these are the inflectional endings, which do not affect the stress pattern of the words they are attached to. Secondly, function words can also be classified as non-phonological if one considers the stress pattern of the sentence as a whole: function words do not affect this pattern. Inflections and function words are indeed often absent from an agrammatic sentence.

Kean gives five major arguments to support her claims. Three are supposed to favor a phonological hypothesis in general, while the other two relate to her specific model. Each of these arguments will be described and then evaluated.

(1) “The phonological component of the grammar . . . specifies [a] the segmental sound shape of the individual words . . . [b] stress and intonation patterns of the words in a sentence and of sentences as a whole” (p. 15). “. . . we [are] making the claim that there is a phonological deficit in Broca’s aphasia . . . [and] that this impairment is distributed across the entire domain of phonology” (p. 41). “. . . [Therefore, the] two functions of the phonological component [referred to above] provide a natural context for explaining both the segmental paraphasias and the agrammatism . . . [as it is thought to be related to stress]” (p. 15). As a general point in favor of some phonological explanation, this argument certainly has its value. The only problem with it is that segmental paraphasias are not at all specific to Broca’s aphasia, but frequently occur with other types as well. Nevertheless, one could maintain that a particular phenomenon (like segmental paraphasias) may warrant different explanations in the context of different syndromes.
A much more serious difficulty arises, however, if one asks to what extent this argument supports her specific phonological model. As the model now stands, the phonological deficit is not at all “distributed across the entire domain of phonology”. In particular, the second function of the phonological component (assignment of stress to words and sentences) is not itself impaired. It is only stated that the phonological structure of the sentence is reduced according to a criterion related to stress assignment. Furthermore, the dysprosody often observed in Broca’s aphasics is assumed to be merely an effect of their typically low rate of speech. Without referring to any data, she states: “When a Broca’s aphasic speaks at a (near) normal rate over the duration of a phrase or a sentence, then normal intonation is present...” (p. 32).
The linguistic interpretation of Broca’s aphasia
355
(2) “The phonological component of the grammar must provide a phonetically unambiguous interpretation of how sentences sound. [Chomsky and Halle (1968) make a distinction between] . . . two levels of segmental representations: the level of lexical representations (where e.g., the generalization that the p’s in ‘pan’ and ‘nip’ are the same is held), and the level of phonetic representations where the presence or absence of the properties of speech sounds is specified in degrees (and the two p’s have different representations). If there were a true phonological deficit, we would expect both these levels of representation to be affected”. Thus a Broca’s aphasic is expected not only to make literal paraphasias (e.g., saying ‘tine’ for ‘time’) but also to pronounce individual segments in a way that deviates from normal segmental articulation in the degree to which some properties of sounds are present in a given segment. Errors of the first kind have, of course, often been observed in Broca’s aphasics, although not only in them. Errors of the latter kind have been reported by Blumstein, Cooper, Zurif and Caramazza (1977). So far the argument appears valid. However, Kean then goes on to talk about perception instead of production. In the experiment by Blumstein et al. (1977), Broca’s aphasics listened to a series of artificial speech sounds differing in voice-onset time (VOT). They were asked both to discriminate the different sounds (by indicating whether two sounds were the same or different) and to name them (by indicating whether a particular sound was most similar to da or ta). According to Kean, discrimination is only a phonetic capacity, whereas naming “requires assigning a phonological interpretation to the perceived acoustic signal ...” (p. 16). She therefore predicts that Broca’s aphasics will be impaired not on discrimination but only on naming.
She then claims that this is “exactly” the pattern in speech recognition that Blumstein et al. found. This evaluation, however, must rest on a misreading. What these authors did find is something completely different. As Table 1 in their paper clearly shows, only one of the five Broca’s aphasics tested showed the predicted pattern (discrimination +, labelling -). Three patients performed normally on both discrimination and labelling, and one was impaired on both tasks. So instead of supporting the model, the data actually appear to contradict it.

(3) “Many of the morphological omissions are conditioned by the ‘sonorance hierarchy’” (p. 17). [This term refers to] “... a ranking of sounds from the most vowel-like (vowels) to the least vowel-like (stop consonants)” (p. 16). If such a relation could be shown to exist, it would certainly argue strongly in favor of some phonological explanation. Kean again claims that this relation exists and refers mainly to the “basic research” of Goodglass and Berko (1960), who are said to have found that “post-vocalic consonantal
morphemes were more likely to be retained than were post-consonantal (post-fricative, -stop) ones”. Again, however, this is not what was found at all. Goodglass and Berko (i) studied an unselected group of aphasics, perhaps including some Broca’s aphasics; and (ii) did not report on the contrast post-vocalic vs. post-consonantal. They did contrast nonsyllabic and syllabic inflectional endings (e.g., books vs. watches; looks vs. catches; man’s vs. horse’s), and found a tendency for the syllabic inflectional endings to be easier.

(4) We will now discuss two arguments that bear more directly on the model. The first one relates to the distinction between word-boundary morphemes and formative-boundary morphemes. “Word-boundary morphemes are those affixes which do not affect the stress of a word (in English)” (p. 22). Examples are inflections (progressive -ing, third singular -s etc.) and some suffixes that are used in word formation, e.g. -ness in définiteness. Formative-boundary morphemes, on the other hand, do affect the stress of a word. As an example, Kean gives the suffix -ive in words like definitive, illústrative. If we drop -ive from definitive we get a word with a different stress pattern: définite. Other examples given are prefixes like per-, re-, sub- and ob- in such Latinate words as permit, remit, submit, and object. It is unclear, however, how these prefixes can be classified as formative-boundary morphemes according to the criterion of stress relevance: dropping them leaves us with non-words for which no stress pattern is defined. Leaving this difficulty aside for the moment, Kean’s argument is that at “the level of word formation, there are different types of affixes which are affected differently in the verbal output of Broca’s aphasics and those classes can be distinguished in terms of phonological structure” (p. 23). By this she means that a formative-boundary morpheme is only rarely omitted compared to a word-boundary morpheme.
We will discuss the two kinds of examples separately. First, the Latinate words. Although no data are reported or referred to, let us assume for the moment that the prefixes in these words are indeed only rarely omitted and that morphemes like -ness are quite often omitted. The question then becomes: what does this mean? First, there is the uncertainty, noted above, whether an affix like ob- can be classified as a formative-boundary morpheme according to the stress criterion. Second, Kean silently takes it for granted that these formative boundaries between ob- and -ject or sub- and -mit really function as boundaries in the process of sentence production. It might very well be the case that such words are treated as units. If they are, there is no reason to expect the affixes involved to be omitted by a Broca’s aphasic. In other words, the absence of omissions may just as well mean that there is no boundary, rather than a different kind of boundary.
A morphemic decomposition of these words during sentence production could be argued for on the basis of speech errors. Garrett’s (1975) analysis contains a reference to these affixes. He calls them “moribund” to indicate that they are of dubious productivity and very likely not semantically analyzed. Table VI in his paper gives numbers of errors for these affixes in one important category: stranding errors. It shows that the “moribund” affixes are sometimes separated from their stems (as in I had instayed tending, instead of I had intended staying). The number of these errors is 5 out of 46. However, the number of equivalent errors involving non-morphs (as in get beady for red time instead of get ready for bedtime) is of a similar size (6 out of 46). So, at least in this analysis, there appears to be little support for the idea that words like object and intend are, as a rule, morphemically decomposed during sentence production.

The other example given, the suffix -ive in definitive, illústrative, looks more promising. First, its characterization as a formative-boundary morpheme seems unequivocal. Second, the morpheme is highly productive and perhaps has a better chance of functioning as a separate unit during the production of a sentence. So if Broca’s aphasics did not omit -ive from definitive, or at least omitted it much less often than -ness from cleverness, we would have at least some argument to support the distinction between the two types of boundary morphemes (although alternative explanations are of course not difficult to think of). “However, this suggestion cannot be accepted; it is empirically falsified by the fact that suffixes such as -ive are deleted...” (p. 25). Kean seems to say here that these suffixes are omitted (no data are referred to), but that there is a way out. The solution is that the construal process, which puts the actual word boundaries into the sentence, does not look only at stress relevance.
In this case, she says, a construal of definitive as [#[#definite#]ive#] is possible because définite and definitive are lexically related. As noted before, her treatment of the construal notion is quite unsystematic and the conditions under which it operates are not very clear. We must conclude that the whole argument rests on the supposed rarity of omissions of “moribund” morphemes like ob- from object, a very slender base, as we have seen.

(5) Kean’s final argument concerns the word-boundary morphemes only. There are several different sources for these morphemes. Kean mentions three: (a) derivation by word-formation rules; (b) the syntactic structure of the sentence or phrase; (c) the clitization rule, which attaches one word to another. According to Kean, all these morphemes are omitted by Broca’s aphasics. Now, the argument is that “although there are many different sources for
these affixes, what unifies them is their phonological properties” (p. 23). Instead of three different explanations for the omissions by Broca’s aphasics, we now have one. In our discussion we will restrict ourselves to the first two types of morphemes, the derivational and the inflectional (the third type, produced by clitization, seems less essential, and there are probably no aphasic data that bear on it). As examples of derivational affixes, Kean gives not only forms like -ness but also, somewhat unexpectedly, the number morpheme, which is traditionally considered to be inflectional. Genuine inflectional affixes, according to Kean, are the comparative marker -er, the progressive -ing, the gerundive -ing, the third singular -s etc. Now, there is little doubt that inflectional morphemes are often omitted in Broca’s aphasia. What about derivational affixes? There is no reference to any data on affixes like -ness, -er (in noun formation), -able and the like, nor are we aware of any such data. Such omissions would often give the impression of form-class violations, which seem atypical for a Broca’s aphasic. Then there is the number morpheme. It is certainly often omitted. But is it a derivational affix? We will come back to this issue below. So far, Kean’s final argument appears quite weak, because there is either no evidence or only controversial evidence to support it. There might even be evidence that goes directly against it. Broca’s aphasics show more problems with the genitive marker and the third person present verbal marker than with the plural marker (cf. Goodglass and Hunt, 1958). This finding has been used to argue for a syntactic explanation of Broca’s aphasia, since the same morpheme (-s) is involved in all cases. Kean’s model would seem to predict no difference between the three morphemes, since only one phonological structure is involved: they are all word-boundary morphemes.
However, according to Kean, this difference too can be explained by a phonological factor. She claims that speech errors show similar phenomena and that this particular pattern of omissions is therefore just a result of normal phonological processes. Her argument contains three different statements. (a) The plural -s is a derivational affix; the other two are inflectional affixes. (b) In the analysis of one particular subclass of speech errors, the so-called stranding errors (see below), a clear contrast is observed between derivational and other affixes (Garrett, 1975). (c) This contrast can be explained by postulating a phonological factor, namely “the degree to which a bound morpheme adheres to the item to which it is attached” (p. 31).

Starting with the second statement, derivational affixes indeed appear to have a special status in stranding errors. In these errors, words are exchanged but the bound morphemes are left behind, as in I’m in the dance for mooding (intended: I’m in the mood for dancing). Now, if one looks at the kind of morphemes that are stranded, it turns out that inflectional affixes predominate. Derivational affixes are sometimes stranded, but even then nearly always in conjunction with an inflectional one (as in All the scorers started in double figures; intended: All the starters scored in double figures). Garrett’s explanation for this contrast is entirely in terms of his sentence-production model. The stranded morphemes are the ones that are “syntactically active”; they are computed at “a processing level for which the syntactic organization of the sentence is at issue” (Garrett, 1975, p. 160). The derivational affixes, however, are not as such represented at this level; its computational vocabulary consists of morphologically complete types. Since stranding errors are assumed to occur at this level, one should indeed expect derivational affixes to be only infrequently stranded.

Kean’s account of the contrast between derivational and inflectional affixes is completely different (cf. the third statement above). She states that the difference can be attributed to a phonological factor, “the degree to which a bound morpheme adheres to the item to which it is attached” (p. 31). Derivational affixes would be more “epoxied” to the stem than inflectional ones. This solution is particularly unsatisfying, mainly because it appears to be no more than a restatement of the facts: inflectional affixes are frequently separated from their stems (that is, stranded) because they are so easy to separate from their stems. The most important criticism, however, concerns the first statement. Should the number morpheme be conceived of as a derivational affix? This is certainly not a proposal everyone would accept. Without going into linguistic argumentation, it seems sufficient to point out that the very data on which her analysis is based, the stranding errors, strongly suggest that the number morpheme is an inflectional morpheme. While the “ordinary” derivational morphemes were only infrequently stranded (and when they were, a syntactically active morpheme was also involved), the number morpheme was stranded as frequently as the tense morpheme (17 times out of a total of 46 errors). Now, the possibility remains that the two types of morphemes differ in the number of “nonstranding” errors, in which morphemes are exchanged together with their stems (e.g. the unicorn and the butterflies; intended: the butterflies and the unicorn). Kean mentions the existence of such errors without reporting or referring to differential data. Garrett (1975) does not report on them either. But even if there is such a difference, and it is of a reasonable size, its interpretation will remain questionable in the light of the above pattern of results for the stranding errors.
This concludes our discussion of Kean’s argument that Broca’s aphasia is a phonological deficit and that it works according to her model of phonological simplification. One further comment concerns the surprising extension of her theory to Russian. The problem she is confronted with is that in Russian, unlike English, the inflectional affix of the third person singular present tense form of the verb does play a role in the assignment of stress to a word. In order to maintain her theory, she states that in Russian the construal process, by which the phonological words in a sentence are defined, no longer employs “relevance for the assignment of stress” as its main criterion. For Russian the criterion is “the phonologically simplest form within the lexical entry which includes the word in question” (p. 35). Two objections can be raised. First, the explanation clearly has an ad hoc character. For English, Broca’s aphasics were assumed to have normal construal capacities; only what they did with the result of the construal process was abnormal, in that they realised only the phonological words. For Russian, however, construal occurs according to a criterion of simplicity. One could imagine Broca’s aphasics doing such a thing, but why would normal speakers do it? Stress assignment is clearly relevant for normal language production, but the same cannot be said of “simplicity”. A second difficulty concerns the functors. Since stress relevance is no longer a criterion, why are functors so often omitted in Broca’s speech? They will not be eliminated by a construal process that looks for the simplest form of a word. Or should we conceive of sentences as lexical entries?

Our final conclusion is as follows. (1) With respect to the articulation impairment in Broca’s aphasia, a phonological approach may have its value. With respect to the perception of voice-onset time distinctions, however, Kean’s predictions were not confirmed.
More importantly, articulatory problems are not at all specific to Broca’s aphasia, and it is therefore uncertain whether they should be given the central place that Kean gives them. (2) With respect to what Broca’s aphasics omit from their sentences, Kean has not given a truly convincing argument that supports her theory or any other phonological theory. (3) What Broca’s aphasics do say is not discussed at all in Kean’s paper. For instance, why are their utterances typically limited to simple declarative sentences (Jakobson, 1956)? Why does the patient in Goodglass, Berko, Bernholtz and Hyde (1972) say the girl tall and the boy little instead of only omitting the comparative -er? Many people who have had experience with agrammatic patients will have the impression that much more is happening than mere simplification of otherwise normal sentences. This impression is borne out by data collected in a recent experiment by Kolk (forthcoming), in which agrammatic subjects were trained to produce sentences like the lion was able to kill or he told the boy to be quiet. The
number of errors made during training that consisted only of omission of function words was extremely small (around 5%). Instead, most errors were syntactic approximations of the target sentences, such as the lion killed or he told the boy was quiet. Sometimes the produced sentences were even more complex phonologically than the target sentence, as in He told the boy who was to be quiet. This kind of evidence strongly suggests that, at least for agrammatism, one should look for syntactic rather than phonological models.

References

Blumstein, S., Cooper, W., Zurif, E. and Caramazza, A. (1977). The perception and production of voice-onset time in aphasia. Neuropsychol., 15, 371-383.
Chomsky, N. and Halle, M. (1968). The Sound Pattern of English. Harper and Row, New York.
Garrett, M. F. (1975). The analysis of sentence production. In G. Bower (Ed.), The Psychology of Learning and Motivation: Advances in Research and Theory, Vol. 9. Academic Press, New York.
Goodglass, H. and Berko, J. (1960). Aphasia and inflectional morphology in English. J. Speech Hear. Res., 3, 257-267.
Goodglass, H., Berko, J. B., Bernholtz, N. A. and Hyde, M. R. (1972). Some linguistic structures in the speech of a Broca’s aphasic. Cortex, 8, 191-212.
Goodglass, H. and Hunt, J. (1958). Grammatical complexity and aphasic speech. Word, 14, 197-207.
Jakobson, R. (1956). Two aspects of language and two types of aphasic disturbances. In R. Jakobson and M. Halle, Fundamentals of Language. Mouton, The Hague.
Kean, M.-L. (1977). The linguistic interpretation of aphasic syndromes: Agrammatism in Broca’s aphasia, an example. Cog., 5, 9-46.
Kolk, H. H. J. (forthcoming). Where do agrammatic sentences come from?