Preface
In spring of 1996, Henry Gleitman taught his 100th introductory psychology lecture course. This happy event provided the opportunity for Henry and Lila Gleitman’s students and colleagues to reflect on the contributions the two of them have made over their distinguished careers. Such reflection led to a convocation that spring in Philadelphia; the convocation led to the writing of these essays in honor of Henry and Lila Gleitman.

The essays contained in this volume are organized into three parts. Part I contains an essay by the editors, outlining the history of Henry and Lila’s careers, both singly and collaboratively, and the impact they have had on the fields of perception, language, and cognition. Readers who have not had the pleasure of knowing Henry and Lila might want to know a bit about them, and knowing a bit about them will, perhaps, help readers to appreciate the essays that follow.

Part II contains essays principally addressing Henry’s contributions as a teacher and scholar. These essays are only slightly modified versions of the addresses presented by Henry’s colleagues during the celebration of his 100th psychology course, and with the exception of the last chapter by Lamm, are organized chronologically by the dates during which the contributors were chairs of psychology at Penn. They include an early history of Henry’s teaching at Swarthmore, his influence on the development of psychology at Penn, and the trials and tribulations accompanying the lengthy gestation of his book, Psychology.

Part III principally contains essays from former students of Henry and Lila’s, describing their current research and its origins in the Gleitman “seminar” (described in more detail in the introduction). Attesting to the continuing impact of the seminar, the last two essays are written by current faculty members at Penn, who have benefited from the seminar much as past graduate students have.
The essays in Part III, like those in Part II, are organized chronologically, from the earliest students to the latest. This last part is the lengthiest, but still reflects
only some (by no means all) of the impact that Henry and Lila have had on the field, through their students. We hope that readers of this volume will take as much pleasure in reading these essays as we, their colleagues, have had in putting together this celebration.
Part I Introduction
Within the fields of psychology, linguistics, and cognitive science, the names of Henry and Lila Gleitman are well known. However, as the Gleitmans have often said, one can truly understand a particular contribution only within its historical context. We therefore present a brief history of the Gleitmans, both singly and collaboratively, with the hope that understanding this history will enhance the reader’s enjoyment of the remaining contributions.

The Early Years

Henry was born in Leipzig, Germany, on January 4, 1925. He and his family came to the United States in August of 1939, among the very last Jews to leave Germany. Henry attended City College of New York from 1942 to 1946. This was the City College of no tuition and very high standards, the City College that produced more Nobel Laureates than any other undergraduate institution in the world. It was the City College where everything was hotly debated, everything from world politics to the adequacy of behaviorism. It was, in short, one of America’s foremost homes for the development of intellectuals. Henry took a B.S. in psychology there.

Henry’s introductory psychology instructor was Kenneth Clark. But the lecturers whom Henry says influenced him most were Gardner Murphy—in personality and history and systems—and the Gestaltist Martin Scheerer—in physiological and experimental. Murphy, according to Henry, was an elegant man with a very broad view of psychology, and Scheerer was a man with enormous charisma and energy as a lecturer. Henry set himself the task of combining those qualities.

Henry’s career at CCNY was a bit checkered. It is true that as a junior Henry won the medal for the best senior psychologist at CCNY, besting Julie Hochberg, fellow one-time physics major, now psychology major. But it is also true that Henry ran into trouble with botany and military science. Henry reports that as an act of defiance he once ate the apple he
was supposed to dissect, an act that, one imagines, contributed to his poor grade for the course. Henry’s botany has improved considerably over the years under the influence of his (and Lila’s) passion for gardening. And, Henry reports, his military science too has improved; a passion for chess has helped that along. In any event, Henry did get a B.S. and he took it to Berkeley.

At Berkeley, Henry worked with Edward Chace Tolman. Tolman was, of course, the cognitive behaviorist willing to do battle with Clark Hull on his own turf, in the animal laboratory. There Henry, characteristically, ignored the fluff and rhetoric that often surrounded debates about behaviorism to produce research asking a serious question: Exactly what does an animal learn when some behavior is reinforced? (See Rescorla’s chapter, in which he describes this experiment, and his own refinement of it, in detail.) In any event, Henry’s stay at Berkeley was brief, a mere three years to his Ph.D. During that three-year period, in the summer of 1947, Henry returned to New York to visit his family, and there he taught introductory psychology for the first time in the summer sessions of CCNY and Brooklyn College.

Meanwhile, Lila Lichtenberg was born on the Upper West Side of Manhattan on December 10, 1929, less than two months after the market crashed. As a consequence of the crash, the family moved to more modest surroundings in Brooklyn, where Lila attended PS 153. Lila’s father was a self-employed structural steel detailer, and as times became better, the Lichtenbergs moved to the “Casa del Ritz,” which Lila proudly included as the return address on all of her correspondence. As another consequence of financial solidity in the household, Lila attended summer camp in Vermont. Camp Kee-Wa-Kee was the site of Lila’s first honor: She became a member of the Blue Dragons in 1939, an honor that still holds pride of place on her vitae.
Lila did not plan to attend college, and stubbornly refused to apply to any. Despite her profound efforts in this direction, she was accepted at both Brooklyn College (through no fault of her own, her scores on a mandatory exam had been sent in by her high school) and Antioch College (which was the compromise as it was work-study). She entered Antioch in 1947. Her mother was delighted because, as she told Lila, “People who go to college can talk to anyone about any subject.” (Events certainly proved her mother right.) The program at Antioch required that students devote half time to outside (nonacademic) work. Although Lila did attend classes, the more memorable parts of her college career were her work. She held jobs as an occupational therapist at the Delaware State Hospital for the Insane (where she attempted to teach amnesic patients to perform plays with marionettes), as a reporter (doing press releases for the European Reconstruction Agency, under
the Marshall Plan, in Washington, D.C.), and as the editor of Antioch’s college magazine, The Idiom (where she penned some of the more left-wing editorials).

After graduating, Lila and another writer for The Idiom (nicknamed “Hyphen” for his compound name) went to New York City as literary lights. Hyphen got a job as an assistant editor at Doubleday. As Hyphen’s female equivalent at Antioch, Lila got the female equivalent of his job: She became a dictaphone operator. Lila reports that she only noticed the discrepancy in job assignments some years later; and at the time, they were both very happy with their “jobs in publishing.” From there, Lila moved up to “Gal Friday” at the Journal of the American Waterworks Association, where she ran the journal.

One weekend, Lila joined some friends at Dartmouth, where she met Eugene Galanter, an assistant professor of psychology at the University of Pennsylvania. That weekend, they decided to get married, and did so the following Saturday. They moved into an apartment at 36th and Spruce, and Lila—as a faculty wife—began to take courses in Greek as part of a program in classics. At the time, the great Indo-European scholar Henry Hoenigswald taught courses in both linguistics and classics. Lila took Greek from him, and spent hours translating text. Hoenigswald recognized that Lila loved the parsing of Greek sentences most of all, and encouraged her to work with Zellig Harris in the linguistics department. Following his advice, she became a graduate student in linguistics, working on Harris’s grant, “Transformation and Discourse Analysis Project” (TDAP).
The central problem here was to understand the relationship between sentences in a discourse, and in particular, how any given item moves through the discourse, for example, changing from “new” information to “given.” The fact that a single item could change its function over sentences led to the problem of how one could—in principle—relate the different occurrences of the item to each other over sentences. If “Bill” occurs in one sentence as subject and then shows up as object in another, how could these links be described? Harris’s idea was that each of the relevant sentences could somehow be related to a central, or “kernel” sentence—with “transformations” relating the kernel to each of its realizations. Thus the TDA Project sought to relate sentences to each other using a mechanism that would—in some form—come to play a key role in future advances in linguistics and psycholinguistics.

As part of her graduate training, Lila was advised by Harris to learn how to work the UNIVAC computer, which had been donated to the university by Univac in recognition of the contributions of Eckert and Mauchly to the development of a general purpose digital computer. The UNIVAC occupied the entire first floor of the David Rittenhouse Laboratory. It
also had astonishing computing power for the time, though considerably less than our current handheld calculators. Working on Harris’s project was a group of brilliant graduate students (Lila Gleitman, Bruria Kaufman, Carol Chomsky, and Naomi Sager) as well as an engineer enlisted by Harris—Assistant Professor Aravind Joshi. Joshi was fascinated with the problem of how to use a computer (designed, after all, for number crunching) to understand language. Harris enlisted Joshi to develop a parser—an automaton that would be able, to a limited degree, to comprehend the running text of a natural language using Harris’s analytic methods of distributional analysis. In doing so, Joshi became the first computational linguist. The parser that emerged from this project, Joshi recently reminded us, still rivals (or outperforms) the best current parser in the field.

But independently, Lila was beginning to wonder whether distributional analysis could properly capture the organization of language or language learning. Noam Chomsky, a recent graduate of Penn who had also studied under Harris, suggested that the entire enterprise was doomed to failure, and he provided Lila with a copy of his recent book to read. She read Syntactic Structures secretly, and clearly recognized how Chomsky’s approach reformulated problems in the organization of language and language learning.

The Middle Years

On taking his Ph.D., Henry moved back to the East Coast to take up a position as an assistant professor at what was then the Mecca in exile of Gestalt psychology, Swarthmore College. There, Henry joined a faculty with, inter alia, Solomon Asch, Wolfgang Köhler, and Hans Wallach (perhaps the person Henry admires most as an experimentalist). Henry stayed at Swarthmore for fifteen years. While there, he undertook many projects. For one, he worked with Solomon Asch on his famous studies of conformity and independence.
For another, he began work with two graduate students (Ulric Neisser and Jacob Nachmias) on what they saw as a three-part series, a series that they envisioned as the definitive deconstruction of Hullian learning theory (see Nachmias’s chapter, this volume, for more on this topic). And Henry began to grow famous for his learning theory seminars—evening seminars, of course. (Although Henry, like his mentor, believes in the possibility of serious cognitive activity in the minds of rats, he certainly does not believe in the possibility of serious mental activity before noon.) These seminars began in the evening but they ended whenever the topic was exhausted, be that three, four, or five hours after they started. They certainly did not end when Henry was exhausted—Henry does not become exhausted while
engaged by psychology, poker, or God knows, the theater (see Nachmias and Rescorla on these Swarthmore learning seminars).

Henry began something else while at Swarthmore—Psychology, the book. At Swarthmore, Henry made contact with Don Lamm. Don was Henry’s first editor. But before Henry could finish the book—some eighteen years later—Don became President of W. W. Norton. Still, Lamm remained an über editor for the book, always available for good counsel. (See the Lamm chapter for how this relationship was established and how the book came to be.)

Also at Swarthmore, Henry met Lila. (And, for those who have read recent work by the Gleitmans, Lila met Henry. They met each other.) Through Gene Galanter (whom Lila had divorced after one year), Lila had become acquainted with a number of faculty in psychology at Penn, and there was a close connection between the Penn and Swarthmore psychology faculties. Henry immediately fell in love with the elegant and brilliant Lila, and they were married.

In 1963, Henry and Lila left Swarthmore—not for Penn, but for Cornell. Although this might have been disruptive to Lila’s graduate training, the timing was not all bad, for Lila had begun to leave the Harris fold and had just written her intended thesis—a transformational analysis of conjunction entitled “Coordinating Conjunctions in English” (published two years later, in 1965). Already a distinguished linguist but not yet a Ph.D., she accompanied Henry to upstate New York.

At Cornell, Henry continued his investigations of memory. As Henry tells it, he mainly investigated whether a person could remember what the sun looked like without seeing it for an entire academic year. Ithaca, as it turned out, was not Henry and Lila’s cup of tea: The final moment involved a horse and their bedroom window. Henry was willing, therefore, to return to Philadelphia to become professor of psychology at Penn, as well as the chairman of the department of psychology.
He would also, of course, continue writing The Book.

The Gleitmans returned to Penn in 1965. These were exciting times, generally and specifically, at Penn psychology. In the late 1950s, the trustees and provost had decided to revamp the School of Arts and Sciences. In part because of the influence of Provost David Goddard, the first department to be revamped was psychology. A physicist, Robert Bush, was brought in as chair to do the revamping. Under his leadership, a revolution, a bloody one from some reports, was wrought at Penn in the early 1960s. Bush hired many luminaries—Dottie Jameson and Leo Hurvich, Duncan Luce, Jack Nachmias, Dick Solomon, and Philip Teitelbaum, among others. But as he came to the end of his term as chair, there was a need for an appointment that would combine research excellence, administrative acumen, and
brilliant teaching of undergraduates. Thus was Henry called from Ithaca to become the chairman whose task it was to solidify the revolution. It was Henry, then, who stabilized the department, who gave it its traditions, and who made it into the place that Henry and Lila’s graduate students (who have written most of the essays in this volume) would later find.

During this time, Lila and Henry had two daughters, Ellen and Claire. A scholar of learning, Henry was quite sure that babies had no capacities other than eating, sleeping, and crying. But Lila, a scholar of language, recognized in her young children something quite remarkable. They learned language rapidly, with no obvious effort, and in the frank absence of any explicit tutoring. Her close friend, Elizabeth Shipley, was also a young mother, as well as a psychologist trained under Duncan Luce. And she agreed with Lila: There was something quite remarkable about language learning in infants. Lila and Liz began to systematically study their children’s language learning (see chapter by Shipley, this volume), and with Carlota Smith, won their first grant to support the work. Their first publication on the topic was “A study in the acquisition of language: Free responses to commands” (Shipley, Smith, and Gleitman 1969). The paper, seminal in its theoretical and experimental sophistication, laid out many of the key issues that frame research in language learning today. The experiment itself was among the first to set up conditions for tapping into children’s knowledge of language without relying on spontaneous speech to do so. The method and findings—which revealed greater competence than was evident from children’s spontaneous production—were argued to provide a basis for understanding the organization of language in young children.
The theoretical context contrasted strong nativist positions of that time such as Chomsky’s with strong empiricist positions then held by Bloomfieldian linguists and psychologists (see Newport’s chapter, this volume, for more on this contrast). Although the authors leaned toward a nativist stance, they firmly argued that considerably more empirical evidence was needed before understanding the precise interactions between the child’s innate endowment—whether specific knowledge or data-manipulating tendencies or both—and the learning environment that was available to the child.

At around the same time, Lila decided to finish her Ph.D. Her friend and mentor, Henry Hoenigswald, convinced her to do her dissertation under Henry Hiz in the linguistics department. She had already published the conjunction paper (her intended thesis), but was summarily told that this would not “count” for a thesis because it had already been published. She turned to study the structure of compound nouns, which she investigated from a theoretical perspective (doing formal linguistic analyses to characterize the nature of these compounds) and from a psychological perspective (eliciting people’s intuitions about the differences in meaning between pairs such as “black bird-house” and “black-bird house” in order to discover the linguistic structure). (As Lila has taught her students, these two approaches are two sides of the same coin.) When she presented these ideas to her committee in linguistics, she was told that the formal linguistic analyses would make a fine dissertation; but that she would not be able to present the results of her experiments on intuitions, because that was psychology. So Lila presented the intuitions in a format more congenial to the linguist’s ear and eye: sprinkled throughout the text and labeled sequentially with numbers. Thus she received her Ph.D. in linguistics from Penn in 1967.

Lila took a position as an associate professor of linguistics at Swarthmore College in 1968, and stayed there for four years, serving as Swarthmore’s linguistics department and teaching an entire generation of budding linguists and psycholinguists, including Lance Rips, Gary Dell, John Goldsmith, Muffy Siegel, Emily Bushnell, Robert May, and Elissa Newport (who, though a graduate student at Penn, drove to Swarthmore three times a week to learn linguistics by taking all of Lila’s courses). This second tour of duty at Swarthmore was more rewarding than the first (as faculty wife), especially since this time she did not have to wear white gloves to Sunday tea.

The Modern Era

The beginnings of The Modern Era are marked by two events: the publication of Lila and Henry’s first collaborative effort, and the initiation and subsequent flourishing of “The Seminar.”

The first collaborative effort was in part a result of the linguistics department’s dictum to Lila: No experiments in the thesis. But Lila had, in fact, conducted experiments, and the data appeared to hold rich information about the organization of compounds.
As Henry and Lila discussed and debated some of the results, new questions arose, together with elegant experimental designs and data analyses that were (and still are) Henry’s hallmark. The result was their first joint publication, Phrase and Paraphrase (1970). The developing seminar was a natural outcome of such collaboration. When Henry came to Penn in 1964 there was an ongoing seminar on memory. But by 1970 that seminar had become Henry’s research seminar, and by 1972 Lila had joined it and it became Henry and Lila’s research seminar dedicated to educating graduate students (and first attended by Heidi Feldman, Sandra Geer, Susan Goldin-Meadow, John Jonides, Peter Jusczyk, Deborah MacMillan, Elissa Newport, Marilyn
Shatz, and Liz Shipley). Throughout the first decade of this seminar, Henry was, of course, still writing The Book—a process that set the standard for all students who would later be engaged in scholarly writing. Some time during the late 1980s, the seminar became the “Cheese Seminar” (since all along, various gourmet cheeses accompanied the discussions of research), and by the 1990s, simply “Cheese.”

It was at these evening seminars that many of the contributors to this volume learned how to do psychological research, learned to love psychology, and learned to love triple cremes. (They did not need to learn to love Henry and Lila, since that is innate.) It was during the seminar that students presented budding ideas (always made into a real idea by Henry), possible experimental designs, theoretical puzzles, and job talks. Critically, it was also during these seminars that students learned to ask a good question, and to know what a good answer might be—even if they did not have that answer. Students of Henry and Lila consider the seminar to represent the core of their training, as many of the chapters attest. The learning at these evening seminars has not just been for psychologists, since students in the School of Education, the linguistics department, and, lately, the computer science department have also been welcome. During most years, the research seminar has been a joint project of Henry and Lila, but Henry has not always participated. Illness and the demands of directing a play have sometimes kept Henry away. In those years the seminar has often been a Liz Shipley and Lila Gleitman course.

Collaborative effort in research has always been completely natural to Henry and Lila, and it continued during the 1970s, largely as a consequence of the seminar. During the 1970s, both Henry and Lila collaborated with Paul Rozin, a young psychobiologist in the Penn psychology department (see the chapter by Rozin, this volume).
For example, in 1971, Henry published a study with Paul Rozin on goldfish memory as a function of temperature during the retention interval. Yes, goldfish memory as a function of temperature during the retention interval. A perpetually pressing issue has been whether forgetting is a matter of the mere passage of time, or is instead the result of interference from the intervening events that time allows. The trouble is, of course, that time and events are closely correlated. Seemingly, what is needed to answer this question is a time machine—a device that can make time go faster or slower while keeping the events that occur constant. Time machines are hard to find; but Henry and Paul realized that biochemical processes are a function of time and temperature. So if two organisms are at different temperatures for the same interval of time, from a biochemical point of view, this is equivalent to time’s moving faster for the
hotter organism, but experiencing the same events. This experiment cannot be done with humans; changing external temperature for warm-blooded animals does not change their metabolism. But cold-blooded animals, like goldfish, can readily be warmed or cooled by changing their water temperature. So Rozin and Gleitman, lacking a time machine, heated and chilled their fish. Clever! Henry recently summarized the results for us: “If you want to be a goldfish who remembers, spend the retention interval in a refrigerator.” The results of this first study were, according to Henry, quite strong, and were published in a scholarly paper. Henry’s hand was evident in an elegant and notable control that evaluated the possibility that failure among the heated fish was due to their brains’ boiling. In this control, fish were heated at 90 degrees for 60 days and were then given three days to readjust. These fish did as well as those who learned at a cool 60 degrees. But an attempted replication failed: Some of the fish suffered from the Ick, and were unsuccessfully treated for the disease by a research assistant. The dead fish showed no hint of remembering anything.

Lila, meanwhile, had moved from Swarthmore to Penn, as William Carter Professor in the Graduate School of Education. There, she collaborated with Paul Rozin on studies of reading—specifically, developing the idea that an orthography based on syllables might be more “accessible,” hence easier to learn, than an orthography such as the alphabet, which requires a highly abstract mapping from sound to individual letter. Their first publication, in 1973, posed the problem of reading as a problem of mapping (or unmapping) orthography to a psychologically appropriate level of phonological representation.
This inflamed many educators, who were entrenched in existing methods of teaching reading “for meaning,” which Lila and Paul pointed out was the equivalent of claiming a method of teaching driving “for destination.” Gleitman and Rozin (and Rozin and Gleitman) went on to publish two landmark theoretical and experimental papers documenting the logic of their approach, using the history of writing systems and the psycholinguistics of sound processing as supports.

But the bulk of collaboration was done with graduate students. In the early 1970s, Henry and John Jonides worked on mechanisms of item recognition. John was the “senior student” in the seminar, the only person besides Henry who had his own clearly designated chair at the evening meetings. Henry and John were intrigued by the following issue: It had been repeatedly demonstrated (by Ulric Neisser among others) that the visual features distinguishing one item from another have a profound influence on people’s speed and accuracy of recognition. The paradigm that became popular to explore these issues was
visual search: requiring subjects to search for some prespecified target item (say, the letter X) among an array of other items (say, other letters). Using this paradigm, even the most casual experimentalist could find that when the target item was physically different from its background distractors (say, an X among Os and Cs), search was faster and more accurate than when they were similar (say, an X among Ys and Ks). This result, together with much additional evidence, has been amassed to argue for a featural theory of visual recognition, a theory that is still the leading contender today. Going beyond these results, Henry and John explored the possibility that it was not just physical featural differences between characters that might influence visual recognition. In addition, they hypothesized, categorial membership might be a distinguishing characteristic. Building on a previous result, they confirmed that visual search for a digit embedded among letters was faster and more accurate than for a letter among letters (of course, this experiment could not have passed Henry’s muster had it not been exhaustively counterbalanced within an inch of its life for which items were targets and distractors). They then went on to show that it was not a physical difference between members of the category “letter” or “digit” that differentiated them: A comparable pattern of results obtained when the target, the item “0,” was described to subjects as the letter “O” or the digit “0.” Beyond this, Henry and John went on to explore the intricacies of visual search based on categorial difference in several papers that followed. All of this earned them a reputation for having cornered the market on alphanumeric stimuli, and it earned John his first academic job.

During these years, Lissa Newport also began to work with both Lila and Henry.
She met Lila, as mentioned above, when Lila taught linguistics at Swarthmore: At the time, Penn did not teach generative linguistics, so Lissa commuted to Swarthmore to acquire appropriate training in linguistics for beginning work with Paul Rozin and Oscar Marin on aphasia. Soon after beginning this arrangement, however, Lissa separated from her then-husband and became an early member of the Gleitman Hotel for Wayward Academics. Though she had at the time never met Henry (who was on sabbatical during her first year in graduate school) and knew Lila only from classes, she was warmly invited to stay at the Gleitmans’ while she searched for a new place to live. Like many who followed her, she found refuge in the Gleitman kitchen and living room, and, through countless hours of warm conversation, was nurtured back from thoughts of quitting school to debates on the structure of the mind and nativist approaches to learning.
Lissa also read the latest Gleitman, Gleitman, and Shipley grant proposal and grew interested in their discussion of approaching the nature-nurture question in language acquisition by studying mothers’ speech to children. The grant proposal suggested that perhaps mothers shaped their speech to children in accord with children’s abilities to comprehend that speech, a suggestion also raised in Shipley, Smith, and Gleitman (1969). If true, they went on, speech to children might provide better-structured input for language learning than usually believed, and this in turn might change our views of the extent and character of innate knowledge required for acquisition.

With this possibility in mind, Lissa, Lila, and Henry began in 1972 to collaborate on a study of fifteen mothers interacting with their young children. Henry, always good with nicknames, christened this interest Motherese, and this became the term used widely in the field for speech to children (until the earnest 1980s and 1990s ruined a good phrase by turning it into the more politically correct caregiver speech). The work progressed slowly: During the first six months, the seminar members heard Lissa reporting, on a weekly basis, “Still transcribing.” But the real problems had to do with conceptualizing the problem of how maternal input could help children learn language—beyond the obvious fact that it provided the model of the child’s native language.

During this period, in the early 1970s, a number of dissertations had appeared on mothers’ speech to children, and all of these had shown that Motherese exhibited sentences that were short and overwhelmingly grammatical. On the basis of these facts, many people in the field had concluded that these characteristics meant that Motherese was “simple input” to the child, and by inference, that this simple input must help solve the learning problem. But Lissa’s early analyses kept looking different.
While Motherese sentences were indeed shorter, the grammar required to describe them was not particularly simple: Mothers used a wide range of sentence types to their children, including yes-no and wh-questions as well as imperatives, which required a full range of rather complex syntactic transformations to generate. By contrast, mothers’ speech to an adult (Lissa) consisted almost entirely of simple active affirmative declarative sentences, which were the kernel (and relatively less complex) sentences of a transformational grammar. From a grammatical point of view, then, Motherese was not so simple and did not appear to present a new solution to the language acquisition problem. These findings, and a discussion of their significance for acquisition theory, later became Lissa’s dissertation. But Lissa, Henry, and Lila continued on together to ask a more important question: How could one go beyond describing Motherese and
Introduction
find evidence about whether, in fact, it produced any changes in acquisition? As usual, this drew on Henry’s remarkable skills in analysis and design, particularly needed in this case because one couldn’t easily bring this problem into the lab and conduct an ordinary psychological experiment. They took two approaches to the question. First, they conducted an experiment asking whether mothers’ frequent tendency to repeat themselves, producing strings of related sentences, might help children to analyze the grammar. Lissa and Henry designed a repetition experiment with a clever analysis to distinguish the benefits of merely having more opportunities to respond from potential benefits to learning over the repeated presentations. The results: no learning. It began to dawn on them that perhaps these negative results were not so negative after all: Perhaps Motherese did not change the problem of acquisition in such a clear way. The second line of work asked whether the individual differences that occurred among the fifteen mothers would correlate with differences in their children’s acquisition success over subsequent months. Since the study was originally designed to ask a different question (how did mothers speak to children of different ages and linguistic abilities?), the differences between the children in initial linguistic ability were removed by performing double partial correlations. Now that the millennium is here, it may be hard for younger readers to appreciate what this meant. Double partial correlations were first performed by using punch cards on a mainframe computer; later, the situation improved substantially by using an extremely expensive hand-held calculator that could add, subtract, multiply, divide—and had one memory, which enabled it to compute simple correlations, which could then in turn be combined to produce double partial correlations. 
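For readers curious about the arithmetic involved, here is a minimal sketch (with invented correlation values; the study’s actual data are not reproduced here) of how simple correlations combine, via the standard partial-correlation formula, into first-order and then second-order (“double”) partials of the sort the calculator made feasible:

```python
import math

def partial_r(rxy, rxz, ryz):
    # First-order partial correlation r(x, y | z), computed
    # purely from the three simple (zero-order) correlations.
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Hypothetical zero-order correlations among four variables:
# x = a maternal-speech measure, y = the child's later gain,
# z, w = initial child abilities to be partialed out.
rxy, rxz, ryz = 0.50, 0.30, 0.40
rxw, rzw, ryw = 0.20, 0.10, 0.25

# First partial z out of every pairing of x, y, and w ...
rxy_z = partial_r(rxy, rxz, ryz)
rxw_z = partial_r(rxw, rxz, rzw)
ryw_z = partial_r(ryw, ryz, rzw)

# ... then apply the same formula again to partial out w,
# yielding the double partial correlation r(x, y | z, w).
rxy_zw = partial_r(rxy_z, rxw_z, ryw_z)
print(round(rxy_zw, 3))
```

Each double partial thus required seven applications of the same small formula—tedious by hand, and just barely within reach of a one-memory calculator.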
To Henry, Lissa, and Lila’s surprise, the results did not show overall relationships between aspects of maternal speech and their children’s learning: Those mothers who produced the most simplified Motherese did not produce the most gifted learners. However, there were a number of other significant correlations, which Lila and Lissa quickly realized fit into a very different conceptualization—one that also had been suggested by Shipley, Smith, and Gleitman. First, the acquisition of nouns and verbs—for the children, the argument-structure and open-class words—did not show effects of variation in maternal input. These seemed to emerge on their own timetable, without strong relations to the details of maternal input. But second, the acquisition of closed-class elements, such as verbal auxiliaries—those grammatical elements that vary most across languages—did correlate with aspects of maternal speech. Critically, the aspects of maternal speech that most strongly correlated with
the learning of verbal auxiliaries were the positions in which these items appeared in Motherese. In Shipley, Smith, and Gleitman, children attended most prominently to certain parts of input sentences, especially sentence beginnings. In accord with this, the Newport, Gleitman, and Gleitman results showed that mothers who produced the most auxiliaries at sentence beginnings—by using more yes-no questions and fewer imperatives—had children whose own sentences were decorated by verbal auxiliaries the earliest. In short, Motherese simplification did not appear to change the character of language learning. Instead, the results turned the motivating force in acquisition back to the predispositions of the child. In many quarters, this was not a popular claim. As Lila has pointed out in recent years, American psychology does not find comfort in nativism. Apparently, as she has succinctly put it, “Empiricism is innate.” But the findings did help to begin a line of research, by the Gleitmans and their collaborators and students, that systematically investigated natural variations in the variables contributing to acquisition. The outcomes of this research consistently revealed that the mind of the child—and only secondarily her input from the external world—made the most substantial contributions to the acquisition process. In contrast to the Newport, Gleitman, and Gleitman studies, these subsequent lines of work wisely examined more extreme variations in input or in internal factors than had been measured in the double partial correlations of that first work. Newport, Gleitman, and Gleitman’s (1977) work produced the first article published in collaboration with students in the research seminar. Also ongoing during this period was work by the Gleitmans with Heidi Feldman and Susan Goldin-Meadow on the creation of language by congenitally deaf children; and a paper by Feldman, Goldin-Meadow, and Gleitman appeared in 1978. 
The papers with Jonides, with Newport, and with Feldman and Goldin-Meadow mark the coming-of-age of the first generation of research seminar participants. They also mark Lila’s first work on understanding language acquisition by looking at children for whom the usual inputs to language acquisition are absent—a theme that threads through the work of seminar participants over several generations (see the chapters in this volume by Newport; Goldin-Meadow; Landau). By 1978, Henry was still, of course, writing The Book. And the seminar had grown to include about a dozen students from psychology and from education. Although Lila’s original appointment was in the School of Education, in 1979 she emigrated to the departments of psychology and linguistics in the School of Arts and Sciences, where she later became the Marcia and Steven Roth Professor. A new faculty member at
Penn—Liz Spelke—also joined the seminar, with her students Phil Kellman and Hillary Schmidt, both interested in infant perception. The seminar now also included other students who did not work directly with either Lila or Henry (such as Dan Reisberg and Phil Kellman; see their chapters) but who saw the seminar as a critical event in their graduate education. Also present were students working mostly with Henry (Judy Foard; Jerry Parrott; see Parrott’s chapter), and those working mostly with Lila (including a number of the students from education, such as Julia Dutton, Barbara Freed, Pam Freyd, Kathy Hirsh-Pasek, and George Meck). Parrott’s work with Henry concerned a topic dear to Henry’s heart: the nature of a special group of emotions that included humor, playfulness, curiosity, and the appreciation of beauty. Henry was deeply interested in these emotions as they were, after all, what he regularly called up in his capacity as director when he—every several years—directed a play at Penn or elsewhere. Henry and Jerry hoped that experiments on the role of expectation and surprise in simple humor might help them understand something about the more complex types of expectation and surprise found in comedy and drama. This led them to study adults’ surprise and humor responses to unexpected elements of animated events. Discovering that Penn undergraduates’ responses were rather complex, however, Jerry and Henry turned to infants—who they assumed would, at least, constitute a “simpler preparation.” This developmental work fit well within the seminar group at that time, for many of the students were doing research on language learning, especially as framed by the deprivation paradigm. This included the acquisition of language by deaf children and second language learning by adults. 
Two other students in psychology were also intrigued with the deprivation paradigm, and eventually came to study language learning in children with Down’s Syndrome (Anne Fowler) and children who were born blind (Barbara Landau). Barbara Landau came to graduate school particularly intrigued with the recently published work by Feldman, Goldin-Meadow, and Gleitman on the creation of language by deaf isolates—a case that provided stunning evidence for the emergence of a structured linguistic system with no formal linguistic input (see Goldin-Meadow’s chapter, this volume). Having relatively little background in linguistics at the time, Barbara naively believed that the creation of a language in the absence of linguistic input was not surprising, and could perhaps be explained by considering the rich perceptual and conceptual environment in which the deaf children developed. Specifically, observations of objects and events might allow the children to construct a semantic system that would be the foundation for language. Barbara and Lila had lengthy discussions on the issue of whether one could, in fact, construct a language without so observing the world; and these discussions led to the question of how the blind child could learn a language, given that she would have to construct the semantics of a language in the absence of rich perceptual information afforded by the visual system. Lila cautiously agreed that this would be an interesting topic to pursue, and the two set out to study language learning in the blind child. Something curious happened, however. Barbara had recruited three blind children, each around eighteen months old, but none of them was producing much language. However, what they were doing was equally fascinating: exploring and recognizing people and objects around them, navigating through their environments without hesitation, and interacting with the world in a way that suggested that they were constructing a rich spatial world. How could this occur? Henry was immediately taken with this question, as was Liz Spelke, and with Barbara, they began to study the emergence of spatial knowledge in the blind child. Under the close guidance of Henry and Liz, Barbara’s informal observations of the capacities of the blind child were quickly turned into elegant experiments demonstrating the capacity of the blind child to learn spatial routes and make spatial inferences to travel along new routes in novel environments. These studies culminated in several publications documenting spatial knowledge in a very young child who was blind from birth; these papers reflected the further influence of Randy Gallistel, who guided the authors to understand this knowledge as a geometric system that emerged early in life with or without visual experience. 
The existence of such spatial knowledge helped Lila and Barbara partly explain how the blind children learned language (as they did, before long): If the blind child possesses spatial representations, then these could provide the foundation for the development of semantic representations of objects, locations, motions, etc. But perhaps less easily explained were the striking observations that they made regarding a special portion of the blind child’s vocabulary: Among the earliest and most productive words in the blind child’s vocabulary were words such as look, see, and color terms. Intensive experimental investigation showed that the semantic structure of these terms as used by the blind child was quite rich and in many ways quite similar to those of sighted children. For example, the blind child used the term see for her own activity of exploring with the hands (though she used the term for others to refer to visual exploration). It was the theoretical consideration of these phenomena, however, that led Lila and Barbara deeply into questions about the nature of concepts and meanings, and the mechanisms by which they could be
learned by children. On the issue of concepts, Lila and Henry were also actively collaborating at the time with a postdoctoral member of the research seminar, Sharon Armstrong. Their question concerned the representation of everyday lexical concepts, such as “dog,” “apple,” etc. During the late 1970s and early 1980s, the field had been heavily influenced by the work of Eleanor Rosch and colleagues, who had argued that everyday lexical concepts have “prototype” structure. Rosch’s findings confirmed the observation that people typically can judge members of a category as to their representativeness in the category, or their “goodness.” For example, robins are rated as “better” exemplars of birds than penguins, and these differences across category members seem to have reflexes in processing time as well as explicit judgments. For Armstrong and the Gleitmans, however, the evidence did not logically prove that prototype representations were any more psychologically real than were representations including the necessary and sufficient conditions for membership. They set out to disprove the arguments on logical as well as empirical grounds (and provided as well a devastating critique of featural theories in general). In 1983, they published “What some concepts might not be,” whose burden was to show that the sort of evidence that had been collected to show that the concept “vegetable,” say, has prototype structure (rather than having the classical structure of necessary and sufficient conditions), could also be collected for the concept “even number” (a clear example of a concept with a classical definition). Subjects are indeed willing to judge brussels sprouts to be worse examples of “vegetable” than carrots, but they are also willing to judge 24 to be a less good example of “even number” than 4. 
If such evidence, the paper argues, is enough to convince you that the mental representation of vegetable is prototypical rather than classical, it should convince you of the same for “even number”! Thus members of the Gleitman seminar, with their leaders, wrung their hands over the prospects of characterizing everyday lexical items in terms of features—prototype or not. But on another front, Landau and Gleitman were still considering the puzzle of how the blind child came to learn the meanings of visual verbs such as “look” and “see.” Acknowledging that they would probably never be able to fully characterize the “meanings” of these terms, Lila and Barbara turned to a somewhat different question: How could aspects of the meanings of a word be learned if the putative conceptual foundation for the meaning was absent in the experience of the child? Theories of semantic acquisition typically assume that semantics is somehow transparently accessible to young children, through perceptual and motoric interactions with the environment. For the case of a word such as “look,” the obvious mechanism for learning would be to
link the hearing of the word with the experience of looking. But, presuming that knowing the meaning of “look” involves some experience of actually looking, how could the congenitally blind child ever learn its meaning? The theoretical analysis of this problem took several steps. The first was to consider the obvious: The child could have learned the haptic meaning of “see” or “look” because her mother used it only in those contexts in which seeing or looking was possible—that is, when the child was near some object. However, analysis of the contexts in which the mother uttered these words to the child revealed that such simple contextual factors did not distribute themselves in a way that would have allowed the child to group together the visual verbs as distinct from other verbs in the corpus. The next step was to consider the less obvious: that the syntactic contexts in which the verbs appeared could have provided the child with additional information about their meaning. This analysis suggested that the joint use of syntactic and contextual information could indeed result in separation of the visual verbs from all others (as well as group other semantically related verbs together coherently). Up to this point, the idea that learners could use syntax to discover semantics was anathema in the field, which typically assumed that learning proceeded in the opposite direction (i.e., that the learner uses semantic categories to project syntactic categories). However, the idea was not novel to Lila: Indeed, its foundation could be seen as a product of her training under Harris. The central idea there was that distributional analysis was a powerful tool for deriving regularities in language. 
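The distributional idea can be sketched in miniature. In this toy illustration (the verbs, frame labels, and frame sets are invented for exposition, not drawn from the Landau and Gleitman corpus), each verb is represented by the set of syntactic frames it occurs in, and a novel verb is grouped with the known verbs whose frame sets it overlaps most:

```python
# Toy distributional analysis: each verb is represented by the set of
# syntactic frames it appears in ("NP _ S" = subject, verb, sentential
# complement, etc.). A novel verb is classified by frame-set overlap.
FRAMES = {
    # perception verbs: direct objects and clause complements
    "see":   {"NP _ NP", "NP _ S"},
    "look":  {"NP _ PP", "_ !"},       # "look at X"; imperative "Look!"
    # motion verbs: intransitive, often with a path phrase
    "go":    {"NP _", "NP _ PP"},
    "come":  {"NP _", "NP _ PP"},
    # cognition verbs: clause complements
    "think": {"NP _ S", "NP _ PP"},
    "know":  {"NP _ S", "NP _ that-S"},
}

def jaccard(a, b):
    # Overlap between two frame sets (1.0 = identical distribution).
    return len(a & b) / len(a | b)

def closest_verbs(novel_frames, lexicon):
    # Rank the known verbs by similarity to the novel verb's frames.
    return sorted(lexicon,
                  key=lambda v: jaccard(novel_frames, lexicon[v]),
                  reverse=True)

# A nonsense verb heard in "NP gorped S" and "NP gorped NP":
ranking = closest_verbs({"NP _ S", "NP _ NP"}, FRAMES)
print(ranking[0])
```

With one frame the novel verb matches several known verbs; as frames accumulate, the candidate set narrows—here toward the perception verbs—which is the “ballpark” semantic interpretation the distributional analysis is meant to deliver.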
In the case of verbs, the meaning of an individual verb should be derivable from its distribution across all of its syntactic frames—an idea that was evident in Lila’s “Great Verb Game.” In this parlor game, people were given a sentence containing one novel verb and were asked to guess the meaning of that verb. With one sentence, this was quite difficult; but as additional new sentences (with the same nonsense verb) were provided, it became easier and easier, until there was just one answer. The reason, theoretically, is that each verb participates in a unique set of syntactic contexts; hence analyzing the distribution of a novel verb across such contexts should yield a unique answer. The general idea that syntax could aid in discovering the meaning of a verb made a great deal of sense to Landau and Gleitman, who used the idea of such distributional analysis to evaluate whether the sentences that the blind child actually heard could provide the basis for inferring the meanings of the verbs therein. This analysis resulted in a specific theory of how the blind child could acquire verbs such as
“see”—and, by extension, how any child could acquire verbs such as “think,” “know,” and “believe,” whose immediate perceptual interpretation was not obvious. Although the idea of distributional analyses remained central to their theory, an important insight gained by this study was that groups of semantically related verbs shared sets of syntactic frames. This meant that a distributional analysis of frames could yield a “ballpark” semantic interpretation—that the verb to be learned was a verb of perception, of cognition, or a motion verb, for instance. These ideas were published by Landau and Gleitman in Language and Experience. The theoretical idea of using syntactic context as a mechanism of verb learning—later dubbed “syntactic bootstrapping”—gave rise to numerous empirical predictions, which have recently been investigated by Lila, Henry, and their students (and are reviewed in a 1997 paper by Lila and Henry in Lingua). For example, can young children actually use syntactic context to interpret novel verbs, and can they use these contexts to infer new meanings for old verbs? (Yes, they can, as shown in various papers written by the Gleitmans in collaboration with Letty Naigles, Cynthia Fisher, Susan Rakowitz, and Geoff Hall.) Do adults represent verb meanings in such a way that one can recover semantic structure from the syntactic contexts in which they occur? (Yes, they do, as shown by Fisher with the Gleitmans.) Is the (nonlinguistic) contextual environment for verb learning really such an unreliable base from which to infer meanings? (Yes. It is virtually impossible to predict the presence of any specific verb from the nonlinguistic context surrounding its use; however, such context provides excellent information for predicting the presence of a specific noun; as shown by Gillette with the Gleitmans.)

Honors and Awards

Finally, there are honors and awards. 
Henry has won all of the teaching awards for which he is eligible—all those that have to do with teaching psychology (he has not to our knowledge won any awards for the teaching of chemistry or microbiology). He has won the University of Pennsylvania’s award (the Lindback), the School of Arts and Sciences award (the Abrams), and the American Psychological Association’s award (The Foundation Award). Had he not won them, it would have called their legitimacy into question. He is a Fellow of the Society of Experimental Psychologists. And he has been the president of the two divisions of APA he would have wanted to have been the president of, namely, Division 1 (General Psychology), and Division 10 (Psychology of the Arts). It is not coincidental that these two divisions represent
Henry’s brilliance as a teacher of psychology on the one hand and as a director of plays on the other. Lila has served as president of the Linguistic Society of America, is a fellow of the Society of Experimental Psychologists and the American Association for the Advancement of Science, and was recently named the Fyssen Foundation Laureate (the equivalent of the Nobel laureate for language and cognition). She is currently codirector (with Aravind Joshi) of the Institute for Research in Cognitive Science at Penn, whose origins date to the late 1970s, when the Sloan Foundation decided to stimulate development of the emerging field of cognitive science through Penn’s interdisciplinary faculty. The institute is a Science and Technology Center of the National Science Foundation; it is the only Science and Technology Center grantee of the NSF in cognitive science, and it continues to have computerized parsing of natural language as one focus. So Zellig Harris’s vision in the late 1950s continues to be influential through Lila, among others. Lila is also the editor of An Invitation to Cognitive Science (volume 1), the first attempt to pull together the various strands of research that constitute cognitive science. These awards and honors reflect, after all, what Henry and Lila are committed to, and what they have successfully fostered in an entire generation of students: excellence in teaching and excellence in research. In turn, these reflect the single deepest common thread woven through their careers: Teaching and research, mentoring and collaborating have always been bound together as the foundation from which all else follows. For those of us who have benefited from this foundation, Henry and Lila’s intellectual presence has been an inspiration, and their personal presence has changed our lives.

Postscript

Henry did finish The Book, with its first publication in 1981. Psychology appeared in its fifth edition in the fall of 1998.
Chapter 1
Der Urgleit
Jacob Nachmias

We are gathered here to reflect on the contributions of Henry and Lila Gleitman to education at Penn. You have already heard much—and will hear more—about these very considerable contributions from students and colleagues who have had the good fortune to work closely with the Gleitmans in recent decades. I think that the best way for me to make a nonredundant contribution to these proceedings is to capitalize on the fact that, with one probable exception, I have known Henry Gleitman far longer than has anyone else in this room. Thus my remarks are along the lines of a memoir of the early years of Henry Gleitman’s academic career, the Swarthmore period. And in the spirit of those early years, I will entitle my talk “Der Urgleit.” In 1946, Henry went to Berkeley to start his graduate studies. He completed them in 1949, and in the same year, joined the faculty of Swarthmore College. Anyone who has spent even a day in Berkeley will appreciate the strength of character exhibited by our hero when he opted to leave that charmed city across the bay from San Francisco after only three years in order to take a teaching job in the Delaware Valley. So when I first met him a year later, in the fall of 1950, he could not have given more than a couple of the 100-odd psychology 1 courses he has taught to date. I had gone to Swarthmore to study at the feet of the then demigods of perception—Hans Wallach and Wolfgang Köhler. And I did just that, but, as it turned out, I actually spent far more time with, and probably learned vastly more from, two other individuals I had never heard of before: one was Dick Neisser, my fellow Master’s student, and the other was Henry Gleitman. Actually, it was much easier to study at the feet of Henry Gleitman than most people: You did not even have to sit on the floor to do that, for Henry in those days was fond of perching on any horizontal surface, particularly a radiator cover. 
Henry was a phenomenon at Swarthmore in those days. With very few exceptions the Swarthmore faculty were solid, sensible, and serious—as befits the faculty of a college with strong Quaker traditions. So Henry could best be described as a sort of blue jay among brown owls:
He was vastly more colorful and louder. He was full of life, vitality, and many talents: He acted, he directed plays, he sang outrageous German translations of American ballads like “Frankie and Johnnie,” he was a gourmet cook, he was an excellent cartoonist. But above all he taught and he taught brilliantly. I don’t believe I actually heard him lecture at that time, but I did sit in on two of his honors learning seminars. They were without a doubt the most intensive, exhilarating, and exhausting intellectual experiences of my life; nothing before at Cornell or since at Cambridge, Harvard, or Penn came close to them. Each session of the seminars started right after dinner, and went on well into the night, lasting a good four or five hours. In those seminars we studied the writings of the great learning theorists of the era—Hull, Tolman, Guthrie, and their disciples. The word studied does not begin to capture the flavor of what we actually did. We read and reread, we analyzed, we dissected, we uncovered contradictions unsuspected by the original authors—or probably by anyone else in the entire galaxy. We designed crucial experiments, some gedanken, some involving complicated, balanced designs, requiring armies of rats, to be run on ingenious runways or alleys or Skinner boxes. This was serious business: We really wanted to get to the bottom of things. There were no shortcuts, no time limits, no hand waving. But it was also a lot of fun, with lots of laughter, and puns, and banter, and food, and drink, and above all, camaraderie. One of the outgrowths of those famous Gleitman learning seminars was a Psychological Review paper, “The S-R Reinforcement Theory of Extinction” by Gleitman, Nachmias, and Neisser. It was to be the first of a series of papers intended to take apart the entire edifice of Hullian learning theory, postulate by postulate. 
While we were working on the extinction paper, word reached us that galley proofs of Hull’s latest book—A Behavior System—were available at Yale. The senior author of the GNN1 paper, as we called it, dispatched the two junior authors to look through the galleys to make sure that the latest version of Hullian theory was still subject to the criticisms we were making. Neisser and I traveled to New Haven by a mode of transportation alas no longer available to impoverished graduate students, namely, the thumb. When we got there, we discovered to our relief that the new book did not require us to change a line of our critique. Five years after I left Swarthmore, I returned as an instructor, and Henry and I were now faculty colleagues. But he was still very much my teacher. When I organized my first learning seminar, the memory—as well as the extensive reading lists (updated)—of those legendary seminars led by Henry were my constant guides. But the most important thing I learned from Henry in that period was how to lecture—a skill that, alas, I seem to have lost in recent years. I learned by coteaching
psychology 1 with him. Before then, I had never given a single lecture; my only prior teaching experience had been facing bored MIT undergraduates as recitation section leader for Bill McGill’s introductory psychology course. And here I was teamed up with a man who already had a formidable reputation as a lecturer! Ours was not the usual arrangement, where the course is neatly subdivided between the coteachers. True, Henry had his lectures and I had mine, but because of his somewhat unpredictable commitments in New York at the time—he was a “cold warrior” working for Radio Free Europe—I had to be prepared to take over his lectures at a moment’s notice. Fortunately, the course was tightly organized—we had prepared detailed outlines, which were strictly followed. Timing was everything: Each lecture was meant to last precisely one hour, and the goal was to finish the summary statement just as the bell rang. It was this level of organization that made it possible for Henry, arriving late from New York, to walk into the lecture hall, sit on the sidelines for a couple of minutes to make sure he knew exactly what point I had reached, and then take over from me without missing a beat. Henry was not only my teacher and colleague at Swarthmore, but also my stage director. As a graduate student, I had bit parts in Gilbert and Sullivan operettas, and as an instructor I had a small talking part in Molière’s Imaginary Invalid—yes, the faculty put on plays in those days at Swarthmore. Since Henry did not know how to do anything by halves, participation in a Gleitman production was approximately as time consuming as taking an honors seminar or teaching a course. There were numerous and protracted and quite spirited rehearsals; in fact, one rehearsal was so spirited that I managed to sprain my ankle. However, Henry did succeed in getting his odd assortment of actors to put on quite creditable and memorable productions. 
There is much more that I could recount about those early years, but I hope that what I have said already helps to round out the picture of one of the two remarkable psychologists we are celebrating this weekend.
Chapter 2
The Wordgleits
Paul Rozin

My first exposure to a Gleitman was less than auspicious. One Henry Gleitman had been selected by the professors to be the chairman of the Penn psychology department, starting in the fall of 1964. I had just arrived in 1963, and was full of enthusiasm for the wonderful, stimulating, rapidly ascending psychology department assembled by its chairman, Robert Bush. I was a Bushophile. Much to my disappointment, one Henry Gleitman was going to replace my fearless leader. I had met Henry at an EPA party in the spring before the takeover, and was doubtful. I soon appreciated that Henry Gleitman was just the man for the job. Bush had built a fine collection of researchers, and it fell to Gleitman to shape them into fine teachers. Henry quickly elevated the teaching of psychology, particularly of introductory psychology, into a major goal. He did this largely by his own example as a superb teacher; respect for teaching rose in the department. I was converted. I soon realized that I had also gained a wonderful colleague and mentor. We even collaborated on a research project on the decay of memories: Could we slow down forgetting in goldfish if we cooled them down in the retention interval? Our results were mixed, but they led to our coauthorship of a review paper on learning and memory in fish. By Henry’s inspiring example, I became a psychology 1 teacher, a vocation that I proudly practice to this day. Meanwhile, my own main line of research, on learning predispositions in rats, waned, and my desire to do something that might translate more directly into improving human welfare waxed. Henry, Dave Williams, and I led an evening seminar at Frank Irwin’s house in which we all, roughly simultaneously, shed our Skinner boxes for work of diverse sorts on humans. For me, this meant an exploration of why the seemingly easy task of learning to read was hard, and the daunting task of learning to speak was rather easy. 
The difficulty of early reading acquisition took its toll heavily in the inner city. I took a biological approach: We evolved as ear-mouth speakers, and the visual input was a new invention in our species. This line of
interest took me to my first intellectual contacts with Lila and a collaboration that lasted over five years, and included the design and testing of a new reading curriculum. Lila and I had a swell time learning from each other, quipping and counterquipping, and, unfortunately, editing out each other’s clever lines from our joint publications. I have never had a more stimulating collaboration or collaborator. Meanwhile, back at the stage, Henry was having a major effect on the rest of my life through my children. He directed a play at the local elementary school, where his daughter, Claire, and my daughter, Lillian, were students. It was HMS Pinafore. Lillian, at age 11, played Josephine, and her younger brother Seth, age 9, was recruited to be a member of the crew. It was a great success and instilled a great love of theater in these two youngsters. So much so that Henry continued as theater coach in frequent meetings of children from the cast in later months and years. My wife, Liz, provided the musical coaching and background. And so was born a love for theater in Lillian and Seth. That led to the many pleasures of lead performances by both in junior high school, high school, and at Penn. The mark of Henry remains: Lillian took a master’s degree in drama at the Tisch School at NYU, and now does cabarets and has written a musical revue. Seth is now the artistic director of Interact Theater, a professional group in Philadelphia. Henry made it happen. Henry and Lila continued to be among my best friends, sharing the ups and downs of life, and engaging me in interminable arguments about almost any issue. I haven’t collaborated with either of them for over fifteen years. But that doesn’t matter. They are members of my biological family, and members of my academic family, my department. They have shaped my life as a teacher and scholar, and given direction to two of my children. They blur the distinction between friend and family, and are the best of both. 
The Gleitmans are more than very good at words (and sentences, too). They are wordgleits, marvelous creatures that utter wonderful word sequences; original (for the most part, never uttered before), pithy, trenchant, and delivered with panache. They generate oral and written words, and are both great at both. How many other couples can make that claim? It would be the opposite of an exaggeration (we don’t have an adequate word for this in English) to say that Lila and Henry were “one in a million.” Let’s take just Henry Gleitman (as Henry would no doubt say at this point, I don’t mean “just” in the sense of “minimally” but rather “only”). If Henry Gleitman were one in a million, that would mean there would be about 5,000 Henry Gleitman quasi-clones in the world. Surely, there is only one Henry Gleitman. Evidence?
The Wordgleits
Well, first of all, Henry knows more about the intellectual accomplishments of the Western world than anyone I have ever met. If all the people in the Western world were obliterated save one, Henry would be the best bet for the lone survivor, in terms of saving as much as possible about what has been done. In my world, there isn’t a close second. By the way, the same holds in spades if we imagine the more imaginable (and to some, more appealing) prospect of the destruction of all living psychologists but one. I’ve heard of a new way to measure importance or distinctiveness. It came from the Kennedy administration, and it was: how many phone calls one would have to make to bring down a Latin American dictatorship. A somewhat parallel measure more appropriate to Henry is how many nontrivial characteristics of a person need be listed to establish him or her uniquely as a human being. This is particularly easy for Henry, making him the prime candidate for being 1 in 5,000,000,000. Consider the following: he’s the only person in the world (I think) who:
- has taught 100 introductory psychology courses;
- has written an introductory psychology text and can whistle a whole movement of a Brahms symphony;
- has published papers separately, but never together, with both Lila Gleitman and Paul Rozin;
- is a director and a student of Tolman;
and the list goes on.

While doing all this, Henry had time to have five car accidents and two children, play golf, direct perhaps twenty-five shows, spend six hours a day on the phone, and coauthor one book of research, one acclaimed scholarly textbook, and many fine articles. And he did most of these things while conversing about weighty intellectual issues. Henry is unique among academics. Poor Lila is a “regular” outstanding academic. She is only one among a few great graduate student sponsors, past president of her field’s major professional society, and one of the few great psycholinguists in the world. Henry has chosen a “nonstandard” path.
Thinking of the aims of academe as the creation and transmission of knowledge, we all know that we get paid to do the latter (whether or not we do it well or with dedication), and rewarded for doing the former. Henry has set a standard that few if any can equal on the transmission of knowledge. First of all, he has the critical prerequisite more than anyone else—he has the knowledge! Second, he is dedicated to its transmission. Third, he mobilizes an incredible amount of thought and energy to accomplish the transmission. Fourth, he is great at the process of transmission, whether one-on-one, one-on-300, or, for the case of the book, Psychology, one-on-1,000,000 or so.

Although Henry doesn’t appear on the New York Stock Exchange (we could all enjoy picking the right nonsense syllable for his three-letter symbol), he has been one of the best investments in American history. His profit, or his students’ profit, or perhaps, prophet, can be calculated in terms of income versus expense. I conservatively calculate, from his 100 psychology 1 courses alone, using current dollars, that he has taught some 25,000 students (100 × 250), which, at Ivy League rates ($2,000/course), generates $50,000,000 in tuition income. The costs in Henry’s salary, whatever it is (was) precisely, are well below the more than $1,000,000/year that would balance this income. A return on investment of over 10:1, for sure, and that doesn’t even count the knowledge of psychology, or enthusiasm for it, that Henry has transmitted. And those who know Henry know that teaching psychology 1 is only one part of a monumental teaching effort. How much effort, you (or your friendly economist) might ask. My estimate follows.

Gleitman lifetime teaching time:
- Psychology 1: 100 courses × 13 weeks × 3 hours/week = 3,900 hours
- Wednesday night seminars: 30 years × 30 meetings/year × 5 hours = 4,500 hours
- Long colloquium questions: 900 colloquia × 3 min = 2,700 min ÷ 60 = 45 hours
- Drama coaching: 25 shows × 200 hours/show = 5,000 hours
- Advising: indeterminate but substantial
- Attending and advising on job talks: 10 hours/year × 30 years = 300 hours
- Speaking on behalf of teaching at Penn faculty meetings: 2 hours/year × 30 years = 60 hours

Surely, this total of 13,805 hours is an underestimate, but it gives an idea of the magnitude of the contribution. This polymath, polyglot, polished but not Polish (but close!) person can play almost any role, say, for example, Louis XVI. Henry’s true home should be at the head of one of the great courts of early modern Europe (see figure 2.1).
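For readers inclined to audit this back-of-envelope accounting, the arithmetic above can be reproduced in a few lines of Python. The category labels are shorthand for Rozin’s items, and all figures are his own playful estimates, not independent data:

```python
# Rozin's estimates of Henry Gleitman's lifetime teaching hours.
teaching_hours = {
    "Psychology 1 lectures": 100 * 13 * 3,       # 100 courses x 13 weeks x 3 hrs/week
    "Wednesday night seminars": 30 * 30 * 5,     # 30 years x 30 meetings/year x 5 hrs
    "Long colloquium questions": 900 * 3 // 60,  # 900 colloquia x 3 min, converted to hours
    "Drama coaching": 25 * 200,                  # 25 shows x 200 hrs/show
    "Job talks": 10 * 30,                        # 10 hrs/year x 30 years
    "Faculty meetings on teaching": 2 * 30,      # 2 hrs/year x 30 years
}
total_hours = sum(teaching_hours.values())
print(total_hours)  # 13805

# The tuition arithmetic: 100 courses x ~250 students x $2,000/course.
students = 100 * 250
tuition = students * 2000
print(students, tuition)  # 25000 50000000
```

Running it confirms the essay’s totals: 13,805 lifetime teaching hours (excluding the “indeterminate but substantial” advising) and $50,000,000 in tuition generated.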
Unfortunately, born when he was, Henry must be content to be the king of introductory psychology. And king he is. His image appears not only in his psychology text, but in others (see figure 2.2). Not satisfied with his own and Lila’s eminence in psychology, Henry has introduced into The Book other signs of his lineage: Some are well known, but the fifth edition promises more (see figures 2.3, 2.4).
Figure 2.1. Henry Gleitman as courtier.
Figure 2.2. Konrad Lorenz and his ducks. (This is actually a picture of Konrad Lorenz, a dead ringer for Henry Gleitman.)
Figure 2.3. The Gleitman family.
Figure 2.4. In an act of modesty, the identity of the child, grandchild Phillip, is largely obscured, but the Gleitman visage somehow comes through.
But the audience for Henry, his colleagues in psychology and college students, is and has been much more limited than it has to be. New versions of Psychology, for children and other deviants, may yet be forthcoming. Henry is not just a superb orator, a highly educated person, and the quintessential psychologist. He is also a master at experimental design. His work with Jonides on Os and zeros is one of many examples. This work, and related work by others, indicates that an object can be selected from an array of categorically different objects in a rapid, parallel search. However, objects sharing a common category are typically scanned serially. This important idea can be used to determine Henry’s own true category. To illustrate, one can ask how long it takes to find Henry in arrays of different types of objects. We have done so (of course, running balanced trials, with Henry’s image located at randomly different positions in different arrays), with arrays of such varied things as garden equipment, fire hydrants, and reptiles. Most telling and most critical are the results from the two arrays presented below. Henry is detected by parallel/rapid scan when embedded in an array of photos of professional basketball players (figure 2.5), but he is hard to detect, and merits a serial search, when embedded in an array of great psychologists (figure 2.6). This and related comparisons lead us to the inevitable conclusion that Henry is a great psychologist.
Figure 2.5. Can you pick out Henry Gleitman from the Philadelphia 76ers?
Figure 2.6. Can you pick out Henry Gleitman from the eminent psychologists?
Henry Gleitman belongs, as the great twentieth-century introductory psychology text author, in a chimeric relation to his nineteenth-century predecessor, William James (figure 2.7).

Who am I, Paul Rozin, to say and know all this about Henry and Lila Gleitman? My credentials are impressive:
1. I am the only person who has published separately with Henry and Lila, but never together.
2. I was promoted to Associate Professor with tenure under the chairmanship of Henry Gleitman.
3. Lila Gleitman was brought to the Penn psychology department as professor, from the Graduate School of Education at Penn, under my chairmanship.
4. Henry Gleitman is the theater father of two of my children: he was their first teacher and director of theater and instilled a love for theater in them that became a main theme of their lives.
5. Lila Gleitman studied language development in two of my children, and reported that young Seth, when asked: “Two and two is four: Is there another way to say that?” received the response: “One and three is four?”
6. Lila and Henry, Claire and Ellen, have been quasi-family members to me and my family for some thirty years.

So, with all this contact, and all this affection (and mutual roasting at various celebratory events), what can I say about two of the most intense, indefatigable, informed, and intelligent people in the world? As John Sabini notes, Henry has a saying: “If it isn’t worth doing, it isn’t worth doing well.” (The reverse also holds.) This reflects the intensity of both Henry and Lila (no halfway commitments here). There’s a parallel to this that I’d like to put forward: “If it isn’t worth feeling strongly, it isn’t worth feeling at all.” These are passionate people: when they watch TV sports, nurture orchids, eat at Sagami Japanese restaurant, relate to friends or to family or to students, there is an intensity, an enthusiasm that is rarely matched.
That’s why it’s great to be their student, their friend, and yes, even their orchid, house, or television set. There are more sides to the Gleitmans than one can convey, even in a book. Lila, alone, is comfortable as both figure and ground, and thrives, along with her field, on temporary states of ambiguity, linguistic or visual (figure 2.8). I stand in a mix of awe and affection as I contemplate them and their swath of influence and interactions on this earth.
Figure 2.7. Henry Gleitman and William James.
Figure 2.8. Some of the many sides of Lila Gleitman.
Chapter 3
Multiple Mentorship: One Example of Henry Gleitman’s Influence
Robert A. Rescorla

Henry Gleitman has inspired a large number of undergraduates, graduate students, and colleagues in psychology. But few have been as fortunate as I in interacting with Henry in all of these relationships. Indeed, I suspect that, with one or two exceptions, I have had more different academic relationships with Henry than has anyone else. Consequently, I am pleased to have the opportunity to reminisce about some of the things I have learned from Henry in those relationships over the course of nearly forty years.

For me, as for thousands of others, the first interaction with Henry was in introductory psychology. I entered that course in the spring semester of my freshman year, 1958—Swarthmore did not allow first-semester freshmen to take it—eager to develop my expertise in the likes of Freud and Jung. I was initially appalled at having to attend a large lecture course with almost one hundred other students. My other courses that term ranged in size from four to twenty students. But, like many others, I quickly became completely engaged in the course as Henry brought to it his famous enthusiasm, his ability to highlight the essence of a concept, and his way of making the concepts memorable by a turn of phrase.

I recently looked back over my notes from the course, a total of 71 pages for the 38 lectures. It was a full-service psych 1, covering topics ranging from the “nervous system” to “emerging patterns in our society.” Of course in those days we had to use someone else’s book; Hilgard was the text and the bias was decidedly toward experimental psychology. We dispensed with the nervous system in two lectures. We spent seven lectures on sensation and perception and eleven on learning. The latter did not count the two additional lectures spent on forgetting and the five devoted to the topic of motivation, which emphasized acquired motives.
Personality was dealt with in three lectures, as was social psychology. The topic of Freud was not officially a part of the class at all; rather, that topic was reserved for three extra evening lectures, which were something of a spectacle for the whole campus.
Three particular things caught my eye as I reviewed these notes: First, the topic of language was sandwiched into two lectures between sections called “complex learning” and “theoretical issues in complex learning”—hardly a forecast of Henry’s future emphasis. Second, my notes were especially fuzzy about a series of experiments on avoidance learning apparently done at Penn, which I labeled in the margin as “Sullivan’s dogs.” Third, my notes clearly indicate that even then Henry’s grandmother was the font of all wisdom in psychology.

My only disappointment in the course was Henry’s evaluation of my term paper. I wrote on Köhler’s Mentality of Apes, believing that I had captured the essence of the book in a few short phrases. Henry’s evaluation was even briefer: “Too discursive.” I was so shocked that anyone could think me wordy that I completely changed my writing style to the point where some would say that it is now telegraphic and cryptic. Of course, if I had known Henry as well then as I do now, I might have responded with a homily about pots and kettles. Thankfully for all of us, this paper has been lost in my archives. This course diverted me from my intended path toward the Methodist ministry. It set me on the way toward becoming an experimental psychologist. One of my offspring has commented that rarely since the days of the Roman Colosseum have so many Christians been saved from a terrible fate.

Two years later I showed up as an advanced student in Henry’s honors seminar on learning. This seminar was famous for its exciting, spirited discussions, its long meeting hours, and its attendance by other faculty such as Jack Nachmias. We frequently argued Spence and Tolman, Hull and Mowrer from 7:00 in the evening until well past midnight. We read such secondary sources as Hilgard’s Theories of Learning and relevant portions of Osgood’s classic Experimental Psychology.
But we also read an incredible number of original papers in original journals by Spence, Lawrence, Bitterman, Krechevsky, Meehl, Sheffield, Crespi, Tolman, Miller, Mowrer, Solomon (by now I had learned to spell his name), Harlow, Guthrie, Estes, Asch, Melton, Underwood, Postman, Rock, etc. It was a tremendously deep and wide-ranging seminar. It met fourteen times and I took almost four hundred pages of single-spaced, typed notes. Obviously in the two years since I had taken introductory psychology, Henry had learned a great deal more worth taking notes on! It was this course that introduced me to the excitement of careful experimental design and the importance of close logical reasoning. It set me on my choice of specialty within psychology. I still remember writing a paper on Spence’s explanation of transposition and being excited by the appearance of Mowrer’s 1960 book. I also remember being captivated by the experiments that Dick Solomon had done. It was that which led me to go to Penn to work with Dick. Probably more than any other class, this seminar changed my life’s course.

That summer, Henry gave me my first taste of real research. I worked with him, Bill Wilson, and another undergraduate (Maggie Herman) on delayed response learning in infant monkeys. Every day I hopped aboard my motor scooter, the standard-issue form of transportation for a Swarthmore undergraduate, and sped over to Bryn Mawr to run the monkeys. That summer, Henry taught me the importance of attending to every detail in the design and execution of experiments. We spent hours arguing over how to run the experiments and build the equipment. He also taught me the importance of understanding the species with which you are working. I developed a deep respect (and fear) of fifteen-pound infant rhesus monkeys who carry the deadly B virus. I still remember the time when one got loose while I was transporting him to the experimental room: he ran down the hallway to the right and I ran down the hallway to the left, in search of someone willing to catch him.

I also learned that rats are not just furry miniature humans. My first encounter with rats came that summer when some were delivered to Henry’s lab for studies he was beginning on forgetting. The colony room seemed very hot to me and when I looked at the rats I became truly alarmed about their health—they each seemed to have large tumors projecting from under their tails. I ran to Henry’s office to summon him to see that they got proper medical attention, only to be more than a little embarrassed as he noted that the males of many species were so equipped, though he admitted to the special endowment of rats in this regard.
That summer produced my first publication, with Henry as the first author and me as the last: “Massing and within-delay position as factors in delayed response performance.” So now I found myself in a new relationship with Henry, as co-author, and I learned two more lessons: in a serial position of four authors, if you are not first it pays to be last if you want to be noticed; and last authors tend to have little clout in decisions about writing.

It was several years before my next relationship with Henry began, this time as a teaching assistant in his course, my first experience of teaching. In the meantime, I had become a graduate student at Penn and he was recovering from a short sojourn at Cornell, doing penance by accepting the chairmanship at Penn. He had been brought to Penn to renovate the undergraduate program in psychology and to re-instill enthusiasm for teaching in the department, both of which he did, with effects that last to this day. But as a graduate student I had little interest in such matters. I was supported on an NSF graduate fellowship (these were the post-Sputnik times when
we all lived well) and saw my task as becoming the best researcher I could. I deeply resented having to waste my time learning to teach. I even considered making a formal challenge to the departmental requirement that we all serve as teaching assistants, no matter what our source of support. But in the end Henry prevailed. Just as Henry got me started down the road as a researcher, he got me started down the road as a teacher. It was as a TA in his class that I began to experience the pleasure of teaching others. I also had, of course, the opportunity to observe from a new perspective the master teacher. I like to think that I picked up a few tricks from him—although I never did master the art of holding a class’s flagging attention for fifteen minutes by waving a cigarette in my hand, the filter end pointed away from my mouth, acting as though at any minute I might light that end by mistake. Henry knew how to hook us on teaching. After the class was over, he confided to Michael Lessac and me that the undergraduates had praised us highly, something I suspect he still says to all his TAs.

My next relationship with Henry was having him as a member of my thesis committee. Dick Solomon was the main advisor and the newly appointed assistant professor Paul Rozin was the other member. I have always been grateful for a committee that basically left me alone. But Henry taught me two important points in that context: (a) make your dissertation a coherent story that focuses on one primary point and makes that point clearly; and (b) do not include extraneous material. The committee in fact insisted that I drop four of the six experiments I had wanted to include in my thesis and instead write up just the two central ones. Since then I have come to realize that in many ways the presentation of one’s work is as important as the work itself, if you want to influence the thinking of others. That was the lesson Henry was trying to teach me then.
Of course, we can judge the speed with which I learned this by the fact that my dissertation, on inhibition of delay, remains to this day a widely unread paper. When I got my degree, I took a position at Yale. With Henry at Penn, we communicated only occasionally, although I still recall one occasion when I called him while he was briefly in the hospital. I was struck by two features of that interaction. First, he wanted to talk psychology from his hospital bed; as I recall he had recently written a paper on getting animals to understand the experimenter’s directions, which offered ways for analyzing the structure of an animal’s associative knowledge. He insisted on getting my comments on that paper. Second, he could only spare me a few minutes because so many of his colleagues from the Penn department were in the room visiting. I think that they were all terrified that something would go wrong with Henry and they would end up having to teach psych 1.
It was fifteen years later that I returned to Penn to be a faculty member and Henry’s colleague. I was unprepared for the stimulating intellectual atmosphere that I encountered. At that time it was a place of such a high level of interaction that I had to have an automatic door-closer installed to get any work done. One of the people who fostered that interaction, of course, was Henry. He was always ready to talk about experiments you were doing. He was always concerned to maintain the sense of community in the department.

A few years after I returned to Penn, I had the opportunity for a role reversal with Henry. Just as he had been my chairman when I was a graduate student, I became his chairman in 1986. The first lesson he taught me in that context was that any thoughts of building a department are secondary to the necessity of keeping the faculty you have. No sooner was I chair than, much to my horror, he and Lila received an attractive offer from another institution. I have always considered it one of the main unacknowledged achievements of my chairmanship that they decided to remain at Penn. But Henry also greatly helped me in the daily tasks of being a chair. He taught me about my responsibilities to maintain the quality and atmosphere in the department. He would regularly come into my office to be sure I was aware of problems and issues that could adversely affect the department. He taught me the importance of listening to my colleagues with a new ear. I will always be grateful for the wise counsel he gave me, some of which helped prevent me from making terrible mistakes.

Now I am in yet another relationship with Henry, as his dean. From this vantage point, I can see the commitment that he has to the institution and to its intellectual life. I can see the contribution he makes not only to the psychology department but also to theater arts and to the institution as a whole.
I can also see that my own decision to accept the position of dean of the undergraduate college was heavily influenced by Henry’s example as a dedicated teacher and citizen of academia. So I have seen Henry from many viewpoints: he taught me my first psychology course and my first learning course, gave me my first research job and my first publication, supervised me in my first teaching position, was a member of my thesis committee, and was a member of the faculty the first (and only) time I was chair and the first (and assuredly only) time I have been dean. From each of these I have learned something about Henry and about myself.

But in addition to these general influences on my thinking and career, Henry has had some quite specific influences on my research. I want to describe briefly two experiments in order to make that point. The first experiment is one that Ruth Colwill and I performed several years ago (Colwill and Rescorla, 1985). The issue addressed by the
experiment was the nature of instrumental learning. When, for instance, a rat learns to press a bar for food reward, what is it that the animal learns? This is a complex problem with many different pieces, but one issue that stands out has to do with the role of the reward. One can contrast two classic alternative roles, as indeed Henry pointed out in his learning seminar those many years ago. The alternatives are that the reward serves as a condition of learning or that it serves as part of the content of learning. On the second account, the animal learns that bar pressing produces food, what one might describe as a response-outcome association. This is the most obvious thing that he might learn, but not the one proposed by many classical theories. Those theories instead saw the food not as part of the content of learning but as a condition of learning. In that view the food serves to stamp in associations between the response and antecedent stimuli. In effect, in that theory, the occurrence of food after a response has been made in the presence of a stimulus stamps in that S-R association. Or as Henry put it in 1960, the reward serves as a catalyst, helping the animal fuse two other events.

One way to address that issue is to ask about the impact of changes in the value of the food after the instrumental learning has occurred. If the animal has learned that lever pressing produces food, one would then expect that learning the food is not valuable would have an adverse effect on lever pressing. On the other hand, if the animal learns an association between some stimulus and the lever press, an association previously certified by food, there is no encoding of the food in the learning; consequently, subsequent changes in the value of the food should have little impact. Colwill and I used this logic to construct a relatively elaborate experiment, the design of which is shown at the top of figure 3.1.
We trained rats to make two responses, lever press and chain pull, each earning a different outcome, a small pellet or liquid sucrose. Naturally the animals made both responses. The question was whether they had the response-outcome associations or the outcomes had simply served as catalysts helping them learn associations between the responses and antecedent stimuli. To find out, we divided the animals into two groups and changed the value of one of the outcomes in each group. Our change device was the emetic agent LiCl. We simply gave the rats one particular outcome (either pellet or sucrose) and then administered LiCl so that they felt ill. Such a procedure is well documented to reduce the attractiveness of the food. Then we brought the rats back into the situation and allowed them to make a nonreinforced choice between the lever and chain. If they know what response leads to what outcome and they know that one outcome is unattractive, then they should be more enthusiastic about the other response.
Figure 3.1. Design and results of experiment identifying the presence of associations between responses (R) and outcomes (O). Rats were trained to earn two different outcomes by making two responses. Then one outcome was devalued by pairing with LiCl and the animals were given a choice between the responses. Responding is shown during an extinction test after devaluation. (After Colwill and Rescorla, 1985.)
The bottom half of figure 3.1 shows that this is exactly what happened. That figure shows responding over the course of the extinction test. The data are separated into responses whose outcome was devalued by LiCl and those whose outcome was left valuable. Although prior to devaluation the responses were made with equal frequency, after devaluation the responses whose outcomes were paired with LiCl were immediately depressed. That finding is important in its own right—it tells us a good bit about what is learned. It means that the outcome plays a role beyond that of a catalyst and is an actual participant in the learning itself. But the result is more important for the role that it can play as a valuable analytic tool, allowing us to measure the state of associations after a wide range of treatments. We have been successfully exploiting that tool in the exploration of instrumental learning and Pavlovian conditioning for the past decade. For instance, using this tool, one can show that these originally learned associations remain intact through such manipulations as extinction produced by reward omission. So this first experiment has proven to be quite important.

The second experiment is conceptually quite like the first. In this experiment, rats were trained to choose between a left and right goal box in a T-maze, shown in figure 3.2. The rats got the same food whichever
Figure 3.2. Floor plan of the maze used by Tolman and Gleitman (1949). Rats were trained to enter two distinctively different goal boxes by making left or right choices. Then one goal box was devalued by pairing with shock and the animals were given a choice of turning left or right.
goal box they entered, but the goal boxes were arranged to be distinctively different from each other. So one might say that the left and right responses led to distinctively different outcomes, different goal boxes. One can then ask whether the animal has learned those associations between the response and the goal-box outcome. This can be answered using the same logic as in the previous experiment, by changing the animal’s attitude toward one of the goal boxes. For instance, one might place the animal directly in one goal box and apply electric shock. Then one could bring the animal back into the choice situation and allow him to go left or right. When these treatments were carried out, 22 out of 25 of the rats chose the nonshocked side on the first choice trial. Since they could not see the goal boxes when they were making the choice, this must mean that they knew which outcome followed which response.

As it happens, this second experiment was not done in my lab in the 1980s but instead in Tolman’s lab in 1947. In fact, this experiment was Henry’s dissertation, published as Tolman and Gleitman (1949). The methods Henry had at his disposal were more primitive, but the logic is the same as our experiment from almost forty years later. One might legitimately say that our experiment is little more than a refinement of Henry’s thesis. As usual, he saw the issues clearly and identified how to separate them. He lacked only the technology.

I am fond of saying that most of the really important ideas are old ideas. For that reason, I routinely advise my students to read the works of certain major contributors who I think had the best perspective on the learning process. For my part, I regularly reread the books of Pavlov, Konorski, and Köhler. They are full of ideas that are well worth stealing. But it is now clear that I also reread Gleitman and have greatly profited by stealing from him.
Henry has not only had a broad impact on my attitudes and my career, he has also been responsible for my pursuing certain specific ideas. For all of this I am deeply grateful.

Acknowledgment
The writing of this chapter was supported by National Science Foundation grants BNS-88-03514 and IBN94-04676.

References
Colwill, R. M. and Rescorla, R. A. (1985). Post-conditioning devaluation of a reinforcer affects instrumental responding. Journal of Experimental Psychology: Animal Behavior Processes 11:120–132.
Hilgard, E. R. (1956). Theories of Learning, second edition. New York: Appleton-Century-Crofts.
Köhler, W. (1925). The Mentality of Apes. Trans. by E. Winter. New York: Harcourt, Brace.
Osgood, C. E. (1953). Method and Theory in Experimental Psychology. New York: Oxford University Press.
Tolman, E. C. and Gleitman, H. (1949). Studies in learning and motivation: I. Equal reinforcements in both end-boxes, followed by shock in one end-box. Journal of Experimental Psychology 39:810–819.
Chapter 4
Some Lessons from Henry and Lila Gleitman
John Sabini

I came to Penn in 1976. Henry was on leave at the time and Lila was in the School of Education, so neither had anything to do with my being hired—a fact they have found more than one occasion to remind me of over the years. It was not long after I was hired that I came to learn a lot about teaching from Henry and Lila. I wasn’t lucky enough to be a graduate student of theirs, as were many of the other contributors to this volume, but I was mentored by them nonetheless. Let me tell you some of the things I have learned first from Henry, then from Lila. One thing I learned from Henry is that “If a thing isn’t worth doing, it isn’t worth doing well.” (I know; this looks like a typo, but it isn’t.) The typical occasion on which Henry utters this is when someone presents a proposal to carry out a very complex and elegant experiment, a well-designed experiment, but an experiment designed to answer a question of monumental unimportance. Henry believes that psychology isn’t chess. Our task as psychologists isn’t to produce appealing designs; that is for fabric-makers. Henry reminds us that our job is to figure out how it all works, how the mind and even how the soul works. If an experiment won’t help us do that, then no matter how beautiful its design, the experiment isn’t worth doing. Henry has very strong views about teaching as well as about research. One thing he thinks (and says, vehemently!) is that people pay too much attention to the A+ student. We all love to teach the A+ student, especially if we can win that student over to our preferred kind of research. Indeed, we brag about and feel satisfied about having taught that student and having captured him or her for our team. And Henry has much to be proud of in that line. Most of the contributors to this volume were (graduate) students of Henry’s—won over by him. 
And, indeed, just at Penn there are two former undergraduates of his on the faculty—one the president of a large, Ivy League university. The other is a former associate dean and member of the National Academy of Sciences. But these are not the people, Henry would be the first to tell us, our teaching should be aimed at. For one thing, people like this barely need
teachers at all; they are perfectly capable of learning on their own. All these folk need is a campus map with the library on it (or, maybe, just a connection to the Internet) and they will find their own way to the truth. No, it is with the C student that we can make a difference. The C student, Henry tells us, is perfectly capable of learning absolutely nothing in a course. Worse than that, our C students can learn nothing and still persuade themselves they have learned something, leaving them worse off than they started. But the C student might also learn a lot in a course if the instructor teaches a course for them. It is with the C student, not the A student, then, that the marginal utility of a good teacher is clearest. Sure, it is more fun to teach the A student. And it is very rewarding to turn that A student’s head, to turn her from the field she was headed for toward psychology. But, as Henry is fond of asking, “So, there is one more brilliant psychologist in the world. But there is one fewer brilliant playwright (or whatever that brilliant psychologist would have become had she not become a psychologist). Is the world really better off with one more brilliant psychologist and one fewer brilliant playwright?” (Anyone who thinks Henry would answer that one “yes” has never met him.) These beliefs about teaching are of a piece. Henry has, as one might not expect from a native of Leipzig, an essentially Jeffersonian attitude toward teaching: The point of our educating the C student is to make him or her a better citizen, a person better able to understand the newspaper, in particular, better able to understand the Science Times article on the latest discovery about the brain, or identical twins, or social interaction. Our educational efforts ought to be aimed at making our students psychologically literate. It is not necessary, however, Henry might tell us, to make them psychologically creative. I have learned a lot in many ways about creativity from Henry. 
Henry—one of the handful of most creative people I have met—does not have an unambivalent attitude toward that aspect of the human psyche. He isn’t actually fully opposed to creativity; he’s just suspicious of it. And most importantly he believes that a scholar’s first obligation, most important obligation, is to master what the past masters have passed on. Henry knows that the truly creative types, the Newtons, Einsteins, Darwins, and Helmholtzes are, of course, what it’s all about (as he might say). But, sadly, most of us can’t be a Newton, Einstein, Darwin, or Helmholtz. Still, as vital to the life of the mind as those folks are, so too were the medieval monks; the ones who copied the classical texts—perhaps without the foggiest understanding of the language they were copying. And, more optimistically, we all can do that. And we can teach our students to do that. The first thing, then, we must teach our students, Henry might tell us, is to love and, because we love, preserve our intellectual heritage. Once we and they are fluent in Latin, or Greek, or Hullian learning theory, then maybe, just maybe it is time for an original thought. So Henry does appreciate creativity; he thinks it has its place, a lofty place, perhaps the loftiest place. But fostering creativity isn’t the only thing a teacher does, or even the most important thing a teacher does. Henry and I have both written textbooks. I learned a lot about that from Henry too. First, I learned to take my time. It took me ten years to write the first edition of my book. I was fortunate, though; my publisher didn’t grow impatient. After all, they had also published Henry’s text, so they thought my book was a rush job! (Indeed, once they got it, they held on to it for a while to let it age, I suppose, before releasing it.) I learned from Henry that a textbook is a Renaissance cathedral. Like a Renaissance cathedral it is meant for all comers, all who want to enter, so long as their hearts are pure. It is a place where the greatest theologians can come, but also where the humblest peasants can worship; each should find in its great expanse something to sustain his faith. Indeed, a well-written textbook should educate all who read it, the instructor as well as the student. A textbook should also, like a cathedral, be richly and colorfully decorated. It should delight the eye as well as the mind. And Henry has made sure that his text is a delight to the eye. Henry will also remind us, though, that as important as the art is, it is the last thing that goes into the cathedral. It is certainly the art that catches the eye of the tourist, but what the cathedral really is is a structure. For Henry it is the structure of a thing that really matters. As it is with a cathedral, the most important thing about a textbook is its structure. Henry is, of course, a marvelous stylist—despite not being a native speaker (as he is wont to remind us). 
But the last thing Henry would do is create a book by writing a collection of aperçus and then looking for a way to arrange them! No, no, no. “A thousand parts architecture before a single ounce of decoration,” might as well be Henry’s motto. The hundreds of conversations we have had about our books were never about style, and rarely about the exact content—though some were. Most were about the structures of our books. Getting that right; that was the thing. If you have gotten the structure right then the reader will be able to grasp the book and the field. Henry’s text and Henry’s lectures are, of course, thoroughly up-to-date. It is in a way a little odd that they are up-to-date because Henry doesn’t actually think it is all that important that they be that way. You see, the kiddies, as Henry refers to psych 1 students (I will have more to say about that in a moment), don’t really need to know what is up-to-date. They need to know something else; they need to know the deep
questions. The deep questions, Henry would tell us, change only very slowly if at all; the up-to-date answers change every day. By the time the last make-up exam is given in this year’s psych 1 course, Henry might point out, the up-to-date answers are already out of date anyway. Now, Henry has nothing against up-to-date answers (just as he has nothing against creativity). It’s just that he believes we must not be misled into thinking that we have succeeded in our educational mission if we have exposed our students to the up-to-date answers our field offers but haven’t shown them how these answers relate to the deepest questions our field would like to answer. This attitude of Henry’s might, one could imagine, lead someone into trivializing contemporary research. But this is never what Henry does. A positively delightful experience is to have Henry explain your work to someone else. If you are lucky enough to have this happen to you, you will learn that you all along spoke prose—and so wittily, so elegantly, so profoundly. Henry will, I promise you, explain how your work reaches to the deepest possible issues, how it bears on nothing short of the nature of the human soul. He has an utterly astonishing capacity to do this. And it is one reason (though only one reason) his students have been so successful. As many of the contributors to this volume know well, there is no place this talent is more on display than at the practice job talks that Henry and Lila have for each of their fledgling scholars. Practice job talks are an institution in which the student about to go on the academic job market gives his or her dissertation talk to an audience composed of Henry, Lila, other students, and an assortment of faculty. The faculty members are each assigned a role to play. One of us might be assigned the role of, say, parallel distributed-processing professor, another the role of cognitive neuroscientist, and so on. 
In any event, the student delivers his or her job talk to the assembled guests. Then, when the student is done, the fun begins. Henry and Lila now give the talk that the student should have given. Now mind you, the talks the students give are usually very good; these are, after all, very good students working with superb mentors. But the talks are never as good as they could be. Henry needs to fix them. And he fixes them this way: (1) They always need to be simplified. There is always too much stuff in them. You know, the stuff that goes into the unread parts of journal articles. The stuff that shows off what a good, conscientious scientist one is; the stuff that shows one is, as Henry puts it, house-broken. The audience doesn’t really need to hear that. (2) The real point of the talk is often a bit buried and surprisingly often never stated. It needs to be found and exposed. The listener must never lose contact with what the main point is. And (3) the soufflé that is
being baked needs to be lightened. Humor, clever visuals (often in the form of one of Henry’s cartoons), a bit of performance needs to be whipped into the mix just to pick it up, to make it more graceful. These are the ways that Henry the director fixes the play. I have never seen Henry fix someone’s psychology 1 lecture, but I am sure the process would be identical. This is just as it should be. Henry believes, of course, that there are differences between a lecture to a professional audience and one to a psychology 1 class. He knows that you can’t give the same talk to both audiences; but he believes, I think, that those differences are much less significant than we usually think. Erving Goffman pointed out that it is common for people in “service industries” (like teaching) to disparage their customers. It is common in universities, for example, to hear faculty complain about students. And authors are almost as likely to grumble about their readers as they are to vilify their long-suffering publishers! But this I have never heard Henry do. I have never heard Henry complain about undergraduates, not once in twenty years. Undergraduates are, Henry would remind us, to be respected, not disparaged. We are not here to judge their motives in taking our courses or to question their intelligence, sincerity, or integrity. It is, after all, our calling to teach them. If we aren’t called to this, we shouldn’t be in universities. It is not good for our students for us to demean them, and it isn’t good for ourselves. If we turn teaching undergraduates into a matter of casting pearls before swine, then what have we turned ourselves into? As Goffman didn’t point out, the tendency on the part of service professionals to denigrate their customers is—at least in the case of undergraduate teaching—as unwise as it is unwarranted; it is twice cursed. Eventually I became the chairman of the psychology department and as such received many gifts from Henry. 
One thing I received was the department itself. Henry came to Penn in 1964 as chairman. He came here in the wake of dramatic changes in the department wrought by the legendary Robert Bush. Bush moved and shook, but it was Henry who stabilized and solidified the department and its administration. But this paper is about teaching, not administration. What Henry as a teacher did for me as a chairman is this: When I took over our department it was in very high favor with the dean’s office. And we were in the dean’s good graces in large measure because we had a reputation for being a department that took undergraduate teaching seriously. (We had a reputation for doing that long before universities discovered their undergraduates and the apparently obscure fact that undergraduate tuition pays faculty salaries.) Why did we have that reputation? Because of Henry.
I am convinced that a necessary condition of a department’s taking undergraduate education seriously is that at least some of its most intellectually respected members be highly visible undergraduate teachers. If you have someone who has the respect of his or her colleagues as an intellectual who also teaches highly visible, introductory courses and who treats doing this as Henry treats it, as a calling, then you have a chance of having a department that treats undergraduates seriously. Now certainly in our department Henry has not been the only person to play this role, but he has been the Olivier of it. Thank you Henry. Now, what have I gotten from Lila? In my career I have had no more stimulating experiences than coteaching various graduate seminars over the years with Lila. That will come as a surprise to none of my coauthors in this volume. Lila, like Henry, is a great teacher. But since Henry is also a director it is a lot easier to figure out how he does it than how Lila does it. All you have to do in Henry’s case is listen in on the instructions that Henry the director gives Henry the performer. It is harder to see what Lila is up to. However, I have watched for so long now that I have a few hints. First off, any intellectual encounter with Lila is suffused with a particular spirit; that spirit is that you and she are now going to come to understand something (whatever it is you are discussing) together. She is on your side. Always. It is always the same. You and Lila are students together and together you will master this question, whatever it is. Lila can pull this off because she has a talent all good teachers have, but she has it in spades. The talent is this: As you and Lila think about some problem, Lila will convince you that this is the very first time she has thought the problem through. Oh, I don’t mean she claims this, and she would no doubt deny it if asked point blank. It’s just that she makes you feel it is true. 
I swear to you that every time I hear Lila explain why you can’t understand the acquisition of nouns just by saying that people point at a rabbit and say “rabbit” I believe it is the first time she is discussing it with someone. Henry is the only director in the family, but he is not the only performer. How did I wind up coteaching various and sundry seminars with Lila, since after all I’m not a psycholinguist? Well, to answer that I must tell you about the relationship between teaching and the rest of Lila’s life. And I have to start with a rule I had to make up for Lila when I was chair of the department. For the rest of the department there was a rule specifying the minimum number of courses a person was to teach; with Lila I had a rule about the maximum. I needed that rule because the number of courses Lila teaches is directly connected to the number of
good ideas she has, and she has so damned many good ideas! Here’s how the ideas and the courses are connected. If Lila has thought some topic through really well and thinks she has something really developed to say about it, then it is something on which she wants to do an undergraduate course. As we all know, undergraduates test ideas in ways no other audience can, precisely because they haven’t as yet bought into the prescribed professional way of looking at the world. But, of course, though exposing our ideas to undergraduates is useful to us, our students aren’t here to be useful to us. So it is only our well-developed ideas that they need to hear. (After all, as Henry might say, they could be taking a course on Shakespeare instead!) So as far as I can tell, then, for Lila undergraduate courses are places where she can share her most worked-out good ideas with (very junior) colleagues. There are some of us—Henry, I think, and myself certainly—who could be gotten to teach all sorts of things that we have no actual interest in; we might well view it as an interesting technical problem to figure out how to do it, or something like that. But that is not Lila. Oh, true, Lila could teach things other than the psychology of language. She could, for example, teach a course on bridge bidding—though I think she thinks that would be a course on the psychology of language. (And she could perhaps be induced to give a course on orchids!) But I cannot see Lila teaching something in which she had no interest. For her teaching undergraduates is too deeply connected to the rest of her intellectual life for that. (But neither is Lila of the self-indulgence school of undergraduate education, the school that thinks that undergraduate courses are for the edification of the instructor. Nor does she believe that the chance merely to be in the same room as the great genius instructor is educating enough, that there is no real need to prepare lectures!) 
Lila’s teaching also goes on, as all of the contributors to this volume know, at the Gleitman weekly research seminar. That is where all (or almost all) of the research programs discussed in this book were launched. Since other people have written about them in this volume, I am sure I needn’t. But perhaps I could say a word about how one comes to be a student of Lila’s. Of course some students come to Penn specifically to work with Lila because of her international reputation. But that is only how some of her students become her students. Others come to work on all kinds of other topics. They might come to work on clinical or social psychology. But, nonetheless, they go for a meeting with Lila because, maybe, they have been sent there by the graduate czar or czarina. And Lila says
something like this to them: “So, you want to work on, say, moral thinking. Well that’s certainly a nice topic, but, for me, I can’t imagine how anyone could find anything interesting except language acquisition. I know other people seem to find other things interesting but . . .” Now the poor student asks, “Well, what’s so interesting about language acquisition? I mean we all know how that works. An adult says the word rabbit and points to a rabbit.” At this point the poor student is doomed. Lila will now say, “Yes, yes, of course you are right. It must be just as you say. It couldn’t be any other way. But there is this one little problem that Quine pointed out. . . .” And we know what happens after that—an incurable obsession with language follows. It is important to stress that Lila doesn’t just pull the rug out from under the student; she also shows them in her own work how we can learn about how language is acquired. All praise to Quine for pulling the rug out, but that is enough for a philosopher; for Lila the scientist and for her students, there must be more. The rug must be replaced with the firmer stuff of a research program. So every student who goes to see Lila is at grave risk of becoming a student of language acquisition. The rest of Lila’s teaching is in the form of graduate seminars that are usually cotaught with one or more colleagues. These seminars come to pass for one of two reasons: either Lila decides that there is some topic our graduate students need to know about, and therefore we need to teach them, or Lila has a conversation with you about a topic near to language. If she has had a conversation with you about such a topic, then you are at risk of coteaching a graduate seminar with her on that topic. (I think this has been especially true with me because having such conversations for Lila is very much like teaching a course.) So now you can, I think, see why there is a maximum number of courses for Lila. 
Lila is constantly engaging others intellectually—about language (or bridge, or orchids), and it seems the natural thing that this intellectual engagement be shared with students and colleagues. Casual conversations evolve into courses. So what is it that I have learned from Lila? Two things. First, how not to have two careers, an intellectual career and a teaching career. And second, the little linguistics and psycholinguistics I know. Thank you Lila.
Chapter 5
Gleitology: The Birth of a New Science
Donald S. Lamm

My role here is an unusual one. I appear before an audience of learned practitioners of a discipline that in little over a hundred years has developed an immense wingspread. And I address you fully aware that IQ controversies or no, your intellectual prowess places you way out on the right tail of the bell curve—right tail, let me emphasize, graphically, not ideologically, speaking. Humility in such company should be the order of the day for one whose highest achievement has been to serve as paymaster to those with a gift for giving words to ideas. I should listen, not speak. But on this occasion, something compels me just this once to overrule my innate modesty, to play hedgehog among the foxes. Now is the time to reveal that one big thing I know. For I have been witness to the birth of a new science. It is a science that until this day has had no name, though hundreds of thousands have read its laws, axioms, and postulates. Its domain ranges from the slime to the sublime; its actors, coursing from A to Z, may be as hard to detect with the unaided eye as the amoeba or the zebra, vasodilating in the African sun. While rooted in scientific method, it partakes deeply of philosophy, literature, drama, sculpture, and painting. Until now, I alone in the world have known the new science by its one and only name. Others watched it evolve over sixteen long years; some even made significant contributions to its development. But, today, it is appropriate that the name of this new science be made public. I give you Gleitology . . . and a very short history. The first sightings of Gleitology go back to Greenwich Village, New York, in the mid-1950s. There, for what in Swarthmore circles passed for a Bacchanale, was the founder himself, Henry Gleitman, wreathed not in grape leaves but in smoke. As the mere escort of a Swarthmore graduate, I was entitled to only a brief audience. 
The founder produced a phoneme that I would later identify as “omm.” I was quickly parceled off to a Gleitman acolyte from Wesleyan University.
Years were to pass, seven to be exact, before I was to hear that phoneme again. But this time I was a man with a mission. After numerous discussions with George A. Miller, the magic number seven-plus-or-minus-two guru, and, more significantly, an advisory editor to Norton in psychology, the name Gleitman appeared at the top of a short list of potential authors of an introductory text. “I know Gleitman from some articles,” said Miller, and thereupon produced a Scientific American offprint on “place learning in animals.” The piece read extraordinarily well, but what, I asked Miller, did this have to do with a psychology textbook? Miller instantly replied: “Gleitman’s psych 1 course at Swarthmore is reputed to be the best in the country.” That was enough to send me off on a semi-wild goose chase. Efforts to reach the great Gleitman when I was on an editorial trip to Philadelphia were unavailing. No doubt with his usual prescience, he had decided that the real test is not having the mountain come to Mohammed but having the mountain try to find Mohammed. He had, in fact, taken a year away from Swarthmore and was teaching at Cornell. I trekked up to Ithaca to find him. Our meeting at Cornell is the one Henry Gleitman considers the alpha meeting, describing the Greenwich Village encounter as something from his “primordial past.” It started badly. Within minutes Henry revealed that he had more suitors than Odysseus’s Penelope, publishing suitors that is. And even while disparaging them all for myopia and assorted mental maladies, he set before me the Gleitman equivalent of the MMPI. The test consisted of two fairly thick blue notebooks, the syllabi for the best psych 1 course in the land. “You see,” Gleitman intoned, “there is so much to psychology that it would require a two-volume textbook to encompass the whole field.” I knew that the wrong response would have been to say “impossible,” yet, in publishing terms, that would have been the right response. 
Should I, perhaps, have invoked the shade of William James? In his long drawn-out struggle to produce the first psych 1 textbook, James had railed intermittently at his publisher, Henry Holt, for not putting at least some of the work in type while he struggled with what he called the “demon.” Most likely fearing that James would insist on any prematurely typeset material being printed as volume one of the work, Holt would have none of it: “I will not set a word until I have it all.” In the 1890s, as in the 1960s, students were not likely to buy, let alone read, a multivolume text. The better part of wisdom was to dodge the question. I knew that a team of psychologists at the University of Michigan that seemed to agree on very little had convinced their publisher to help resolve their
differences by putting out a two-volume edition of their introductory psychology text. (It turned out to be a colossal failure.) I bought time with Professor Gleitman by agreeing that in the best of all possible worlds any textbook he wrote should mirror his course syllabus. That was not the end of the test. “What do I do about Freud?” the professor asked. And then, in what I would discover was trademark Gleitman behavior, he supplied the answer to his own question. Freud was not in his syllabus. Even though Freud was recognized as a genius, doubts were widespread in the profession whether he should be admitted to the pantheon of great psychologists, whether, indeed, he was a psychologist at all. It turned out that Freud figured in the Gleitman course but was, literally, taught under cover of darkness, in two or three evening classes where attendance was wholly voluntary. I decided to chime in with an answer of my own. This time I could respond without hedging. “Freud must be in the text.” (What else could I say as an editor at Norton, the publishers of the standard edition of the Complete Psychological Works of the man we irreverently referred to as “Old Whiskers”?) My reply produced a second Gleitman “Omm,” a clear signal that it was time to leave. He would think things over. I was almost out the door, when Professor Gleitman (still not Henry to me) uttered a sentence that I would hear often again over the next fifteen years, “Suppose the emperor has no clothes?” Then as now, playing dumb comes easily to me. He went on, “Apparently George Miller thinks that I may be the best psych 1 teacher in the country. What happens if I write a textbook that does not live up to my reputation?” I must have mumbled something semi-witty such as “Isn’t it up to the publisher to dress the emperor?” I received my third “omm” and was gone. Somewhere, as yet unretrieved, in the Norton archive in Columbia University is a letter I wrote to Professor Gleitman after the Ithaca meeting. 
I asked if he would kindly send me a copy of that two-notebook syllabus to share with George Miller. To the best of my recollection I added some airy persiflage about the great loss psychology would suffer if the Gleitman lectures resided only in the collective memories of his students. The notebooks arrived two weeks later, a revealing note attached. It said, “Here they are.” George Miller, who had previously shown signs of indifference toward the textbook component of his editorial advisership, never sent me a thorough critique of the Gleitman syllabus. He did remark in a handwritten note from Oxford University, where he was on leave, that the syllabus covered the waterfront, without a psychological pebble unturned. Was this the sine qua non of a successful text, he wondered. Perhaps it was magic in the classroom that made Gleitman a standout.
A year later Henry Gleitman came to the University of Pennsylvania as department chairman. That appointment was not likely to prompt a decision from him to write the text. Still, on a trip to Philadelphia, I decided to try my luck again, this time performing that single act that is a distinctive trait of publishers: I invited Professor Gleitman to lunch. As it happened, he was pressed for time and chose to turn the tables on me, taking me to the faculty club. (I should point out that the psychology department at the University of Pennsylvania had put out a guide to eating in Philadelphia, rating restaurants with letter grades. It had a very brief preface, to wit, “For purposes of comparison, consider the Faculty Club a ‘D.’” Grade inflation was not unknown even in 1965.) Soon we were on a first-name basis. Over the pièce de résistance, potato chips, we hit on the formula that managed to break Henry’s resistance. I’d like to claim that it sprang entire from my brain. In truth, it was Henry who suggested that, while he had no time to write a textbook, maybe he could do the equivalent for psychology of Richard Feynman’s acclaimed lectures in physics. Now here was something to work with. I remember seizing on the idea and proposing that we bug Henry’s classroom with a recording device. That spurred Henry on; in his best Cecil B. DeMille manner, he raised the stakes to cameras and film. We decided on voice recording as an initial move. Now came a revelation. It would be a risky and costly business to record all eighty or so Gleitman lectures and then prepare transcripts of them. So, while agreeing in principle to the arrangement, I decided to build in a safety factor, selecting a six-foot five-inch, two hundred twenty pound sales representative who, unbeknownst to Henry, would sit in on one of his lectures. This inconspicuous espionage operative came back with a review, in effect, of a Broadway production: “Boffo smash! 
Gleitman’s definition of behavior nearly moved me to tears. And the high point of the lecture occurred when, after a tactical pause, he intoned, in his German-accented English, ‘Consider the rat.’ Seconds later, he dashed around the stage of the packed lecture hall, imitating a rat navigating a T-maze grid, only to get shocked as it neared its goal, the food powder. The whole class broke out in applause.” That report was enough to convince my colleagues on the editorial board. A contract was drawn up on March 29, 1965, committing Norton inter alia to the expenditure of $2,000 to cover the recording and preparation of transcripts of the Gleitman lectures. This time there was no hesitation on Henry’s part. He signed the contract. The experiment began. I am unaware that any undergraduates in the Psych 1 course during the 1965 fall semester knew that almost every word of their dynamic lecturer was being recorded for posterity. “Almost” must be stressed, since Henry, with his propensity for scampering around the stage, managed occasionally to venture out of microphone range. The transcript of his lecture on insightful behavior, for example, disintegrated into a number of sentence fragments, apparently the result of Henry’s strenuous efforts to portray one of Wolfgang Köhler’s chimpanzees on the island of Tenerife using the eminent Prussian psychologist as a “climb-upon-able” in a desperate attempt to grasp an otherwise unreachable banana. Despite such setbacks, a substantial body of lectures had been transcribed by the end of the semester. I must admit that the transcripts were something of a disappointment. Stunning passages of intellectual discourse, entertaining descriptions of experiments, even the occasional groan-evoking pun could not mask the discontinuities and digressions of extemporaneous speech. Henry acknowledged a new-found empathy for the sometimes garbled syntax in the transcript of a Dwight David Eisenhower press conference. Perhaps, he mused, the lectures were proof merely that good storytelling with a Leipzig accent accounted for his reputation. This would not be the only time a touch of Gleitman despondency clouded the enterprise. We agreed to a pause in the proceedings, since aside from the obvious fact that the transcripts would require considerable doctoring to serve as a textbook, there were other demands on Henry’s time: notably, recruiting psychologists for the University of Pennsylvania to establish his department as one of the best in the nation. The pause lasted for nearly four years while an onion-skin set of the transcripts curled and faded on a radiator in my office. Never was the project abandoned; instead, at a crucial moment between egg rolls and moo shu chicken at the Mayflower Restaurant in Philadelphia, Henry stated there was simply no alternative: He would have to write the textbook from scratch.
Oh, he added, perhaps he might steal an occasional glance at the transcripts but they would not constitute much more than elaborated chapter outlines. We agreed that a second experiment should be undertaken, this time a whole chapter, perhaps as much as a summer’s worth of labor, duly compensated for. A few dining rituals were necessary before Henry actually sat down with lined pads and typewriter. At one such meal, he observed that the first chapters of introductory psychology textbooks carried a lot of baggage—clumsy efforts to define psychology, brief synopses of fields within the discipline, listings of careers in and outside academic psychology, truncated histories of the science that inevitably opened, according to Henry, “in the beginning was Wilhelm Wundt.” He had
decided that his would be the first introductory psychology textbook without a first chapter. It was an illusion that I had to accept and even to foster, though murmuring that perhaps in imitation of some mathematics texts there might be a chapter 0. A much happier note was struck when Henry said that he was determined to find overarching themes for his book, themes that would demonstrate to students and colleagues what, in fact, made psychology hang together. That task would not be easy. In 1969, Henry had delivered an address to division 2 of the American Psychological Association in which he spoke frankly about psychology as a discipline with many perspectives: “In teaching the introductory course we sometimes prefer to blur the distinctions and sweep the differences under a rug. But surely this distorts the subject matter. . . . If psychology is a frontier science, let us present the frontier as it is, with its brawls and its barrooms and even its bordellos.” Nonetheless, Henry persevered and delivered early in 1970 the presumed first chapter of his text, “The Biological Bases of Behavior.” It was a mere 170 pages long, tracing the history of investigations into “why,” as he put it, “men and beasts behave as they do.” From Descartes to von Helmholtz to Sherrington and, ultimately, contemporary figures, Henry spanned the field, pausing en route to deliver lively asides on such topics as the copulating behavior of the praying mantis as evidence of disinhibition. Length apart, my colleagues and I were convinced that Henry’s draft chapter contained most of the ingredients for a successful textbook. The academic reviewers confirmed our impression. 
While all the reviewers commented on the extraordinary length of the chapter, one going so far as to credit Henry with creating “the finest textbook in physiological psychology” he had ever read, it was Professor Allen Parducci of UCLA who put it best: “Whatever you do,” he said in a telephone follow-up to his written report, “keep that man writing.” The instrument for doing just that was a new contract, drawn up on April 28, 1970, with considerably more payment for the author up front—and no reference to tape-recorded lectures. One clause in the contract stood out from the standard boiler plate: “The publisher will utilize no fewer than 20 academic consultants to review the entire manuscript or portions thereof.” Little did I know that the academic reader count would reach 86 over the ten years Henry worked on the first edition. With the time of testing behind us, Henry wrote two chapters on learning in fairly rapid order. The files reveal no serious setbacks to progress, although Henry did grouse in one letter, “Very deep in Pavlov.
What an unpleasant Russian peasant trying to be an echt Deutscher scientist.” Then came a serious bump in the road. One reviewer harshly criticized the amount of history, what he referred to as “psychology yesterday,” in the learning chapters. That remark threw the author and, to some extent, his editor for a loop. For one of the hallmarks of the text was to be its emphasis on the evolution of psychology, an approach that Henry would eventually explain by a metaphor in his preface, “a river’s water is much clearer when it is taken from its spring.” Over many exchanges by phone, letter, and personal visit, a decision was reached to modify, not eliminate, the historical component. At no point was the thematic structure of the book in danger. But if God is in the details, then He ordained that some of the coverage of psychology in its earliest decades would have to give way to recent research. At one point in reorienting the project, Henry wrote, “I’m beginning to understand that the relation between author and publisher has virtually a psychiatric status. I wonder how Shakespeare ever managed to write a single play without a kind-hearted Norton editor to cheer him on (or was there one?).” For all the Sturm und Drang set loose by the severest critic of the earliest chapters, it was a heady discovery that the pioneers of psychology did not have to be sacrificed en masse, that Henry’s text would still be distinctive in showing that a science not only builds on the work of its founding figures but that it also profits at times from adventures down blind alleys. While it would still be eight years before the manuscript was completed, the hallmarks of Gleitology were in place. Over the long haul through sensation and perception, memory, cognitive thinking, personality, intelligence, psychopathology, and more, Henry’s endeavors were constantly supported by close readings of the developing manuscript from first-rate psychologists.
Inevitably, revisions were called for, and, while Henry had yet to penetrate the mysteries of word processing, he had a secret weapon in his writing armory: the stapler. Blocks of copy would be moved about with the adroit manipulation of scissors and stapler. (Henry never was one to sink to the level of a glue pot.) Still, there was more to writing and revising than mechanical aids. Two reviewers took on roles that went far beyond encouragement. Paul Rozin, a University of Pennsylvania colleague, was an agent provocateur almost from day one. During yet another culinary moment, this one at a Philadelphia restaurant called the Frog (and rather overdedicated to its amphibian motif), Professor Rozin unwrapped the key to the long-postponed opening chapter. The notion was to introduce what Henry was to call the many faces of psychology through the subject of
dreams, a subject with a rich research component and also a significant appeal to any reader’s experience. Along with Henry’s former student, John Jonides of the University of Michigan, Professor Rozin also developed a study guide that became a key ancillary to the text. The other reviewer thoroughly dedicated to the enterprise was Professor Lila R. Gleitman. Balancing her own career in linguistics with the raising of two daughters, the planting and pruning of flora in the Gleitman greenhouse, and the feeding of visiting fauna in the Gleitman manse, Lila became a collaborator in the fullest sense in the development of the manuscript. Her name appears in the text as coauthor of the chapter on language; her influence was far more pervasive and, when called for, subversive. For Lila was the one person I could enlist in periodic campaigns to convince Henry that a textbook’s success often turned on decisions on what to leave out. The assistance Henry received from various quarters did not alter one prevailing fact: The book that finally appeared in 1981 was stamped throughout as a solo accomplishment. Henry was the writer as complete impresario, creating unusual, sometimes whimsical schematic drawings for the artist to render in final form, assisting in the selection of all the halftone illustrations, and ultimately suggesting that the cover and dust jacket art feature a sculptural masterpiece that, in his eyes at least, bore a close resemblance to Henry himself: Michelangelo’s David. (By publication date, I myself had come to take on the aspect of a hyperphagic rat depicted in a halftone in the third chapter of the text.) Publication was anything but an anticlimax. The book was greeted with immense acclaim, backed up by well over two hundred adoptions in its first year. 
Even though it placed far greater cognitive demands on its readers than most competing textbooks, Gleitman’s Psychology found a home not merely in every Ivy League college but also in state universities, liberal arts colleges, and, occasionally, in community colleges. It caused a number of other publishers to commission “high-end” textbooks, thereby helping to achieve the ultimate aim of Gleitology—to raise the standards of instruction in the introductory course. And, despite all the newly bred competition, Gleitman’s text remained the only one to demonstrate that there was cohesion both within psychology and between psychology and other fields of inquiry. A successful textbook not only spawns imitators; it takes on an afterlife. The first edition of Henry Gleitman’s Psychology was followed a year later by a briefer edition entitled Basic Psychology, tracing the same trajectory as William James’s Principles of Psychology, in which a truncated version (dubbed by the publisher “Jimmy”) appeared shortly after the grand work (or “James”) itself. While acts of compression had troubled Henry
when writing the original text, no such concern hampered the rapid completion of Basic Psychology. And, consistent with the etiology of Psychology, the crucial decisions as to what and where to cut in order to create “Hank” (Norton’s code name for the briefer edition) were made over a meal in a Chinese restaurant. Three more editions of both versions of Gleitman’s Psychology have since appeared. What began as an American textbook has now become a world textbook, with substantial course use in the United Kingdom, Scandinavia, the Netherlands, Germany, Israel, Australia, and in universities elsewhere around the globe. And while each revision entails substantial additions and alterations, the book continues to exhibit three qualities—passion, power, and elegance—extolled in Henry’s dedication of the first edition:

To three who taught me:
Edward Chace Tolman, to cherish intellectual passion
Hans Wallach, to recognize intellectual power
Lila Ruth Gleitman, to admire intellectual elegance

As always with Henry Gleitman, no one could have put it any better.
Chapter 6
Children’s Categorization of Objects: The Relevance of Behavior, Surface Appearance, and Insides
Elizabeth F. Shipley

To say I am grateful to the Gleitmans for a major part of my education is an understatement. Henry, as a new but not novice teacher at Swarthmore, revealed challenges, paradoxes, and the sheer fun of psychology to this former physics major, as he still does today—especially with those questions that begin “I’m puzzled.” Lila, as a new mother, revealed the complexities of language and first language learning, as well as gaps in received wisdom, to me as a mother of preschoolers, as she still does today—often with devastating humor. I thank them both.

As, guided by Lila, I looked at young children learning to talk, questions from my undergraduate days resurfaced: Why do we partition the entities in the world as we do? Why are rabbits special in some way but not things smaller than a breadbox? What are our beliefs about the relations among classes of things? What does it mean that Floppsy is both a rabbit and an animal? More generally, I began to wonder how children’s psychological categories of physical objects develop and what determines for a child which classes of objects are categories and which are not. I have found possible answers to these questions in Nelson Goodman’s (1955/1983) insights on induction, answers which I will sketch here. See Shipley (1993) for a more extensive discussion.

First, what are psychological categories of physical objects? They are classes of objects characterized by three psychological properties: (i) Category labels are used for object identification, for instance, as an answer to the question “What’s that?” (see, e.g., Anglin 1977; Brown 1958; Shipley, Kuhn, and Madden 1983). (ii) Categories act as the range of inductive inferences.
When we are told a rabbit has a cecum we are more likely to extend the property of cecum-endowment to other rabbits than to other things smaller than a breadbox (see, e.g., Carey 1985; Gelman 1988; Holland, Holyoak, Nisbett, and Thagard 1986). (iii) Category members seem to have a deep resemblance; they belong together and form what Murphy and Medin (1985) called a coherent class, a class they characterize as “sensible,” “comprehensible,” “informative, useful, and efficient” (p. 289). I will use the term category to refer to classes of physical objects with these properties.
A developmental account of categories should provide answers to at least three interrelated questions:

A. Why do members of a category act as the range of inductive inferences?
B. What gives coherence to a set of category members?
C. What determines whether or not a child considers an object to be a member of a specific category?

In this chapter I will outline answers to these questions based upon psychological essentialism (see, e.g., Gelman, Coley, and Gottfried 1994; Gelman and Medin 1993; Medin and Ortony 1989), then consider answers to the first two questions derived from Goodman’s (1983) concept of entrenchment, and finally report two experiments relevant to entrenchment and the category membership question.

Psychological Essentialism

Current popular answers to these questions invoke psychological essentialism. Psychological essentialism must be distinguished from the philosophical position that categories have real essences. Psychological essentialism involves a belief in deep, perhaps unknown, common properties possessed by all members of a category. These properties constitute the essence of the category. For example, a belief that an animal’s kind, whether it is a tiger or a lion, is determined by its DNA is a psychological essentialist belief. Psychological essentialism also includes the belief that the essence of category members causally accounts for their more obvious properties. The appearance of a tiger and its ability to learn might be attributed to its DNA. Belief in a common essence is said to underlie inductive inferences over members of a category and the attribution of coherence to the set of category members (see, e.g., Gelman, Coley, and Gottfried 1994; Gelman and Medin 1993; Medin and Ortony 1989). Induction from a sample to a category is supported by the inherent similarity, the essence, among members of the category. Coherence reflects the common underlying essence.
Coherence is further enhanced by beliefs in causal relations between the essence and more obvious properties of category members. Finally, the belief that an entity possesses the essence of a specific category accounts for a person’s assignment of the entity to that category. What kinds of things have psychological essences? Do only biological kinds, such as dogs and roses, the most popular candidates for essence-based categories, have essences (Atran 1995)? Do natural kinds other than biological kinds, such as gold and water, have essences (see discussion in Malt 1994)? Can artifacts have essences (see summary in
Malt and Johnson 1992; Bloom 1996)? If induction is necessarily mediated by essences, then the fact that preschool children make inductive inferences as readily over artifact kinds as over biological kinds (Gelman 1988) suggests artifacts have essences—at least for young children. Carey (1995) maintains that for the young child everything with a label has an essence because essentialism “derives from the logical work done by nouns” (p. 276). She claims “the child has a default assumption that count nouns are substance sortals” (pp. 276–277) and every substance sortal has identity criteria. Hence, for Carey, it follows that for every count noun known to a child the child has identity criteria that specify the properties that must be unchanged for an entity to maintain its identity as an instance of a particular substance sortal. For Carey, these properties constitute the essence of things named by the count noun for that child. Note that Carey’s definition of essentialism does not specify a role for a causal relation between deep properties and surface properties, although such could be included among identity criteria. Work by Smith, Jones, and Landau (e.g., 1996) on the importance of shape for physical object identification is relevant to Carey’s (1995) position on psychological essentialism. The Smith et al. studies offer strong evidence that for preschool children label assignment is tied to shape for novel nouns and some novel objects. If all count nouns have essences then the essence of at least some novel objects with novel labels consists of the shape of the object, a conclusion at odds with the spirit of most writings on psychological essentialism. However, the Smith et al. findings are consistent with a less extreme version of essentialism provided the child is granted a bias to use shape for label assignment when no other information is available. 
Possibly the shape bias could prompt the child to look for other similarities among entities with the same label and hence be the key to the discovery of the essence underlying a category (Landau 1994). In brief, some versions of psychological essentialism, those that give learning an important role in the content of the essence, are consistent with known facts on children’s novel label learning. Work by Keil (1989; Keil and Batterman 1984) with young children on discovery and transformation of properties of physical objects indicates that if essence determines category assignment, then it changes as the child grows older. For instance, children were told about a raccoon that had a smelly sac surgically implanted and its appearance altered to resemble a skunk. Then children were shown pictures of a skunk and a raccoon and asked what the altered animal was, a raccoon or a skunk. Kindergarten children believed it was now a skunk, 4th-graders believed it was still a raccoon, and 2nd-graders were undecided. As the
child grows older the essence apparently changes from characteristic properties to something inherent in the animal. If, for the child, everything labeled with a count noun has an essence, as Carey (1995) proposes, then the theoretical issues of interest are (a) what is the essence of members of a specific category, that is, what properties are tied to identity change and identity persistence, and (b) how and why does the essence change? However, if not all referents of count nouns have an essence then an additional theoretical issue is (c) the characterization of those categories whose members have a psychological essence. Finally, (d) there is the theoretical issue of how a category acquires an essence. For adults, empirical work probing the nature of psychological essentialism has cast doubts on the relevance of the essentialist position. For instance, Malt and Johnson (1992) found that both function and more superficial properties influenced category assignment of artifacts. For the natural kind water, Malt (1994) asked college students to judge different fluids on four dimensions: percent H2O, typicality as an instance of water, similarity to other fluids, and acceptability as a type of water. She reports that in addition to the expected essence, chemical composition (H2O), the source, location, and function of a fluid determine the classification of the fluid as water. Kalish (1995) found adults unwilling to grant that biological considerations could assign entities to a kind of animal category absolutely. For instance, sixty percent of his college undergraduate subjects thought it possible that an animal could not be proved to be of any specific kind. Such a finding is inconsistent with the essentialist belief that each individual animal has an essence that is unique to its kind. Of course, the weakness of a psychological essentialist position for adults does not dismiss its possible usefulness in conceptualizing children’s understanding of categories. 
In brief, psychological essentialism appears to account for the three psychological properties of categories for children. The possession of a common essence could account for children’s willingness to make inductive inferences over all members of a category based upon information about a small sample. Further, the possession of a common essence could account for the coherence of the set of members of the same category and the assignment of an entity to a category. For most psychological essentialist positions it is necessary to flesh out the notion of essentialism with the naive theories, ideas, and beliefs that could constitute the basis of the essence in order to account for specific identification judgments. That is, for a child to decide if Billy, an animal, is a sheep or a goat the child must marshal her knowledge of sheep and goats and compare that knowledge with her knowledge of Billy. Such knowledge of a category is considered central to the essence by most of those advancing an essentialist account of children’s category development (see, e.g., Gelman et al. 1994; Gelman and Medin 1993; Gelman and Wellman 1991; Keil 1989).

Entrenchment

Another way of accounting for a person’s readiness to make inductive inferences about a category and belief in the coherence of category members is to focus on the history of properties attributed to members of a category by that person. That is, one might look to the history of inductive inferences about the category. Nelson Goodman (1955/1983), in his analysis of the inductive inferences people actually make, introduced a concept he called “entrenchment.” Classes of objects can have entrenchment and properties of objects can have entrenchment. The greater the entrenchment of a class, the more readily it acts as the range of an inductive inference. A newly learned property of a brown dog will more readily be attributed to other dogs than to other brown animals because the class of dogs has greater entrenchment than the class of brown animals. The greater the entrenchment of a property, the more readily it is extended from a sample to a class exemplified by the sample. Observing a marmot, a kind of animal, sitting on a newspaper and eating grass, we are more willing to attribute eating grass than sitting on newspapers to other marmots because the property of eating grass is better entrenched than the property of sitting on newspapers. For Goodman, classes and properties gain entrenchment from their involvement in inductive inferences. Making the inference Dogs bark enhances the entrenchment of the class of dogs and the property of barking; making the inference Sheep eat grass enhances the entrenchment of the class of sheep and the property of eating grass. A person’s greater readiness to make inferences about dogs than about brown animals can be attributed to the greater number of inductive inferences that person has made about dogs compared to brown animals.
Similarly with the properties eats grass and sits on newspapers, we have made more inferences about eating grass than about sitting on newspapers. Goodman proposed that the relative entrenchment of one class or one property compared to another depends upon the number of times an actual projection has been made about a specific class or a specific property. For example, the relative entrenchment of the category dog, compared to the class brown animal, depends upon the number of times a person has made the inductive inference Dogs bark, plus the number of times he has made the inference Dogs are loyal, plus the number of times he has made each of his other inferences about dogs, compared
to the number of inferences, again in the token sense, he has made about brown animals. Goodman’s use of frequency of responses (actual projections) is consistent with the psychology of the 1950s in which response frequency was the premier parameter. However, the concept of “more inferences” can be interpreted in several ways. Given the difficulty of determining when someone actually makes an inductive inference, I have suggested that the number of actual projections in the type sense provides a more useful measure of relative entrenchment (Shipley 1993). That is, the number of different properties a person has attributed to dogs is the primary determiner of the relative entrenchment of the category dog for that person. This proposal leaves the contribution of the number of tokens of each projection unspecified. By this assumption, the greater the number of different properties attributed to members of a class, the greater the entrenchment of the class, and hence the more readily the class serves as the range of an inductive inference. This assumption makes a well-entrenched category correspond to what Markman (1989) has called a richly structured category.

If the projections of properties over a class of individuals were the only source of entrenchment of the class, then all entrenched classes would be familiar classes and inductive inferences would not be made over unfamiliar classes. However, even young children readily make inductive inferences over some types of unfamiliar classes, such as a novel kind of animal (Davidson and Gelman 1990). Goodman’s proposal on the role of entrenchment in induction can account for such phenomena via inherited entrenchment. Consider general inferences such as Each kind of animal has a characteristic diet. This kind of general inference is called by Goodman an “overhypothesis.” It is an inductive inference over such hypotheses as Dogs eat meat, Sheep eat grass, and Horses eat hay.
Making such a general inference leads to the entrenchment of what might be called a “parent kind,” kind of animal, and a “parent property,” characteristic diet. A parent kind has kinds as individual members. For instance, the parent kind kind of animal has as one member the class of dogs and as another member the class of horses. A parent property has as individual instances specific types of the property; thus the parent property characteristic diet has as individual instances individual diets such as eats meat and eats hay. The entrenchment of a parent class is inherited by its members; the entrenchment of kind of animal is inherited by each kind of animal, both by familiar kinds such as the class of dogs as well as by completely unfamiliar kinds such as the class of marmots. The entrenchment of a parent property is inherited by each of its instances; the entrenchment of characteristic diet is inherited by eats meat and eats hay,
as well as by such unfamiliar diets as eats bamboo. The inheritance of entrenchment means that a novel kind of animal, such as the marmot, is an entrenched class for a person who has projected over-hypotheses about kinds of animals but knows nothing about marmots except that they are a kind of animal. Similarly, novel instances of properties of familiar types, such as an unfamiliar diet, become entrenched properties via inheritance.

I have suggested (Shipley 1993) that the classes of physical objects a person considers categories are well-entrenched classes for that person in Goodman’s sense of entrenchment. Our well-documented willingness to make inductive inferences over categories comes from their entrenchment. Our belief that members of a category form a coherent class comes from our readiness to make inductive inferences over the class because of its entrenchment. Our use of a category label to identify an object carries with it the properties that the object possesses by virtue of its category membership; the use of a category label for identification is informative of past projections (Brown 1958).

How can entrenchment account for the child’s acquisition of category knowledge? First, it must be emphasized that the entrenchment position presupposes that children believe what they are told by others about the various labeled classes in their world. So, for instance, children told Dogs bite people will project the property of biting onto all dogs, even if they have never seen a dog bite a person. Thus the pronouncements of authorities enhance the entrenchment of the mentioned classes and properties. In addition, the entrenchment account presupposes that the child is biased to apply a name given to an object to other objects. It is necessary for a class to be labeled in some way by a person in order to serve as the range of an inductive inference for that person and to thereby acquire entrenchment. The shape bias literature (e.g., Landau 1994; Smith et al.
1996) attests to the existence of this bias for completely novel objects (also Markman 1989). How might a child acquire an entrenched category? Let us imagine a child encounters an ambiguous-appearing entity such as a sea cucumber (an irregular cylinder that resembles animal waste or an industrial by-product) in its natural habitat and is told its label but nothing more. Even if the child is unable to identify it as some kind of plant, animal, substance, or artifact, the child should be willing to apply that label to similar-appearing entities (Landau 1994), but be reluctant to attribute properties of this individual sea cucumber to other things called sea cucumbers because sea cucumbers have no entrenchment; for example, the child should be unwilling to conclude that sea cucumbers are typically found underwater on the basis of one being observed underwater.
Now suppose the child is told a few properties of sea cucumbers (Sea cucumbers are heavy, have potassium inside, are used to make soup), properties the child will project over all sea cucumbers. As a result the class sea cucumber will gain entrenchment and the child should be more willing to make inductive inferences about sea cucumbers, for example, more willing to conclude that sea cucumbers are typically found underwater. If the child decides sea cucumbers are a kind of animal, whether on the basis of authority or observation, then the class sea cucumber will inherit entrenchment from the parent class kind of animal and the child’s willingness to make inductive inferences about sea cucumbers will increase further. Her willingness to make the specific inference Sea cucumbers are found underwater will be even greater if she has projected over-hypotheses about characteristic habitat, such as Each kind of animal lives in a special place. Thus the child’s history of inductive inferences can account for the transformation of a class of labeled objects into a category capable of supporting inductive inferences. The projection of over-hypotheses can account for a greater readiness to support some inductive inferences rather than others. Finally, the experienced coherence of a category can be explained as a readiness to make inductive inferences over the category.

How does a child decide that a novel object belongs in a specific category? For the entrenchment account, an additional assumption is necessary: The properties previously projected over the category are used to assign a novel entity possessing those properties to the category. Thus a novel object would be identified as a sea cucumber if it has the properties previously attributed to sea cucumbers, that is, is heavy, has potassium inside, and is used to make soup.
Keil’s findings that children’s identification judgments change with age can be accounted for by changes in the relative entrenchment of different properties of the test objects. The hypothesis that the possession of entrenched properties previously projected over a category determines the identity of an ambiguous object as a member of the category is tested in Study 1.

Entrenched Properties and Identification: Study 1

In the section above I argued that category entrenchment is a plausible alternative to psychological essentialism when induction and category coherence are considered. However, the essentialist position has strong appeal when the child’s task is to decide the identity, that is, the category membership, of an object. From an entrenchment perspective I propose that those properties that have contributed to the entrenchment of a category for a child are the most important properties in the
Children’s Categorization of Objects
child’s identification of an object as a member of the category. This study tests this hypothesis with animal stimuli.

First, it is necessary to determine the properties young children attribute to different kinds of animals in order to select properties that are likely to be entrenched for young children. We can learn something of a child’s beliefs about a category by asking. In a preliminary study, 12 three-year-olds and 12 five-year-olds were asked to explain to a puppet, who claimed to be from another planet, certain “earth words” such as “dog,” “monkey,” and “animal.” The children’s responses consisted primarily of mentions of properties. These properties were scored either as surface appearance properties, apparent in a guidebook picture of a category member (fur, a tail), or as behavioral properties, not apparent in every observation of a category member (barks, eats meat), and hence necessarily projected properties. The latter are necessarily projected because not every dog encountered by a child has been observed to bark, yet the children’s reports take the form of generic statements, “Dogs bark,” not statements limited to their experience such as “I’ve heard some dogs bark” or “Sometimes some dogs bark.” Behavioral properties predominated in the children’s responses, even though we were liberal in our counts of surface appearance properties. It should be noted that even three-year-olds can readily supply surface appearance properties when explicitly asked, for example, “What do dogs have?” Of the properties mentioned by three-year-olds, 83% were behavioral; for five-year-olds, 66% were behavioral. This finding suggests that young children regard behavioral properties as more important than appearance in determining the nature of an animal.

Table 6.1 Properties mentioned by child informants.

                 3-year-olds      5-year-olds
Properties       #Ss    Ave.      #Ss    Ave.
Diet               8    2.2        11    2.7
Habitat            8    1.1        10    1.5
Locomotion        10    3.4        12    3.8
Sound              8    1.9         9    1.3

Note. #Ss is the number of subjects out of 12 who mentioned the type of property. Ave. is the average number of animals said to possess the type of property, given that the property was mentioned.

It also
indicates that behavior is more important than appearance in determining entrenchment, provided we measure entrenchment by the number of different properties projected over a category. While reports of behavioral properties are necessarily inductive inferences, reports of appearance may or may not be inductive inferences; they may be mere summaries of past observations: when we look at a dog we see fur. Certain types of behavioral properties were attributed to animals of different kinds by the majority of children (table 6.1). For the most frequently mentioned types of properties, diet and locomotion, if a child mentioned that type of property for one kind of animal, he or she mentioned it for other kinds of animals as well. In addition to diet and locomotion, habitat and sound were often mentioned. For instance, three-year-olds reported that dogs bark and cats meow. Pilot work indicates young children know these two properties are properties of the same type: told “Horses neigh,” “Lions roar,” “Dogs bark,” and then prompted with “And cats?” the children respond “Cats meow.” Such a pattern of responses suggests preschool children have organized knowledge of kinds of animals and their properties that can be considered over-hypotheses. It should be noted that properties such as these, along with perceptual properties, were included by Keil (1989) in his “discovery studies” as characteristic features and could have guided the younger children’s identity judgments. Using these types of properties, as well as behaviors unique to a specific kind of animal (for example, wags his tail when happy), we selected three behavioral and three appearance properties for each of 12 kinds of animals and formed six pairs of animals. Pretesting with four-year-olds established that each triad of properties, behavioral or appearance, identified the intended animal of the pair.
A puppet who went on a trip and encountered various individual animals provided the context for the six specific trials. On each trial the child was told of an animal that looked like one kind of animal but acted like another kind and was asked to identify it: “The puppet saw an animal that acts like a tiger. It eats meat like a tiger, and it roars like a tiger, and it climbs trees like a tiger. But it looks like a camel. It has humps on its back like a camel, and long eyelashes like a camel, and a long neck like a camel. Remember, it acts like a tiger but it looks like a camel. What do you think it is? Is it a tiger or a camel?” Each child considered six ambiguous animals and judged the identity of each one. In addition to camel-tiger, the pairs cat-dog, cow-pig, duck-monkey, chicken-elephant, and horse-snake were used with all subjects. Over subjects, the kind of animal mentioned first (e.g., tiger or camel) and the kind of animal whose appearance was described (e.g., tiger or camel) were counterbalanced. Within subjects, the order of the six pairs
was randomized. For each subject, behavior was mentioned first for three pairs and appearance was mentioned first for the other three. We ran three-year-olds and four-year-olds in two conditions: in one, photographs of the two alternatives were present on each trial; in the other, no pictures were present.

The children selected consistently on the basis of behavior, not appearance: 65% of the choices were based upon behavior. An ANOVA on the number of behavior choices, with age, sex, and picture condition as factors, yielded no significant effects and no significant interactions. No child of the fifty-six who participated selected more frequently on the basis of appearance than behavior; seventy percent selected more frequently on the basis of behavior. Within each of the four groups (three-year-olds and four-year-olds, with and without pictures), a sign test on the number of subjects making a majority of behavior choices was significant at the 0.01 level or better. Behavior-based choices predominated when only first trials were examined (64%) and for each of the six pairs considered separately.

We also ran 12 four-year-olds with different pairings of the stimulus animals, based upon the frequency of choice in the original conditions. (The least frequently selected animals, duck and chicken, were paired, as were the most frequently selected animals, and so on.) For five of these six new pairs the majority of choices were based upon behavior. In brief, the identification of ambiguous animals on the basis of behavior rather than appearance is a robust phenomenon.

In sum, Study 1 shows that for preschoolers, behavioral properties that can be considered entrenched are sufficient to decide the identity of an animal when pitted against appearance.
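The sign tests reported above amount to a one-tailed binomial calculation. A minimal sketch follows; the per-group subject counts in the example are invented for illustration, not the study’s actual data.

```python
from math import comb

def sign_test_p(successes, trials, p=0.5):
    """One-tailed probability of observing at least `successes`
    behavior-based choosers out of `trials` subjects when each
    subject is equally likely to favor behavior or appearance."""
    return sum(comb(trials, k) * p**k * (1 - p)**(trials - k)
               for k in range(successes, trials + 1))

# Hypothetical group: 12 of 14 subjects make a majority of
# behavior-based choices. Under the null (p = 0.5) this is rare,
# about 0.0065, i.e., significant at the 0.01 level.
print(round(sign_test_p(12, 14), 4))
```

For larger samples one would typically reach for a library routine (e.g., an exact binomial test) rather than summing terms by hand, but the direct sum makes the logic of the test transparent.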
The role of appearance versus other kinds of properties in the determination of the identity of an object has been studied more with artifacts than with natural kinds (see, e.g., Gentner 1978; Keil 1989; Keil and Batterman 1984; Kemler-Nelson 1995; Smith et al. 1996). With artifacts, the question has been which determines identity: appearance or function? The results have not been consistent and two recent carefully controlled studies with preschool children report apparently contradictory results (Kemler-Nelson 1995; Smith et al. 1996). Smith et al. point out a difference between the two studies that suggests a reconciliation of the findings by consideration of entrenchment. When the novel object’s function is a property that has been attributed to other objects and hence could have entrenchment (draws lines, makes sounds), function determines identity (Kemler-Nelson 1995); when the function of the novel object has no history that could lead to entrenchment (a part’s movement forms an arbitrary pattern), appearance determines identity (Smith et al. 1996). Thus work on children’s identification of artifacts
indicates that functional properties determine identity when they are entrenched properties, but not otherwise. This result suggests that with animal stimuli, behavioral properties will be more likely to determine identity the better entrenched they are. Preliminary results from a study with unfamiliar animals show entrenched properties are more effective than nonentrenched properties, consistent with the artifact data.

Projected Properties or Essence: Study 2

We have shown that young children use behavior rather than appearance to determine the identity of an individual animal when the two types of information conflict. At least two interpretations of this finding are possible. The first is that children’s projected beliefs about category members are sufficient for identification. The second is that the behavioral properties are taken by the child as evidence of the underlying essence of the animal, and hence that essence determines identification. To evaluate this second interpretation, that a commitment to essentialism underlies the children’s choices, we went on to investigate the effect of the experimenter’s identification of the animal’s insides upon the child’s identification of the animal. The importance of the insides of an animal in the determination of identity from the perspective of psychological essentialism has been argued by Gelman and Wellman (1991), who found that the hypothetical removal of the insides of an animal changed the child’s identification. We first examined the effects of insides upon identification when information on insides conflicted with behavior. Twelve four-year-olds participated in this condition. The same context for the task, a puppet on a trip, and the same pairs of animals were used as in the original study. Each question specified that the animal had the insides of one kind of animal and listed three internal constituents of that animal kind.
The specific internal parts were pretested to ensure that four-year-olds believe them to be inside rather than on the outside of animals, and that they believe them to be inside the specific kind of animal they were attributed to. As in Study 1, the child was asked to judge the identity of an animal reported by a traveling puppet: “The puppet saw an animal that acts like a tiger. It eats meat like a tiger, and roars like a tiger, and climbs trees like a tiger. But it has the insides of a camel. It has the brain of a camel, and the lungs of a camel, and the bones of a camel. Remember, it acts
like a tiger but has the insides of a camel. What do you think it is? Is it a tiger or a camel?” Again, which animal of a pair had behavioral properties mentioned and which animal was mentioned first were both counterbalanced over subjects. For each subject, behavior was mentioned first on three trials, and insides were mentioned first on the other three trials.

Behavior was more important than the insides of an animal in the determination of identity: sixty-five percent of the children’s choices were based upon behavior, not internal parts (difference from chance p < 0.02). This suggests that the child’s use of behavior to determine identity is not due to a belief that behavior reflects the interior nature of the animal, which in turn determines identity. Thus this pattern of responses is inconsistent with the child’s being an essentialist who believes the essence resides in the insides of animals.

In addition, we examined children’s identifications when appearance and insides were in conflict. Twelve different children participated. We found that appearance was also more effective than insides in the determination of identity: seventy-two percent of the choices were based upon appearance (difference from chance p < 0.002). This finding is apparently contrary to the results of Gelman and Wellman (1991) and, again, contrary to the position that children locate the essence in the insides of an animal.

The failure to identify on the basis of insides rather than appearance has several possible interpretations. One is that children consider all bones, all blood, etc., as equivalent, so that the blood of a camel and the blood of a tiger are the same; being told that an animal looks like a tiger but has the blood of a camel then provides no reason to identify the animal as a camel. A second possible reason for a lack of effect of insides is that the preschool child is generally ignorant of insides. Children do have some knowledge of insides.
However, most studies of preschool children’s notions about insides indicate that children believe different ontological kinds, such as animals (e.g., a sheep) and artifacts (e.g., a machine), have different insides; there is little evidence that children believe instances of different categories of the same ontological kind, such as two animals, a camel and a tiger, have different insides. For instance, R. Gelman (1990) asked preschool children what was inside various animates, such as a mouse and a person, and inanimates, such as a rock and a doll. Similar answers (blood, bones, various organs) were given for each of the animates. Inanimates of different kinds also elicited similar answers, answers that were very different from those concerning animates. Simons and Keil (1995) also provide data suggesting young children believe different ontological
kinds, such as animals and artifacts, have different insides, although the children do not know what is inside the different ontological kinds. Gelman and Wellman (1991) focused on specific kinds in their studies of insides and essences. Preschool children considered individual entities before and after their insides were removed and were asked whether the entity with insides removed was the same kind and had the same properties as the preoperated entity, for example, “Was it still a dog?” and “Could it still bark?” The children answered in the negative. However, this study may have revealed sensitivity to ontological distinctions rather than kind distinctions. A dog with its insides removed seems more akin to a collapsed balloon or an empty costume, a two-dimensional object, than to a three-dimensional living animal. From this perspective, the indifference of our subjects to insides does not conflict with the Gelman and Wellman data, although it casts doubt on their conclusion that the insides of an animal contain its essence.

Discussion and Summary

I began with the commonly recognized characteristics of psychological categories of physical objects: serving as the range of inductive inferences, possessing psychological coherence, and having a label that is used for identification. I sketched how psychological essentialists account for these aspects of categories. Then I indicated that Goodman’s (1983) concept of entrenchment provides an alternative account of induction and coherence. To explain what determines children’s identification judgments, I suggested that they are based upon the best-entrenched properties attributed to members of a category. Study 1 supports this hypothesis, and Study 2 discounts a possible essentialist explanation of the Study 1 results.
Given that the entrenchment position and the essentialist position seem comparable in their ability to account for these three characteristics of categories, what are the meaningful differences between the two positions? An essentialist account that physically locates the essence, for instance in the insides of biological kinds (Gelman and Wellman 1991), differs from an entrenchment account, which focuses on a history of inductions that may or may not concern insides. As Study 2 indicates, children apparently do not locate the essence of different kinds of animals in their insides. Of course, psychological essentialists could claim either that the insides are not the locus of the essence of an animal or that it was a mistake to assume that children assign any location to the essence of an animal, and hence that Study 2 is not a test of the general essentialist position.
An essentialist account that relies upon an innate essence to unify a category (Atran 1995) differs from an entrenchment account, which relies upon the projection of beliefs. In the former case one would only need to know that a class is a kind of animal to grant that class an essence. In the entrenchment account, however, a history of inductive inferences, both inferences about specific kinds and inferences about the parent kind “kind of animal,” is required for a novel kind of animal to be an entrenched category. But what of psychological essentialist accounts that tie essence to knowledge that is not necessarily innate? It may be that the essentialists’ noninnate knowledge relevant to a category’s ability to constrain induction and account for identification is equivalent to projected hypotheses about classes of individuals and projected over-hypotheses about classes of classes. Supporting such a conjecture would require analysis of the types of information deemed relevant to an essentialist account and examination of which classes gain entrenchment from various beliefs, and perhaps experimental work as well. This analysis has not been done, nor is this the place to attempt it.

The entrenchment account has four apparent advantages over essentialist accounts. The first is that it subsumes both categories and properties under the same theoretical approach. Entrenchment accounts for which category most readily acts as the range of an inductive inference, as can essentialism, but it also accounts for which properties are most readily projected, for instance, the example given above of eats grass rather than sits on newspapers. The second advantage is that it provides an explanation of the development of categories via the acquisition of entrenchment through inductive inferences and inheritance. The third advantage is that it uses hierarchies of parent kinds and individual kinds to make categories of novel classes.
Thus a novel kind of animal such as the gerenuk will be considered an induction-supporting category by someone who knows nothing of the gerenuk except that it is a kind of animal. The fourth advantage of the entrenchment position is the power it gives to hierarchies with parent kinds, together with over-hypotheses, to support inductive inferences about specific types of properties from minimal information. For instance, the diet of gerenuks can be inferred from observing one gerenuk eating leaves.

Given the comparability of psychological essentialism and entrenchment in accounting for the three primary characteristics of categories of physical objects, and the apparent advantage of the entrenchment position in some additional respects, the entrenchment position would seem to merit further elaboration as an alternative to psychological essentialism.
Acknowledgments

This work was supported in part by NSF Grant BNS-8310009 and NSF Grant SBR-9414091. I am grateful to the children, teachers, and parents of Trinity Cooperative Day Nursery and Swarthmore Friends School. Thanks are due Barbara Shepperson for thoughtful assistance. Correspondence should be addressed to the author at Department of Psychology, University of Pennsylvania, 3815 Walnut St., Philadelphia, PA 19104, or
[email protected]. Part of this work was presented at the 1996 meeting of the Psychonomic Society in Chicago.

References

Anglin, J. M. (1977) Word, Object, and Conceptual Development. New York: Norton.
Atran, S. (1995) Causal constraints on categories and categorical constraints on biological reasoning across cultures. In Causal Cognition, ed. D. Sperber, D. Premack, and A. J. Premack. Oxford: Clarendon Press.
Bloom, P. (1996) Intention, history, and artifact concepts. Cognition 60:1–29.
Brown, R. (1958) How shall a thing be called? Psychological Review 65:14–21.
Carey, S. (1985) Conceptual Change in Childhood. Cambridge, MA: MIT Press.
Carey, S. (1995) On the origin of causal understanding. In Causal Cognition, ed. D. Sperber, D. Premack, and A. J. Premack. Oxford: Clarendon Press.
Davidson, N. S. and Gelman, S. A. (1990) Inductions from novel categories: The role of language and conceptual structure. Cognitive Development 5:121–152.
Gelman, R. (1990) First principles organize attention to and learning about relevant data: Number and the animate-inanimate distinction as examples. Cognitive Science 14:79–106.
Gelman, S. A. (1988) The development of induction within natural kind and artifact categories. Cognitive Psychology 20:65–95.
Gelman, S. A., Coley, J. D., and Gottfried, G. M. (1994) Essentialist beliefs in children: The acquisition of concepts and theories. In Mapping the Mind, ed. L. A. Hirschfeld and S. A. Gelman. Cambridge: Cambridge University Press.
Gelman, S. A. and Medin, D. L. (1993) What’s so essential about essentialism? A different perspective on the interaction of perception, language, and conceptual knowledge. Cognitive Development 8:157–168.
Gelman, S. A. and Wellman, H. M. (1991) Insides and essences: Early understanding of the non-obvious. Cognition 38:213–244.
Gentner, D. (1978) What looks like a jiggy but acts like a zimbo? A study of early word meaning using artificial objects. Paper presented at the Stanford Child Language Research Forum, April 1978.
Goodman, N. (1955/1983) Fact, Fiction, and Forecast. New York: Bobbs-Merrill.
Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. R. (1986) Induction: Processes of Inference, Learning, and Discovery. Cambridge, MA: MIT Press.
Kalish, C. W. (1995) Essentialism and graded membership in animal and artifact categories. Memory and Cognition 23:335–353.
Keil, F. C. (1989) Concepts, Kinds, and Cognitive Development. Cambridge, MA: MIT Press.
Keil, F. C. and Batterman, N. (1984) A characteristic-to-defining shift in the development of word meaning. Journal of Verbal Learning and Verbal Behavior 23:221–236.
Kemler-Nelson, D. G. (1995) Principle-based inferences in young children’s categorization: Revisiting the impact of function on the naming of artifacts. Cognitive Development 10:347–380.
Landau, B. (1994) Object shape, object name, and object kind: Representation and development. In The Psychology of Learning and Motivation, volume 31, ed. D. L. Medin. Orlando: Academic Press.
Malt, B. C. (1994) Water is not H2O. Cognitive Psychology 27:41–80.
Malt, B. C. and Johnson, E. C. (1992) Do artifact concepts have cores? Journal of Memory and Language 31:195–217.
Markman, E. M. (1989) Categorization and Naming in Children. Cambridge, MA: MIT Press.
Medin, D. L. and Ortony, A. (1989) Psychological essentialism. In Similarity and Analogical Reasoning, ed. S. Vosniadou and A. Ortony. New York: Cambridge University Press.
Murphy, G. L. and Medin, D. L. (1985) The role of theories in conceptual coherence. Psychological Review 92:289–316.
Shipley, E. F. (1993) Categories, hierarchies, and induction. In The Psychology of Learning and Motivation, volume 30, ed. D. L. Medin. Orlando: Academic Press.
Shipley, E. F., Kuhn, I. F., and Madden, E. C. (1983) Mothers’ use of superordinate category terms. Journal of Child Language 10:571–588.
Simons, D. J. and Keil, F. C. (1995) An abstract to concrete shift in the development of biological thought: The insides story. Cognition 56:129–163.
Smith, L. B., Jones, S. S., and Landau, B. (1996) Naming in young children: A dumb attentional mechanism? Cognition 60:143–171.
Chapter 7
Mechanisms of Verbal Working Memory Revealed by Neuroimaging Studies

John Jonides

In 1970, when I had the privilege of beginning my work with Henry Gleitman, the study of cognition was in the midst of a vital period. The reasons for this vitality were many, but among them was the vision that perception, memory, language, and thinking could be understood by decomposing cognitive processes into their essential elements. The tools for this decomposition were chronometric analysis and the analysis of the patterns of errors that subjects made in various tasks, both of which were being applied to a host of problems in the study of cognition. Henry and I embraced these developments and used these techniques to study processes of item recognition and short-term memory in a series of studies that we conducted under the watchful eye of Henry and Lila Gleitman’s Thursday evening (and often late-night) research seminar.

The strategy of understanding complex cognitive processes by decomposing them into their simpler parts continues to be widely taught and learned as an essential skill for students of cognition. At the time of my graduate education, implementing this strategy most often involved the careful design of behavioral experiments that would reveal underlying processes in subjects’ chronometric and error data. In the hands of clever psychologists such as Henry and Lila, this strategy yielded insights into a wide variety of cognitive processes. In the ensuing years, the strategy has been broadened while still maintaining its essence. The first significant broadening involved the widespread use of computer models as ways of further defining and refining conceptions of elementary processes and of how these combine to yield cognition. The second significant broadening is the one that is the current focus of much interest among students of cognition: the use of neuroimaging techniques to reveal the brain processes that mediate cognition.
These techniques allow us to go well beyond merely localizing processes to various brain areas; in combination with behavioral data from normal and brain-injured humans, and with suitable data from invasive studies of animals, neuroimaging studies can accomplish the same goal that we had in 1970: to decompose cognitive processes into their elementary components. In short, neuroimaging techniques provide another modality of data for understanding cognition.

To illustrate how the use of neuroimaging has enriched the study of cognition, I shall briefly review the results of a program of research conducted in recent years by Ed Smith, me, and our colleagues. We have been investigating what is now called working memory. The reason for focusing on working memory is simple: various lines of evidence make it clear that working memory is an essential component of cognition in that it participates in such skills as problem solving, reasoning, categorization, and language comprehension. Indeed, an individual’s working-memory capacity (if measured in the right way) has been found to correlate with a large variety of complex cognitive skills (see, e.g., Carpenter, Just, and Shell 1990; Daneman and Merikle 1996; Kyllonen and Christal 1990). Furthermore, it has been shown repeatedly that a decline in working memory with normal aging and with various brain pathologies is an important predictor of declines in problem-solving and reasoning performance, showing again that working memory is critical to cognitive life (see, e.g., Salthouse 1993). In short, the phenomena of working memory are examples of the sort of “big effects” that the Gleitmans advocated as worthy topics for intense research.

We define working memory in the canonical way: it is the memory system that keeps a limited amount of information active for as long as one continues to work with that information. The information in working memory is easily and readily accessible, and it is subject to frequent updating or substitution by new information.
Finally, the feature that distinguishes the concept of working memory from earlier conceptions of “short-term” or “primary” memory is that information held in working memory is subject to processing in various ways by what Baddeley (1986, 1992) and others have called executive processes. It is in this sense that working memory goes beyond mere storage; it involves the manipulation of stored information as well.

Our research group has devoted itself to several issues concerning working memory; here we review research on two of them. One concerns the architecture of the working memory system itself: whether the system consists of several subcomponents responsible for the storage, rehearsal, and manipulation of stored information. On this issue, much of our research has concentrated on working memory for verbal information. A second issue is whether working memory is a single system or consists of several subsystems, each tied to the processing of a different sort of information.
The Architecture of Verbal Working Memory

In his influential model of working memory, Alan Baddeley (1986, 1992) proposed that working memory for verbal information consists of three components. One is a buffer responsible for the temporary storage of verbal codes. A second is a rehearsal mechanism that recirculates information in the verbal buffer to prevent decay or interference. A third is a set of processing mechanisms, collectively called the central executive, that are capable of manipulating the information stored in the buffer (including rehearsing it). Evidence for a dissociation between storage and rehearsal processes has come largely from behavioral studies of normal and brain-injured subjects. Although this evidence is compelling, it is not decisive (see Jonides et al. 1996 for a review), and so we have conducted experiments using neuroimaging techniques to try to identify the subcomponent processes of working memory.

One experiment makes use of a paradigm that recruits all the components of working memory (storage, rehearsal, and executive): the “n-back” task. In this task, subjects are presented with a series of single letters; as each is presented, subjects must decide whether it matches the letter that was presented n items back in the series. In an experiment from our laboratory reported by Awh, Jonides, Smith, Schumacher, Koeppe, and Katz (1996), n was set to 2. A schematic of the memory condition of the task is shown in the top panel of figure 7.1. The panel illustrates that letters were presented for 0.5 sec each, with 2.5 sec intervening between successive letters; subjects engaged in the memory task for a continuous period of approximately 60 seconds while they reclined in a PET scanner. Note that successful performance in this task requires storing in memory a constantly changing set of at least two letters, the “oldest” of which must be compared with the currently presented letter.
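The 2-back decision rule just described can be stated precisely in a few lines. This is a toy scoring function of mine, not the experiment’s software:

```python
def n_back_targets(letters, n=2):
    """For each letter from position n onward, True if it matches the
    letter presented n items earlier (the subject's correct 'yes' trials)."""
    return [letters[i] == letters[i - n] for i in range(n, len(letters))]

# In the sequence A B C A C, only the final C matches the letter 2 back.
print(n_back_targets("ABCAC"))   # [False, False, True]
```

The sliding comparison makes the memory demand explicit: at every step the subject must hold the two most recent letters while replacing the oldest with the newcomer.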
Thus the task requires both storage of verbal information and processes that continuously update this information. Of course, performance in this memory condition also includes processes of perception and response, processes that were not the targets of our interest in this experiment. The typical strategy for eliminating the effects of such processes on brain activations in a neuroimaging experiment is to test subjects in a second condition that, by hypothesis, includes only these ancillary processes, and then to subtract the brain activations of this control condition from those of the condition of interest. We recognize that this “subtraction” strategy has earned some well-deserved criticism, and we address this issue below. Nevertheless, we followed this strategy by also testing subjects in a memory control condition, shown in the second panel of figure 7.1. In this condition,
John Jonides
Figure 7.1. Schematics of the three tasks used in the experiment by Awh et al. (1996).
subjects were presented with a series of letters, but they responded positively only if each letter matched a fixed target letter for which they searched on that trial (say, “P”). Activations from this control condition were subtracted from those in the memory condition. The results revealed activation in anterior parts of the cortex: in Broca’s, supplementary motor, and premotor areas. In addition, there was activation in posterior parietal cortex in the superior parietal lobule and supramarginal gyrus. The anterior sites that were activated in this task have been claimed to be part of a circuit (including the cerebellum, which also showed activation) that is responsible for rehearsal (Paulesu, Frith, and Frackowiak 1993); indeed, this is quite sensible in that some or all of these sites are involved in the production of explicit speech as well. The posterior sites have been implicated in storage and selective attention processes required in this task. How can one confirm these putative functions? Paulesu et al. (1993) provided some evidence for the rehearsal function of the anterior sites by showing that these sites were
Mechanisms of Verbal Working Memory
also activated in another task in which judgments of rhyming were required, judgments that presumably also engage internal language-production mechanisms (see the chapter by Reisberg in this volume for additional evidence about such judgments). Furthermore, the site we found activated in the supramarginal gyrus of parietal cortex is the most common site of damage in patients who suffer deficits in verbal memory span (McCarthy and Warrington 1990), suggesting that it may be involved in storage of verbal material. Also, the site in the superior parietal lobule has been implicated as a region involved in shifts of attention from one item of material to another in many tasks, as shown by a parallel between this site and sites that are activated in tasks in which explicit shifts of attention are required (see Awh and Jonides 1999 for a review). So there is circumstantial evidence for the functions that we claim. Beyond this, Awh et al. (1996) included a second rehearsal control condition in their experiment that allowed them to test for the rehearsal function of the anterior sites. It is schematized in the third panel of figure 7.1. In this condition, subjects were again shown a sequence of individual letters, and they were instructed to rehearse each one silently until the next one appeared; then they were to rehearse the next one, and so on. This condition requires internal speech, and so subtracting the activation due to this condition from that due to the memory condition should diminish activation in anterior sites but not in posterior ones if the anterior ones in the memory condition are, indeed, responsible for rehearsal. In fact, this is just what Awh et al. (1996) reported. Thus it appears that experiments using neuroimaging technology can provide corroborating and converging evidence for claims rooted in behavioral data as well as provide information on the brain areas that are the substrates of working memory.
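The subtraction strategy described above amounts to an elementwise difference between condition images. A toy sketch, with invented regional activation values standing in for whole brain images:

```python
# Toy "activation maps" (one invented value per region) for a hypothetical
# memory condition and its control condition.
memory_task = {"broca": 5.2, "smg": 4.1, "v1": 3.0}
control_task = {"broca": 1.1, "smg": 1.0, "v1": 2.9}

# Subtraction logic: activation attributable to the process of interest
# is the difference between the task of interest and the control,
# on the assumption that the control shares all other processes.
difference = {r: round(memory_task[r] - control_task[r], 2) for r in memory_task}
print(difference)  # {'broca': 4.1, 'smg': 3.1, 'v1': 0.1}
```

On these invented numbers, the regions specific to memory (here "broca" and "smg") survive the subtraction, while the shared perceptual region ("v1") largely cancels; the criticisms noted in the text concern exactly the assumption that the shared processes cancel cleanly.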
Note that the experimental rationale underlying the application of neuroimaging techniques owes a great debt to the experimental rationale first articulated by Donders (1868) for the study of reaction times, a rationale discussed and debated at great length in our research seminars some one hundred years after Donders described it. Donders argued that if one could construct two tasks such that the first contained all the processing components of the second plus an additional one, then the difference in reaction time between the two tasks should be a relatively pure measure of the time required for the additional process required by the first task. In a similar fashion, much neuroimaging research has relied on subtracting the activation of one or more control tasks from an experimental task of interest, as illustrated above, to reveal activation due to processes of interest that are required by the experimental task. Just as the rationale due to Donders has been called
into question, so also can one raise questions about the validity of assuming that the activations due to selective processes can be subtracted out of a neuroimaging experiment without affecting activations due to other processes (see Jonides et al. 1997). The issue here is much the same one that occupied us in the 1970s: finding a way to isolate in data the effects of certain variables that have an impact on experimental performance. In the 1970s the data that concerned us on this score were reaction times and errors; in neuroimaging research they are patterns of brain activation that result from some task. One experimental strategy that goes beyond the subtraction method relies on parametric variation of some variable of interest (in some ways, mimicking the strategy first applied by Sternberg (1966, 1969) to the analysis of chronometric data). We have implemented this strategy for the study of the components of verbal working memory in several experiments (Jonides et al. 1997; Braver et al. 1997; Cohen et al. 1997). Our work is based on the paradigm illustrated in figure 7.2. The figure reveals various versions of the n-back task, in which n is varied systematically. In the most demanding version, shown at the top of the figure, subjects must decide whether each letter matches in identity the one that appeared 3-back in the series. We also included 2-back and 1-back versions in other conditions, and we included a 0-back condition similar to the condition used by Awh et al. (1996) as a control; in the 0-back condition, subjects were given a single target letter at the beginning of a trial, and they were to respond positively any time that letter appeared anywhere in the series that was presented on that trial. The design shown in figure 7.2, therefore, implements a systematic variation of task load in a working memory task.
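The parametric rationale can be sketched as follows: instead of a single task-minus-control subtraction, one asks whether a region's activation varies systematically (here, monotonically) with memory load. This schematic uses invented activation values, not the reported data:

```python
def is_monotonic_increasing(values):
    """True if each value is at least as large as the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))

# Hypothetical mean activations at loads 0-, 1-, 2-, and 3-back
# for a load-sensitive and a load-insensitive region.
activation = {
    "dorsolateral_prefrontal": [0.1, 0.4, 0.9, 1.3],  # rises with load
    "primary_visual": [0.5, 0.5, 0.6, 0.5],           # flat across loads
}

for region, values in activation.items():
    print(region, is_monotonic_increasing(values))
```

A region whose activation tracks load in this way is implicated in the memory requirements of the task without ever computing a task-minus-control difference, which is the logic developed in the next paragraphs.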
Figure 7.2. Schematics of the tasks used by Jonides et al. (1997).

We collected PET scans for each of the conditions shown in figure 7.2, and we also collected PET data for a baseline control condition not shown in the figure in which a series of letters was presented, and subjects simply responded with a keypress when each letter was shown; there was no memory requirement at all. This baseline control condition served as a way to subtract from each of the memory conditions the activation that was due to idiosyncratic differences in brain activity among subjects. The main comparisons in the experiment, however, did not rely on subtraction methodology; rather they relied on a comparison of activation in various regions across variations in task load. Figure 7.3 reveals that, as expected, increases in task load produce a decrement in performance. This decrement appears as an increase in response time as well as a decrease in accuracy. The decrement in performance is accompanied by strikingly parallel changes in brain activation with task load at each of several sites, as shown in figure 7.4. The data in this figure were accumulated by taking regions of activation and deactivation (that is, where the control condition shows less activation than the experimental condition and where it shows more activation, respectively) that had been previously identified with verbal working memory tasks (from the studies of Awh et al. 1996 and Schumacher et al. 1996). We then found levels of brain activation in the present data for each of these regions. The average activation in each of these regions was then plotted as a function of task load, and this is what is displayed in figure 7.4. The most striking feature of the data in this figure is that there is an overwhelming tendency for brain activation to increase
Figure 7.3. Reaction time and error data from the experiment of Jonides et al. (1997) plotted as a function of memory load.
monotonically as task load increases, and for deactivation to decrease monotonically with task load (confirmed by statistical analysis: Jonides et al. 1997). Thus there is reason to suspect that the activations and deactivations that are shown in figure 7.4 are systematically related to the memory requirements of the task. Note that these changes occur in many regions, as the functions in figure 7.4 reveal. The presumption is that increases in activation reflect increased brain activity that is required by a task; likewise, decreases in brain activation in selected regions reflect requirements for inhibition of brain activity in those regions. The specific account of which regions increase and which decrease is beyond the scope of this review. Here we merely highlight that brain activation in many regions is systematically related to the requirements of the task. Of course, one might argue that the variation in brain activation with task load shown in figure 7.4 is merely a reflection of increased overall effort as task load increases, and not selectively related to processes having to do with the memory requirements of the tasks per se. This argument is laid to rest by examining other areas of the brain that should
Figure 7.4. Brain activations (above the horizontal line) and deactivations (below the line) for memory-related areas plotted as a function of memory load from the experiment of Jonides et al. (1997). Each function corresponds to one region of activation (labeled by its number according to the system of Brodmann or by anatomical structure). The identified regions are those that showed significant activation in a previous study using a similar task (see text).
not be recruited by memory processes: occipital areas that are involved in visual processing, somatosensory areas that are not relevant to the task, and primary motor areas whose activation should not vary with task load. The activations for these regions as a function of task load are displayed in figure 7.5, which shows that there is no systematic variation in brain activation in these regions as the memory task increases in difficulty. The contrast between the functions in figure 7.5 and those in figure 7.4 suggests that the areas identified in figure 7.4 do, indeed, reflect memory-sensitive processes that are active during these tasks. Note that the outcomes of parametric studies of working memory validate findings reported using the subtraction methodology based on the logic of Donders. That is, the various regions of activation shown in figure 7.4 and in other studies using parametric techniques are just the ones, by and large, that show significant activation in subtraction paradigms such as the one described above from our laboratory (Awh et al.
Figure 7.5. Brain activations for motor, visual, and somatosensory areas plotted as a function of memory load from the experiment of Jonides et al. (1997). The areas shown in the figure were identified by placing regions-of-interest on primary visual, motor, and somatosensory areas of the brain and calculating the activations in those regions.
1996). Thus, although one must exercise caution in using subtraction logic and in interpreting the results of such experiments, the outcomes of experiments on working memory that have used this logic seem to be replicated in experiments with parametric experimental manipulation of relevant variables. Another contribution of the parametric method is that it provides an opportunity to examine details of the “dose-response” curves that result, such as those shown in figure 7.4. We have exploited this property of parametric designs both in the experiment described above and in experiments using this paradigm with functional MRI as the imaging modality (Braver et al. 1997; Cohen et al. 1997). Functional MRI measurements permit a more detailed examination of brain activation patterns because they provide somewhat greater spatial resolution as well as an opportunity to examine the temporal dynamics of processes within a single experimental trial. Consider, for example, an experiment in which we varied memory load in an n-back task while stretching out the retention interval on each
Figure 7.6. Activations from a region of extrastriate occipital cortex as a function of the time of recording during the retention interval (from the experiment of Cohen et al. 1997).
trial so that we could examine the dynamics of activation in various brain regions during that retention interval (Cohen et al. 1997). The design is much like that displayed in figure 7.2, but the delay interval between successive letters in each condition was increased to 10 seconds. This permitted us to collect four scans of the entire brain, each one occupying a 2.5-second interval during the time between letters. Thus level of brain activation could be assessed four times during the interval when subjects were engaged in the working memory task. The value of this technique is revealed by examining the results of the experiment. Examine first figure 7.6, which shows the activations that were obtained in a region of extrastriate occipital cortex. The four functions in the figure correspond to the four conditions of memory load (0-, 1-, 2-, and 3-back). Each function has four points, each corresponding to an activation for one of the four recording intervals during the retention period. Note that the four functions lie fairly close to one another and that there is little systematic effect of memory load. However, the four functions are also all noticeably bowed. This can be taken to mean that as time passed during the retention interval, the amount of activation increased in this region of occipital cortex and then declined. Why should this be so? The most reasonable interpretation is that these functions all reveal the activation that was caused by presentation of a stimulus letter. The activation rises over the course of the first 7.5 seconds of the
Figure 7.7. Activations from a region of dorsolateral prefrontal cortex as a function of the time of recording during the retention interval (from the experiment of Cohen et al. 1997).
retention interval because the hemodynamic signal that is recorded by fMRI (corresponding to the neural signal that is tied to the increase in blood flow) is delayed relative to the neural event that causes it. By this interpretation, the functions in figure 7.6 reveal that fMRI recordings can be sensitive to early processes of encoding, and that these processes are not particularly sensitive to the memory load of the task. Examine now the activations in figure 7.7. This figure shows functions analogous to those of figure 7.6, but for activations recorded in a region of dorsolateral prefrontal cortex in the right hemisphere. Note that the pattern of data in this figure differs markedly from that in figure 7.6. First of all, the functions are not at all bowed in shape. That is, activations for all levels of memory load appear to be quite steady throughout the retention interval, not transient, as they are in occipital cortex. Notice also that in prefrontal cortex there is a dramatic effect of memory load on activation level. As the figure indicates, the 2-back and 3-back tasks resulted in substantially more activation throughout the retention interval than did the 0-back and 1-back tasks. The sustained nature of this activation and its sensitivity to memory load suggest that this area of prefrontal cortex is somehow involved in the maintenance of representations in working memory. This may be via direct storage of information or via some more indirect role. For example, regions in
prefrontal cortex may serve as pointers to other regions of the brain (possibly posterior regions) that are the sites of information storage. Whatever the specific role of these prefrontal structures, it is clear from data such as those of figure 7.7 that the activation of prefrontal cortex during the n-back task is not transient in nature. It might have been so if prefrontal cortex were involved strictly in executive functions such as updating the contents of working memory or temporally tagging letters as they are presented. In both of these cases, one would have expected the activation to be transient in nature, quite different from the data that were obtained. These data indicate the potential that neuroimaging studies have for extending our knowledge of cognitive mechanisms. In connection with the sort of behavioral data that one can gather from working memory studies, neuroimaging data are proving helpful in specifying the various component mechanisms that contribute to complex cognitive phenomena. Differing Subsystems of Working Memory A central issue in the study of working memory has been whether it consists of a unitary processing system or whether it is composed of several subsystems. One line of evidence that has been illuminating about this issue comes from studies of brain-injured patients. There is evidence, reviewed in detail by Jonides et al. (1996), among others, that there are multiple subsystems of working memory defined by the type of information that is maintained. For example, there are reports of patients who have deficits in verbal working memory with no deficits in working memory for spatial information; conversely, there is a report of a patient who has a deficit in spatial but not verbal working memory. In support of this distinction, there is also evidence from strictly behavioral studies of a dissociation between subsystems of working memory defined by the type of information that is processed.
The behavioral technique that has been used to provide evidence for this claim involves experiments in which subjects engage in one or another working memory task while a second (presumably interfering) task is performed. The logic of these experiments is this: If secondary tasks can be found that require the use of one or another internal code for information that is processed, then they should selectively interfere with a primary working memory task to the extent that that primary task makes use of the same code. For example, a secondary task that engages a phonological code should interfere with verbal working memory if verbal working memory also requires the use of a phonological code; but it
should not interfere with spatial working memory if that subsystem uses a code that is not phonological. Likewise, a secondary task that makes use of a spatial code should interfere with spatial working memory but not verbal working memory. Various experiments have implemented this rationale (see, e.g., Meudell 1972; Salthouse 1974; Logie, Zucco, and Baddeley 1990; Logie 1995), and they have led to the view that working memory is, indeed, composed of several subsystems defined by the type of information that is processed. These lines of neuropsychological and behavioral evidence are not immune to criticism, however. Neuropsychological experiments on the dissociation of different working memory subsystems are limited to precious few patients, who are often tested on tasks that may not purely recruit one or another working memory subsystem. As for selective interference experiments testing normal subjects, it is often difficult to justify the assumption that a secondary task has a truly selective interfering effect on a primary task. There may be several sites of interference (see Jonides et al. 1996 for a detailed discussion). Consequently, we sought a line of evidence from neuroimaging studies of working memory that might converge with the behavioral evidence to address the issue of whether working memory is best conceptualized as a set of subsystems rather than a single system of information processing. The details of our experiments are reported elsewhere (Jonides et al. 1993; Smith and Jonides 1995), but it is instructive to examine one example to see how neuroimaging evidence can strengthen the case for separable working memory subsystems (Smith, Jonides, and Koeppe 1996). Consider the pair of tasks illustrated in figure 7.8. Each panel shows schematics of the events on typical trials of the spatial and verbal memory conditions respectively; both conditions involve a 3-back task, with the nature of the memorandum differing between conditions.
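The selective-interference rationale reviewed above can be stated compactly: interference is predicted just when the primary and secondary tasks draw on the same internal code. The task names below are hypothetical labels for illustration, not the tasks of any particular study:

```python
# Internal code assumed (hypothetically) to be engaged by each task.
primary_codes = {"verbal_memory": "phonological", "spatial_memory": "spatial"}
secondary_codes = {"articulatory_task": "phonological", "spatial_tracking": "spatial"}

def predicts_interference(primary, secondary):
    """The selective-interference prediction: interference occurs
    iff the two tasks rely on the same internal code."""
    return primary_codes[primary] == secondary_codes[secondary]

print(predicts_interference("verbal_memory", "articulatory_task"))   # True
print(predicts_interference("verbal_memory", "spatial_tracking"))    # False
```

The criticism noted in the text amounts to doubting the single-code assumption built into this table: a real secondary task may engage more than one code, giving it several possible sites of interference.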
In the spatial memory condition, subjects saw a stream of letters, with the letters appearing in varying locations on the screen. The subjects’ task was to answer positively (via a button-response) if a letter’s location matched the location of the one that appeared three back in the sequence; if not, they were to answer negatively (via another button-response). Similarly, the verbal memory condition also required subjects to match the current stimulus to the one that was 3-back in the sequence; however, in this case they were to match letters on their identities regardless of their spatial locations. Thus, in both tasks, subjects had to keep in working memory information about several previous stimuli, they had to match the current stimulus against the one that appeared 3-back in the sequence, and they had to update the contents of their memories with each succeeding stimulus presentation. The major
Figure 7.8. Schematics of the memory tasks used by Smith, Jonides, and Koeppe (1996).
difference between the conditions was in the sort of information that was stored in memory: In the spatial memory condition it was locational information, and in the verbal memory condition it was identity information (storing a visual code for each letter would not suffice in the verbal memory condition because the case of the letters haphazardly varied from upper to lower). We collected data for each of these two memory conditions while subjects reclined in a PET scanner. The scanning recorded all the brain areas that were activated during any portions of these tasks. To focus on the processes specifically involving memory, we also tested subjects in control conditions whose activations were then subtracted from those of the memory conditions. In the spatial control condition, subjects were shown three locations on the screen prior to a sequence of stimuli such as those shown in the top panel of figure 7.8, and they were to respond positively anytime any letter appeared in any of these positions; otherwise, they responded negatively. Likewise, in the verbal control condition, they were shown three letters, and they responded positively anytime any of these appeared. The brain activations that resulted from the subtractions of the control from the memory conditions revealed both substantial overlap in
activations between conditions and substantial differences. Generally speaking, there was bilateral activation in both spatial and verbal tasks in both anterior and posterior regions of the brain. Beyond this, though, there was evidence that the spatial task activated some structures in the right hemisphere more than in the left; in a complementary way, the verbal task activated regions in the left hemisphere more than in the right. Other than this noticeable difference between the tasks, there was considerable similarity in the regions activated. There was clear activation in two regions of posterior parietal cortex, one more lateral than the other, similar to some of the results described above. Also, there was activation in dorsolateral prefrontal cortex, also similar to results presented above. And there was evidence of activation in inferior frontal gyrus in the left hemisphere in the verbal task, an indication of the involvement of verbal rehearsal in this task. For present purposes, our interest in this experiment is in its demonstration of different patterns of activation as a function of whether the material to be retained was verbal or spatial in nature. Although the overall circuitry revealed in these two conditions was similar, there was, as noted above, a dissociable pattern of activation with spatial material engaging the right hemisphere more than the left and verbal material engaging the left more than the right. Note that this pattern was obtained in an experiment in which the physical stimuli were nearly identical in the two memory conditions, rendering it unlikely that the different patterns of activation were a function of perceptual processing. We conclude that the different patterns are instead a reflection of different underlying circuitry for spatial than verbal working memory. 
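The two memory conditions just described differ only in which attribute of the stimulus is compared against the item 3-back. This can be made concrete in a sketch (an invented stimulus stream of (letter, location) pairs; letter matching ignores case, as the varying case in the verbal condition requires):

```python
def three_back_match(stream, attribute):
    """For each (letter, location) stimulus, test a match against the one 3 back.

    attribute is "identity" (compare letters, case-insensitively, as in the
    verbal condition) or "location" (compare positions, as in the spatial one).
    """
    def key(stim):
        letter, location = stim
        return letter.lower() if attribute == "identity" else location

    return [i >= 3 and key(stream[i]) == key(stream[i - 3]) for i in range(len(stream))]

# Invented stream: letters paired with numbered screen positions.
stream = [("b", 1), ("K", 4), ("r", 2), ("B", 3), ("k", 4), ("P", 2)]
print(three_back_match(stream, "identity"))  # [False, False, False, True, True, False]
print(three_back_match(stream, "location"))  # [False, False, False, False, True, True]
```

The point of the design is visible here: the same physical stream yields different correct answers under the two instructions, so any difference in activation can be attributed to what is stored rather than to what is seen.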
This result, then, confirms and extends the data from neuropsychological and behavioral experiments and adds currency to the hypothesis that working memory is best characterized as a set of subsystems, each responsible for the processing of different sorts of information. A Closing Reflection Experiments such as those I have described are in the forefront of the news in cognitive psychology these days. They are there because neuroimaging techniques make it seem as if the often vague and ephemeral constructs of psychological theory can now be displayed in neural tissue. There is a certain excitement in being able to palpate something that was previously only imaginable, to see a functioning process on a computer display of brain activation where previously that process was only inferred from patterns of reaction times and errors. This is a lively development in the science. However, there are reasons beyond this to
value data from neuroimaging laboratories, and these reasons are mere extensions of the ones that guided the discussions in the Gleitmans’ Thursday evening research seminars (which the clock suggested were endless, but which ended all too soon each week). Neuroimaging techniques can be applied, as our research suggests, to the identification of components of cognition and to the detailed description of the architecture of these components. In this way, it is valuable to conceive of neuroimaging data as an additional modality of insight into the phenomena of cognition, one that can supplement and enhance the behavioral study of normal and brain-injured humans. Acknowledgment This research was supported by a grant from the Office of Naval Research and by a grant from the National Institute on Aging. References Awh, E. and Jonides, J. (1999) Spatial selective attention and spatial working memory. In The Attentive Brain, ed. R. Parasuraman. Cambridge, MA: MIT Press. Awh, E., Jonides, J., Smith, E. E., Schumacher, E. H., Koeppe, R. A., and Katz, S. (1996) Dissociation of storage and rehearsal in verbal working memory: Evidence from PET. Psychological Science 7:25–31. Baddeley, A. D. (1986) Working Memory. Oxford: Oxford University Press. Baddeley, A. D. (1992) Working memory. Science 255:556–559. Braver, T. S., Cohen, J. D., Nystrom, L. E., Jonides, J., Smith, E. E., and Noll, D. C. (1997) A parametric study of prefrontal cortex involvement in human working memory. NeuroImage 5:49–62. Carpenter, P. A., Just, M. A., and Shell, P. (1990) What one intelligence test measures: A theoretical account of the processing in the Raven Progressive Matrices Test. Psychological Review 97:404–431. Cohen, J. D., Perlstein, W. M., Braver, T. S., Nystrom, L. E., Noll, D. C., Jonides, J., and Smith, E. E. (1997) Temporal dynamics of brain activation during a working memory task. Nature 386:604–608. Daneman, M. and Merikle, P. M.
(1996) Working memory and language comprehension: A meta-analysis. Psychonomic Bulletin and Review 3:422–433. Donders, F. C. (1868) Over de snelheid van psychische processen. Onderzoekingen gedaan in het physiologisch laboratorium der Utrechtsche Hoogeschool: Tweede Reeks 2:92–120. Jonides, J., Reuter-Lorenz, P., Smith, E. E., Awh, E., Barnes, L., Drain, M., Glass, J., Lauber, E., Patalano, A., and Schumacher, E. H. (1996) Verbal and spatial working memory. In The Psychology of Learning and Motivation, ed. D. Medin. Jonides, J., Schumacher, E. H., Smith, E. E., Lauber, E. J., Awh, E., Minoshima, S., and Koeppe, R. A. (1997) Verbal-working-memory load affects regional brain activation as measured by PET. Journal of Cognitive Neuroscience 9:462–475. Jonides, J., Smith, E. E., Koeppe, R. A., Awh, E., Minoshima, S., and Mintun, M. A. (1993) Spatial working memory in humans as revealed by PET. Nature 363:623–625. Kyllonen, P. C. and Christal, R. E. (1990) Reasoning ability is (little more than) working-memory capacity?! Intelligence 14:389–433.
Logie, R. H. (1995) Visuo-spatial Working Memory. Hillsdale, NJ: Lawrence Erlbaum Associates. Logie, R. H., Zucco, G. M., and Baddeley, A. D. (1990) Interference with visual short-term memory. Acta Psychologica 75:55–84. McCarthy, R. A. and Warrington, E. K. (1990) Cognitive Neuropsychology: A Clinical Introduction. San Diego: Academic Press. Meudell, P. R. (1972) Comparative effects of two types of distraction on the recall of visually presented verbal and nonverbal material. Journal of Experimental Psychology 94:244–247. Paulesu, E., Frith, C. D., and Frackowiak, R. S. J. (1993) The neural correlates of the verbal component of working memory. Nature 362:342–344. Salthouse, T. A. (1974) Using selective interference to investigate spatial memory representations. Memory and Cognition 2:749–757. Salthouse, T. A. (1993) Influence of working memory on adult age differences in matrix reasoning. British Journal of Psychology 84:171–199. Schumacher, E. H., Lauber, E., Awh, E., Jonides, J., Smith, E. E., and Koeppe, R. A. (1996) PET evidence for an amodal verbal working memory system. NeuroImage 3:79–88. Smith, E. E. and Jonides, J. (1995) Working memory in humans: Neuropsychological evidence. In The Cognitive Neurosciences, ed. M. Gazzaniga. Cambridge, MA: MIT Press, pp. 1009–1020. Smith, E. E., Jonides, J., and Koeppe, R. A. (1996) Dissociating verbal and spatial working memory using PET. Cerebral Cortex 6:11–20. Sternberg, S. (1966) High-speed scanning in human memory. Science 153:652–654. Sternberg, S. (1969) Memory-scanning: Mental processes revealed by reaction-time experiments. American Scientist 57:421–457.
Chapter 8 A Nativist’s View of Learning: How to Combine the Gleitmans in a Theory of Language Acquisition Elissa L. Newport An extremely prominent fact of my graduate training is that I worked with both Henry and Lila Gleitman. The three of us collaborated, during my years at Penn and beyond, on a series of studies of mothers’ speech to young children (christened “Motherese” by Henry), and the Gleitmans signed my dissertation as my joint advisors. Together, along with a number of other honored teachers at Penn, they forged the concepts I still hold dear about honor, integrity, citizenship, and scholarship in academic life and in my department. They have remained my dear friends and mentors for more than twenty-five years. I look forward to and demand at least another twenty-five with them. Most pertinent for writing this chapter, as I look back over the research I have done since graduate school, I discover that I have spent the ensuing years thinking about my research like both Lila and Henry, and have conducted on a long-term basis two apparently quite different lines of research, one which derives most clearly from the thinking I began with Lila and the other which derives most clearly from the thinking I began with Henry. As a product of the two of them, I of course think these lines of work are related; but I do confess that they are styles of research typically performed by investigators on different sides of the field and with different views of language acquisition. In the present chapter I will present them as the “Henry” part of my work and then the “Lila” part of my work. I hope by the end it will become clear that they are indeed relatable and related, and that the Gleitmans have also succeeded in teaching me something about integrating their views. 
The Problem of Language Acquisition, and Two Views on Its Solution

As we all know, the problem of language acquisition is as follows (Chomsky 1965): Natural languages are large (and hierarchically organized) combinatorial systems. The learner’s task is to figure out the basic elements, and then to learn which combinations of these elements are permitted. The difficulty, of course, is how one learns, by observing
a subset of the permitted combinations, which parts of the unobserved combinatorial space are permitted and which are not. The usual accounts of the solution, typically offered from benches on opposing sides of the field, are these:

Claim A (stated in current wording, but nonetheless an old view): There are rich statistical regularities in natural languages, and humans absorb these input statistics remarkably readily.

Claim B (likewise an old view stated in current wording): There are too many, and also too few, regularities present in the input to explain learning. Learners bring to the task a strong set of advance biases, leading them to “acquire” certain types of combinatorial patterns even when they are not present in input, and to fail to acquire other types of combinatorial patterns even when they are present in input.

Because these accounts are usually maintained by investigators with opposing points of view, they are typically offered as though they were in conflict with one another. Alternatively, however, they might both be true, and might therefore demand an integration with one another. Over the last twenty-five years I have been attempting to study each of these processes, in separate lines of work: one a long-term program of experimental research using an artificial language learning methodology in the lab, and the other a long-term program of research on critical periods and creolization in natural sign language acquisition. In the present chapter I will overview our most recent findings in each of these lines of research, and then attempt to indicate how these findings might ultimately be integrated with one another in a coherent picture of language acquisition.

Statistical Learning

Most models of language acquisition have approached the problem by considering how a learner might form a rule from a single sentence at a time.
As we all know, given a single example string from the language, any open-minded learner could hypothesize a huge (indeed, infinite) number of potential rules (Chomsky 1965; Gold 1967). The usual solution to this problem is to propose that there is relatively light learning from each example, and relatively heavy advance constraints on what the rules or principles of the language might be. One example is the notion of “triggering” of the setting of linguistic principles. However, given a large and representative corpus of relevant sentences from the language (like that used by a good field linguist), the problem changes somewhat (cf. Harris 1951; Maratsos and Chalkley
1980). Distributional information involves patterns of syllable, morpheme, and word sequencing—sometimes quite complex patterns exhibited over different parts of the corpus—which linguists have traditionally used to identify the structures of a language. Recently this has been called statistical information (cf. Charniak 1993; Saffran, Newport, and Aslin 1996)—for example, calculations of which sequences of sounds appear together recurrently, or which linguistic contexts form a recurring set of alternatives for the same items. Even with a distributional corpus, the learning problem does not change in principle: There are still infinitely many generalizations consistent with a linguistic corpus, just as there are with a single instance. But such information does provide a potentially richer input base for a learner suitably predisposed to use it. Given a corpus, a learner who is innately endowed with a constrained and structured set of analytic techniques might be able to use that corpus to reduce the alternatives greatly. There are two important reasons this approach to language acquisition has not been extensively pursued until the last few years (see, e.g., numerous critiques of Maratsos and Chalkley 1980). First, there has not been an adequate theory of how to constrain such a learner, or even an adequate argument showing that such a theory could be constructed without looking essentially identical to the theory required for the single-sentence learner. (Unfortunately, the present chapter will not resolve this problem.) Second, it has seemed on the face of it somewhat implausible to imagine that a very young language learner could store a large linguistic corpus in a relatively raw form, preserving it for use in various types of distributional analysis during acquisition, or could compute the rather complex statistics from this corpus that would be required to reduce the acquisition problem. 
However, in the last few years my colleagues Richard Aslin, Jenny Saffran, Toby Mintz, and I, as well as a number of other researchers in the fields of language acquisition and computational linguistics, have begun to show that human learners—even eight-month-old infants—can perform surprisingly complex statistical analyses of language data, even from brief exposures in the lab, and also that they are extremely selective in the types of analyses they perform. In this section I will overview our results on a first distributional problem, that of word segmentation. However, a great deal more work is required before we can determine how far an infant is capable of progressing through acquisition using an approach of this kind.

Statistical Learning and Word Segmentation

In a series of studies (Saffran, Newport, and Aslin 1996; Saffran, Aslin, and Newport 1996; Saffran et al. 1997; Aslin et al. 1998, 1999), Jenny
Saffran, Richard Aslin, and I began to study this question by focusing on a problem in language acquisition that clearly involved learning, that innate knowledge could not solve, and for which a distributional or statistical analysis could, at least in principle, provide an important contribution, if learners were capable of performing it. These studies have examined the problem of word segmentation: how does the learner determine, from the apparently continuous stream of speech, which sequences of sounds form the words of the language? Part of the answer involves the use of prosodic and rhythmic information, as well as silence at the ends of utterances (Aslin, Woodward, LaMendola, and Bever 1996; Brent and Cartwright 1996; Christiansen, Allen, and Seidenberg 1998; Jusczyk, Cutler, and Redanz 1993; Mehler, Dupoux, and Segui 1990; Morgan and Saffran 1995). However, such cues are not always available for use in initial segmentation (Aslin et al. 1996). Several investigators (Chomsky 1955; Harris 1955; Hayes and Clark 1970; Goodsitt, Morgan, and Kuhl 1993) have noted that this problem might be solved by keeping track of relative consistency in the sound sequences. This observation can be converted into a statistical form: Learners might compute the conditional probabilities between sequential syllables (called transitional probabilities; cf. Miller and Selfridge 1950; Goodsitt, Morgan, and Kuhl 1993; Christophe, Dupoux, Bertoncini, and Mehler 1994; Saffran, Newport, and Aslin 1996). Over a speech corpus, those sequences with relatively high conditional probabilities are likely to be inside words, and those with relatively low probabilities are likely to be the accidental juxtapositions of sounds at word boundaries. Following an important study by Hayes and Clark (1970), we asked whether human learners were capable of performing such computations. 
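The transitional-probability computation just described can be sketched in a few lines of code. The snippet below is a minimal illustration, not the actual experimental software: the syllables, the word inventory, and the 0.75 boundary threshold are hypothetical choices made for the example, and a real corpus would require a more principled boundary criterion (for example, local dips in probability).

```python
from collections import Counter

def transitional_probs(syllables):
    """P(next | current) for every adjacent syllable pair in the corpus."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(a, b): n / first_counts[a] for (a, b), n in pair_counts.items()}

def segment(syllables, tps, threshold=0.75):
    """Posit a word boundary wherever the transitional probability dips below threshold."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tps[(a, b)] < threshold:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words

# Toy corpus: three hypothetical trisyllabic "words", concatenated with no
# pauses, where each word is followed by more than one different word.
words = ["bidaku", "padoti", "golabu"]
order = [0, 1, 2, 1, 0, 2, 0, 2, 1, 0, 1, 2]
stream = [words[k][i:i + 2] for k in order for i in (0, 2, 4)]

tps = transitional_probs(stream)
print(segment(stream, tps))  # recovers the twelve words, in order
```

In this toy stream the within-word probabilities are all 1.0 and the probabilities spanning word boundaries are at most 2/3, so a single threshold separates them cleanly; with the more realistic, variable probabilities of the actual stimuli, the same dip-detection idea applies but the boundary criterion must be relative rather than fixed.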
Word segmentation studies

Our initial study involved presenting adults with an artificial language (Saffran, Newport, and Aslin 1996; see also Hayes and Clark 1970). The language consisted of trisyllabic “words,” concatenated in random order and spoken by a speech synthesizer with no prosodic or acoustic markers of word boundaries, to create an unbroken twenty-one-minute corpus. Although transitional probabilities varied both within and between words, the transitional probabilities inside words were relatively high, while those spanning a word boundary were relatively low, as is the case for real languages. After exposure to the corpus, subjects were given a series of two-alternative forced choice items, each containing a word from the language and either a nonword or a part-word (depending on the experimental condition). Nonwords were three-syllable sequences made of the same syllables used in the language, but in an
order that did not occur in the exposure corpus. Part-words were three-syllable sequences consisting of two syllables in the correct positions and order, and a third syllable that did not occur in that word in the corpus. Subjects were to choose which of the alternatives in each item sounded more familiar. Subjects in both the nonword and the part-word conditions performed significantly and substantially above chance, suggesting that adults not only can acquire syllable order, but can also segment a stream of syllables into groups based on the distributional characteristics of the corpus. In a second study we asked whether five- to six-year-old children could perform the same task (Saffran, Newport, Aslin, Tunick, and Barrueco 1997). To prevent them from getting bored during familiarization, we asked them to color on the computer, using a program called KidPix, and we merely played the speech stream in the background, with no instructions to learn or even listen to the sounds. For comparison, adults were given the same exposure. After one or two twenty-one-minute coloring sessions, both the adults and the children performed significantly above chance on a word-nonword forced-choice task. Thus children can also segment words from fluent speech based solely on statistical information from a continuous corpus. Moreover, this process can apparently proceed implicitly, without subjects’ attention directed at the speech stream or the analytic process. We have also conducted a series of three studies on statistical learning in eight-month-old infants (Saffran, Aslin, and Newport 1996; Aslin, Saffran, and Newport 1998), chosen because this is the age at which word segmentation in natural language acquisition is underway. In our first study, infants were exposed to a simplified corpus of trisyllabic nonsense words (with transitional probabilities inside words of 1.0 and those across word boundaries of 0.33), presented continuously for only two minutes.
Then, using the preferential listening methodology (Jusczyk and Aslin 1995), we tested each infant with two words from the language and two nonwords (made up of familiarization syllables in a novel order). Our results showed that infants listened differentially to words versus nonwords, indicating that they could discriminate between them. Because the individual syllables in words and nonwords occurred with equal frequency in the familiarization corpus, the results cannot be due to subjects’ discriminating the frequency of individual syllables, but rather must be due to their discriminating syllable order: Infants must be noting that the syllables in nonwords never occurred in that order in the familiarization corpus. In our second study, we asked whether eight-month-olds could perform the more difficult task of discriminating words from part-words. In this study, the familiarization corpus was like that in the first study, but the test items consisted of
words and part-words. Moreover, part-words in this study were more difficult to discriminate from words than in our previous adult work. Here, part-words consisted of the final syllable of one word and the first two syllables of another word. Thus these part-words had in fact occurred in the familiarization corpus. They differed from words in having transitional probabilities of 0.33 and 1.0 (as compared with 1.0 and 1.0 for the words). Infants in this second study also listened differentially to the part-words as compared to the words. Thus eight-month-olds do not merely note whether a syllable sequence occurred or not, but apparently can perform an analysis of the statistics of the language corpus. This second infant study does not, however, demonstrate precisely what statistic the infants are computing, and whether in particular they are capable of computing conditional probabilities among sequential syllables. Because the words of the corpus were each presented with equal frequency, the part-words formed by their junctures were all less frequent than the words. Infants therefore could have been responding to trisyllabic frequencies, rather than trisyllabic conditional probabilities. (Either of these would be quite impressive, but conditional probabilities would be more structurally informative in real language learning.) To pursue this issue further, we conducted a third study of eight-month-olds (Aslin, Saffran, and Newport 1998), in which the test words and part-words were matched in frequency of presentation during familiarization, and differed only in conditional probabilities. We achieved this by creating a corpus in which two of the four words presented in the familiarization corpus were more frequent than the other two. This resulted in a corpus in which the two part-words (formed by the juncture between the high-frequency words) occurred with the same frequency as the two low-frequency words. 
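The logic of this frequency-matched design can be checked with a small sketch. The syllables below are hypothetical, and a fixed repeating order stands in for the randomized orderings of the actual experiment; the point is only that part-words formed at the juncture of the two high-frequency words occur exactly as often as the low-frequency words, while their internal transitional probabilities differ.

```python
from collections import Counter

# Hypothetical trisyllabic words: H1 and H2 occur twice per cycle
# (high frequency), L1 and L2 once (low frequency).
H1, H2 = ["tu", "pi", "ro"], ["go", "la", "bu"]
L1, L2 = ["bi", "da", "ku"], ["pa", "do", "ti"]

cycle = H1 + H2 + L1 + H2 + H1 + L2   # one pass: H1 H2 L1 H2 H1 L2
stream = cycle * 50                    # a long familiarization stream

trigrams = Counter(zip(stream, stream[1:], stream[2:]))
pairs = Counter(zip(stream, stream[1:]))
firsts = Counter(stream[:-1])
tp = lambda a, b: pairs[(a, b)] / firsts[a]

low_word = tuple(L1)                   # a low-frequency word
part_word = (H1[2], H2[0], H2[1])      # juncture of the two frequent words

# Matched trisyllable frequencies, but different transitional probabilities:
print(trigrams[low_word], trigrams[part_word])   # equal counts
print(tp(*part_word[:2]), tp(*part_word[1:]))    # 0.5 vs. 1.0
```

Under these counts a learner tracking only trisyllable frequency cannot distinguish the low-frequency words from the part-words; discriminating them requires sensitivity to the conditional probabilities, which is what this third infant study tested.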
Nonetheless, the transitional probabilities within words were still higher (1.0 and 1.0) than the transitional probabilities within the part-words (0.50 and 1.0). Our results showed that infants continued to discriminate between the words and the part-words, demonstrating clearly that they can compute transitional probabilities and can use them to segment multisyllabic words from fluent speech.

Segmentation in other modalities and domains

The problem of segmenting elementary units out of a complex and apparently continuous array is a problem that occurs not only in language, but in other domains as well. Moreover, statistical solutions—examining which notes recurrently occur together to form a melody, or which parts of a visual scene move together against a background—might also be applicable in these domains. My colleagues and I have therefore devised nonlinguistic materials, structured with the same statistics as our speech streams, but composed of quite different basic elements, and presented them to adults and infants, to see whether the same types of computations can be performed as readily in these domains, or whether these extraordinary computational abilities are instead restricted to or are specialized for speech. Our results show quite comparable outcomes, in both adults and infants, using nonlinguistic tone sequences (Saffran, Johnson, Aslin, and Newport 1999), and visual and visuomotor sequences (Asaad 1998; Hunt and Aslin 1998). In sum, then, human learners show quite surprising abilities to compute and keep track of detailed aspects of sequential materials. These abilities appear not only in adults and children, but even in young infants of language-learning age; and they appear for rapidly presented linguistic materials, as well as for musical and visual sequences.

Expanding Statistical Learning to Other Aspects of Language

We are just beginning to ask how this statistical approach might be extended to other aspects of language. The particular computations we have examined thus far—transitional probabilities between adjacent syllables—would be extremely helpful in word segmentation, but would not be adequate for learning other aspects of natural languages (Chomsky 1955, 1957; Newport and Aslin, in preparation; Saffran, Aslin, and Newport 1997). Our subsequent question is therefore not whether this precise approach can be extended, but rather whether human learners can also compute other complex aspects of linguistic sequences that might be relevant to the acquisition of syntax and morphology—for example, statistics concerning the formation of word classes (Mintz, Newport, and Bever 1995, and under review), long-distance dependencies (Newport, Calandra, and Aslin, in progress), and hierarchical structure.
Much further research is required before we can say whether the approach we have taken in our word segmentation work is limited to such low-level, early parts of the acquisition process, or rather whether it can be extended as well to other higher-level problems. Moreover, as one expands this approach, it is critical that it be integrated with appropriate constraints, so that one can explain why learners do not always learn the regularities of their input, as well as why they sometimes do. This leads directly to considering the second line of research I have been conducting.

Non-Statistical Learning?

Although statistical, or distributional, information may be extremely helpful to at least certain parts of language acquisition, it is not the case
that learners acquire statistical information in a simple or slavish way. First, as noted earlier, there is an infinite number of statistical properties that might be computed or learned from input, and no learner could acquire all of these. Second, even for those properties highly pertinent to the structure of a language, children do not merely reproduce the statistics of their input. In some cases, children use these statistics to build a rulelike, nonstatistically organized output. In other cases they produce regularities not present in their input at all. Examples of these discrepancies between input and output come from a second line of research, on “creolization” in the acquisition of reduced and inconsistent sign language input.

Studying Language Acquisition Using the Natural Experiment

The previous line of work I have described descends in fact from both Lila and Henry. Lila, a student of Zellig Harris, began working in linguistics as a collaborator on the first computational linguistics project, and has transmitted to me a love of distributional analyses and of mechanisms of acquisition. But designing miniature language studies and thinking about learning theories is a love I acquired primarily from Henry. On the other hand, my second long-term research program originates most clearly with Lila. As many have noted in this volume, a number of us began, in our graduate work with Lila, to pursue our nature-nurture questions by seeking unusual “natural experiments” of acquisition—natural deviations of input or internal state that might shed special light on how these variables affect the course of acquisition. This is a method I have continued in my own work, for example, in seeking subjects who have been exposed to their primary language at varying ages (Newport 1990), or who have learned their language in infancy but only from an unusual source (Newport in press).
In perhaps surprising contrast to what I have argued above, these studies have always suggested that internal state, and not linguistic input, is the dominant controller of the course of acquisition. Examined in more detail, these studies provide an important modification of, and integration with, the studies of statistical learning described above.

Natural Experiments of Linguistic Input

Children virtually always acquire their primary language from speakers who are fully fluent in the language. This means that their input is highly regular and systematic: Even though it may be remarkably difficult for a learning theory to explain how the regularities could be uniquely reconstructed from this input, there are rules and patterns underlying linguistic strings to which the learner is exposed. A statistical approach to this type of learning, then, is an approach that hypothesizes that the rules may be helpfully revealed by computing some input statistics. In contrast, in ongoing work we are observing children who are acquiring their primary language entirely from speakers who are themselves neither fluent nor native users of the language. In some cases the input they provide to their children is thus truly statistical: Morphological rules of the language are used only probabilistically, and many inconsistent errors are made. In other cases the input omits certain natural language properties. The outcomes of acquisition in these circumstances show quite clearly that, although children may use input statistics to learn parts of their language, they do not reproduce the input statistics in their own output. Rather, the architecture of their output grammar is sharpened and systematized. Our subjects are congenitally and profoundly deaf children who are acquiring American Sign Language (ASL) as their primary language. All of them are exposed to some form of ASL from birth or shortly after. However, because their families vary in their proficiency in ASL (and because we study children who have little or no sign language input outside of their families and teachers), the children’s input to ASL may be very reduced and inconsistent. Our findings to date concern two particular types of input variation. In one line of work we have observed the acquisition of the morphemes of ASL verbs of motion. All of the parents use these morphemes to some degree, but vary in the consistency with which they use morphemes in their required contexts. When they err, they either omit the required morphemes or replace them with ungrammatical forms.
Studies of the children’s acquisition of this morphology allow us to see the effects of input inconsistency on the acquisition of these language-specific structures. In a second line of work we are observing the acquisition of syntactic and morphological rules that are not just specific details of ASL but form part of the universal patterns and principles of all languages. Here, as it happens, the parents vary not only in the consistency with which they use the structures, but also in whether they themselves exemplify or violate these linguistic universals. This work allows us to ask whether children must be exposed to these structural principles at all in order to observe them in their own productions.

Inconsistent input to language-specific morphology

Our first work on this topic has been a case study of a deaf child, whom we call Simon, acquiring ASL as his native language from his parents (Newport 1999; Ross and Newport 1996; Singleton 1989; Singleton and
Newport 1994 and under review). Simon is the only congenitally deaf son of two deaf parents; both parents were first exposed to ASL in their late teens and now use it as their primary language, with each other and Simon. Simon attends a school where none of the teachers or other students knows ASL; the school uses a form of Signed English, which does not contain the morphology or syntax of ASL that we have studied in Simon, and all other students in the school have hearing parents who do not know ASL. Simon’s parents’ friends are also nonnative learners of ASL. In short, Simon’s only input to ASL is from his parents. We have filmed this family’s signing since Simon was two years old, but our first analyses focused on Simon’s performance, compared with that of his parents, at a time when he should have completed his acquisition of ASL, at age 7:11. Simon, his mother, and his father were each tested for their production of the morphemes of ASL verbs of motion. Simon’s performance was also compared with that of deaf children of his age who have native signing parents, and his parents’ performances were compared with adult native signers and late learners of ASL. In native ASL, verbs of motion involve producing a large number of morphemes in combination, and these verbs are therefore difficult for both late learners and young children to acquire. Each of the morphemes does, however, have a set of obligatory contexts, and is produced by native signers in a highly regular and systematic way. Simon’s parents sign like other late learners: They use virtually all of the obligatory ASL morphemes, but only at middling levels of consistency. On relatively simple morphemes (the movement morphemes of ASL), they average 65–85% correct usage. In contrast, Simon uses these morphemes much more consistently (about 90% correct), fully equal to children whose parents are native ASL signers. 
Thus, when input is quite inconsistent, Simon is nonetheless able to regularize the language and surpass his input models. On more difficult morphemes (the handshape classifiers of ASL), where his parents were extremely inconsistent (about 45% correct), Simon did not perform at native levels by age 7; but even here he did surpass his parents. Ross and Newport (1996) examined Simon’s development over time on the same morphology studied at age 7:11 by Singleton and Newport. This analysis examined Simon’s acquisition of the morphology of verbs of motion from age 2:6 through 9:1 (both earlier and later than Singleton and Newport’s analyses), again compared with children of the same ages receiving fully native input. For movement morphemes (where Simon’s input was moderately consistent), Simon matched children receiving native input throughout development; his use of these morphemes exhibited no developmental delay, and no reduction in consistency or complexity. For handshape classifiers (where input consistency
was lower), Simon began his acquisition process normally, but reached an asymptote at age 4:6, well below normal usage, that continued unchanged until 9:1. Ross (in preparation) has suggested that Simon, in this portion of ASL, may be forming his own system, somewhat different from ASL but nonetheless systematic. To examine a greater range of inconsistency in linguistic input, Ross and Newport (in progress) have begun to study deaf children acquiring their sign language from hearing parents. These parents have learned to sign only slightly before their child, and their fluency in the language is often extremely limited. In one study we have compared these children and their parents to native signing families, on the same morphology as was studied in Simon. The subjects we have studied thus range from native input (for control subjects) through moderately consistent input (for subjects with deaf late-learning parents, like Simon, and also some with hearing late-learning parents) to extremely inconsistent input (for example, one child, Sarah, receives input from her hearing mother that is only 15% consistent on movement morphemes and 8% consistent on handshape classifiers). A summary picture of the data is shown in figure 8.1. As can be seen there, all of these children perform at native or near-native levels on movement morphemes. On the more difficult handshape classifiers (where parents are often extremely inconsistent), the children are not fully native, but their consistency substantially exceeds that of their input. What is the mechanism by which children overcome such high degrees of inconsistency in their input? Two hypotheses seem possible. One hypothesis is that children know, innately, that natural language morphology is deterministic, not probabilistic, and acquire this morphology in accord with this knowledge. An alternative hypothesis is related to the distribution of mappings between form and meaning in the input data.
Whereas, say, only 65% of the verbs referring to “falling
events” use the FALL morpheme, the other 35% use a scattering of other forms, with no one of these used with substantial frequency. This distribution may be one in which learners will acquire only the major form, and will fail to acquire the very low frequency forms. Moreover, this selectivity of learning may be particularly true for children, who have limited ability to learn complex data. We are in the process of testing the second hypothesis with experimental studies in the lab (Hudson and Newport, in progress).

Building structure with no input: Universal architectural principles of grammar

We have also studied other linguistic structures, whose input exhibits different types of reduction. In some studies we examine not individual morphemes, but rather the way constructions combine over the language. In contrast to language-specific individual forms, such combinations across the language concern the architecture of the grammar, and are the domain of linguistic universals. What input do late-learning parents provide to such patterns, and what do their children do when learning from this input? Singleton and Newport (in preparation; Newport 1999) analyzed two such arenas in Simon at 9:1 and his late-learning deaf parents. One construction concerned inflections for number and aspect, and their combination. Simon’s parents used each of the inflections with middling consistency, but they never combined these inflections; rather, they described complex events by using only one inflection at a time, expressing the remaining part of the meaning in a periphrastic expression (a phrase accompanying the verb). They thus provided Simon with no input concerning how these inflections might be combined to express complex meanings, and in fact provided input that suggested that two inflections cannot be combined.
This noncombinatorial pattern, common in late learners, is quite uncharacteristic of natural languages; natural languages have rules that apply independently (either in sequence or simultaneously), and thus rules can freely combine. (For example, if the event concerns plurality and possession, both plural and possessive morphemes will be used; one does not block or exclude the other.) Despite his input but in line with other linguistic systems, Simon combined inflections perfectly (on our elicitation tasks, Simon scored 100% correct, compared with 25% and 0% correct for his mother and father). This outcome suggests that he did not learn these architectural principles from his input, but imposed them on his language. Singleton and Newport also examined Simon’s comprehension of topicalized structures in ASL. Topicalization is the movement of a phrase to the beginning of the sentence, marking it as the topic of that sentence; in ASL, topics are marked with a special facial expression as
well as a special word order. Syntactic movement, such as topicalization, universally follows a principle called structure dependence: Only proper units of structure, such as phrases, can be moved; other strings of adjacent words which do not form a phrase cannot undergo topicalization (Chomsky 1975). However, Simon’s parents (and other late learners of ASL), tested on their comprehension of topicalized structures, did not exemplify or observe structure dependence. In their own signing, Simon’s parents did not produce the full range of topicalized structures; they only topicalized subjects (for example, JOHN HIT BALL, where JOHN is topic-marked). In comprehension, they scored below chance on sentences in which other phrases were topicalized. In short, their input to Simon exhibited a highly reduced range of examples for this principle, and their comprehension consistently violated the principle. Nonetheless, Simon comprehended topic-marked sentences 100% correctly, indicating that his comprehension was fully in accord with structure dependence. Simon presumably acquired the basic phrase structure of ASL from his input; but his organization of movement rules in accord with structure dependence appears not to have come from his input. In both these cases—combining inflectional rules, and moving words to new positions—Simon appears to have gone well beyond his input, imposing universal principles of rule architecture that his parents’ usage did not illustrate. Presumably the constraints underlying these principles are part of the child’s internal biases about how language must be structured.

Conclusions

As I noted at the beginning of this chapter, the apparent contrasts in these lines of work—one showing remarkable statistical learning, the other showing remarkable reshaping and restructuring of input—might sound like they come from opposing views of acquisition.
However, if combined, they offer an appropriately rich picture of acquisition, in which there is both learning and constraints within which this learning occurs. One relatively simple integration within these findings concerns the relation between statistical learning and Simon’s ability to surpass his inconsistent linguistic input. When faced with probabilistically used morphology, Simon acquires the most consistent portions of his input, and fails to learn the inconsistencies. By selectively learning parts of his input, then, he turns statistics into rules (Singleton and Newport, under review). In other parts of our research a more complex integration and further investigation are needed. Where do learners sharpen the probabilities provided by the input, where do they ignore their input, and where do
Elissa L. Newport
they add structure that is not present? Are the occasions for these processes created by particular distributional contexts in the input itself, or (as is more traditionally claimed) by innate knowledge of grammatical architectures? I hope it is not surprising that I don’t have answers to these questions. But one more lengthy meeting of the Gleitman research seminar, with Henry and Lila putting their remarkable minds together over “my” research, would really help.

Acknowledgments

All of the research described in this chapter was done with important collaborators, whom I gratefully acknowledge: Richard Aslin and Jenny Saffran, my collaborators on statistical learning in word segmentation; and Jenny Singleton, Danielle Ross, and Ted Supalla, my collaborators on studies of language acquisition from imperfect input. This research was supported in part by NIH grant DC00167 to E. Newport and T. Supalla, and in part by NSF grant KDI–9873477 to R. Aslin, E. Newport, R. Jacobs, and M. Hauser.

References

Asaad, P. (1998) Statistical learning of sequential visual patterns. Honors thesis, Department of Brain and Cognitive Sciences, University of Rochester.
Aslin, R. N., Saffran, J. R., and Newport, E. L. (1998) Computation of conditional probability statistics by 8-month-old infants. Psychological Science 9:321–324.
Aslin, R. N., Saffran, J. R., and Newport, E. L. (1999) Statistical learning in linguistic and non-linguistic domains. In B. MacWhinney, ed., Emergentist Approaches to Language. Mahwah, NJ: Lawrence Erlbaum Associates.
Aslin, R. N., Woodward, J. Z., LaMendola, N. P., and Bever, T. G. (1996) Models of word segmentation in fluent maternal speech to infants. In J. L. Morgan and K. Demuth, eds., Signal to Syntax: Bootstrapping from Speech to Grammar in Early Acquisition. Mahwah, NJ: Lawrence Erlbaum Associates.
Brent, M. R. and Cartwright, T. A. (1996) Distributional regularity and phonotactic constraints are useful for segmentation. Cognition 61:93–120.
Charniak, E. (1993) Statistical Language Learning. Cambridge, MA: MIT Press.
Chomsky, N. (1955/1975) The Logical Structure of Linguistic Theory. New York: Plenum Press.
Chomsky, N. (1957) Syntactic Structures. The Hague: Mouton.
Chomsky, N. (1965) Aspects of the Theory of Syntax. Cambridge, MA: MIT Press.
Chomsky, N. (1975) Reflections on Language. New York: Pantheon Books.
Christiansen, M. H., Allen, J., and Seidenberg, M. S. (1998) Learning to segment speech using multiple cues: A connectionist model. Language and Cognitive Processes.
Christophe, A., Dupoux, E., Bertoncini, J., and Mehler, J. (1994) Do infants perceive word boundaries? An empirical study of the bootstrapping of lexical acquisition. Journal of the Acoustical Society of America 95:1570–1580.
Gold, E. M. (1967) Language identification in the limit. Information and Control 10:447–474.
Goodsitt, J. V., Morgan, J. L., and Kuhl, P. K. (1993) Perceptual strategies in prelingual speech segmentation. Journal of Child Language 20:229–252.
Harris, Z. S. (1951) Methods in Structural Linguistics. Chicago: University of Chicago Press.
Harris, Z. S. (1955) From phoneme to morpheme. Language 31:190–222.
Hayes, J. R. and Clark, H. H. (1970) Experiments in the segmentation of an artificial speech analog. In J. R. Hayes, ed., Cognition and the Development of Language. New York: Wiley.
Hunt, R. H. and Aslin, R. N. (1998) Statistical learning of visuomotor sequences: Implicit acquisition of sub-patterns. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.
Jusczyk, P. W. and Aslin, R. N. (1995) Infants’ detection of the sound patterns of words in fluent speech. Cognitive Psychology 29:1–23.
Jusczyk, P. W., Cutler, A., and Redanz, N. J. (1993) Infants’ preference for the predominant stress patterns of English words. Child Development 64:675–687.
Maratsos, M. and Chalkley, M. A. (1980) The internal language of children’s syntax: The ontogenesis and representation of syntactic categories. In K. Nelson, ed., Children’s Language, vol. 2. New York: Gardner Press.
Mehler, J., Dupoux, E., and Segui, J. (1990) Constraining models of lexical access: The onset of word recognition. In G. Altmann, ed., Cognitive Models of Speech Processing: Psycholinguistic and Computational Perspectives. Cambridge, MA: MIT Press.
Miller, G. A. and Selfridge, J. A. (1950) Verbal context and the recall of meaningful material. American Journal of Psychology 63:176–185.
Mintz, T. H., Newport, E. L., and Bever, T. G. (1995) Distributional regularities of form class in speech to young children. In J. N. Beckman, ed., Proceedings of NELS 25 (vol. 2, pp. 43–54). Amherst, MA: Graduate Linguistic Student Association.
Morgan, J. L. and Saffran, J. R. (1995) Emerging integration of sequential and suprasegmental information in preverbal speech segmentation. Child Development 66:911–936.
Newport, E. L. (1990) Maturational constraints on language learning. Cognitive Science 14:11–28.
Newport, E. L. (1999) Reduced input in the acquisition of signed languages: Contributions to the study of creolization. In M. DeGraff, ed., Language Creation and Language Change: Creolization, Diachrony, and Development. Cambridge, MA: MIT Press.
Ross, D. S. and Newport, E. L. (1996) The development of language from non-native linguistic input. In A. Stringfellow, D. Cahana-Amitay, E. Hughes, and A. Zukowski, eds., Proceedings of the 20th Annual Boston University Conference on Language Development, vol. 2. Somerville, MA: Cascadilla Press.
Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996) Statistical learning by 8-month-old infants. Science 274:1926–1928.
Saffran, J. R., Aslin, R. N., and Newport, E. L. (1997) Reply to five letters to the editor on the topic of “acquiring language.” Science 276:1177–1181, 1276.
Saffran, J. R., Johnson, E. K., Aslin, R. N., and Newport, E. L. (1999) Statistical learning of tonal sequences by human infants and adults. Cognition 70:27–52.
Saffran, J. R., Newport, E. L., and Aslin, R. N. (1996) Word segmentation: The role of distributional cues. Journal of Memory and Language 35:606–621.
Saffran, J. R., Newport, E. L., Aslin, R. N., Tunick, R. A., and Barrueco, S. (1997) Incidental language learning: Listening (and learning) out of the corner of your ear. Psychological Science 8:101–105.
Singleton, J. L. (1989) Restructuring of language from impoverished input: Evidence for linguistic compensation. Doctoral dissertation, University of Illinois.
Singleton, J. L. and Newport, E. L. (1994) When learners surpass their models. Unpublished manuscript, University of Illinois.
Singleton, J. L. and Newport, E. L. (under review) When learners surpass their models: The acquisition of American Sign Language from inconsistent input.
Chapter 9
Learning with and without a Helping Hand

Susan Goldin-Meadow

Lila was at Swarthmore College when I first met her. I was a prospective graduate student at Penn and anxious to meet the “language person” affiliated with the department. I took the train out to Swarthmore, where we had our meeting in her office. I proudly described to this famous person, whose New York accent instantly made me feel at home, the work that I had begun as an undergraduate during my junior year abroad in Geneva, a project conducted with a fellow student (Annette Karmiloff-Smith) under the guidance of the “language person” at the Piagetian Institute (Mimi Sinclair). I carefully explained the findings on children’s comprehension of embedded and nonembedded relative clauses and was doing fine until Lila asked me why I did the research. What was interesting about it? We were both, at that point, embarrassed—Lila for having asked me a question to which I clearly did not know the answer, and me for being so completely naive as to have never even thought about the question.

I have since learned many things from Lila—the importance of an elegantly turned phrase, of a good pun in the title of a manuscript, of having food at a seminar—but nothing more essential than that lesson, reinforced by example many times over the years. It is vital not only to know what you’re doing but to know why you’re doing it. Equally important, the “why” must be statable in terms that are accessible to your grandmother, or at the least to all members of a psychology or linguistics department.

My most salient memory of Henry, other than the late-night conversations after the seminars around the Gleitman kitchen table (I have learned that kitchen tables are where all important conversations take place, and I have a great fondness for that particular table), was his role in preparing us to present our work to the world, and in particular, to the departments interested in interviewing us for a job.
The job-talk ritual at Penn was an incredible learning experience and it revolved around Henry. You would give a talk, including slides (no overheads then) and videotapes (reel-to-reel, no cassettes) to a subset of the department, always including Henry and always in A29 (the basement
room with no windows). The faculty then ripped the talk to shreds. At the end, you were told that this was great work, it just needed to be put in the right light. The next step was to take the shredded talk to Henry. What Henry did to the talk was part theater, but only part. His real contribution was to listen—really listen—to what you thought was important about your work and to help you say just that. And the outcome was not only a polished job talk but a much deeper understanding of your own work. As I watched student after student go through this process (the difference between the first version of the job talk and the final product presented to the entire department was always large and striking), I developed a deep appreciation for the intelligence and skill that led to each transformation. When it came time for my own practice job talk, I was not disappointed. The reformulation that Henry worked out with me the night we repaired my shredded talk has stayed with me and influenced my research for twenty years—both as a model for how to do good research and as a model for how to be a good teacher. While Henry helped me to understand, and be articulate about, my own ideas about language learning, it was Lila whose remarkable creativity helped generate those ideas in the first place. Lila nurtured the fascination with language that I brought to Penn, giving me an appreciation for the complexities and elegance of linguistic structure and for how hard the language-learning problem is, for both the child and the experimenter. Language learning was a frequent topic of conversation at the weekly evening seminars at the Gleitman household. Indeed, it was there that the strategy of exploring language learning by varying its parameters was born. The experimentalist in Henry demanded manipulation—to understand a phenomenon fully, one must be able to vary the factors thought to be important to it and observe their effect on it. 
But, for obvious ethical reasons, the language-learning situation is not easily amenable to experimental manipulation. Resourceful as always, Henry and Lila arrived at the idea of making use of experiments of nature to explore the language-learning problem, a strategy that has resulted in a large number of profitable studies converging on a coherent picture of language learning (Gleitman and Newport 1995). First came the “Motherese” work—did the natural variations in how mothers spoke to their children result in differences in how those children learned language (Newport, Gleitman, and Gleitman 1977)? Next came attempts to extend the range of natural variation. Would children lacking access to linguistic input develop language nonetheless (Feldman, Goldin-Meadow and Gleitman 1978)? Would children lacking access to vision be able to align their sightless worlds with the linguistic inputs they receive so that they too could learn language (Landau and Gleitman 1985)? Would children whose mental
development is delayed be able to acquire language following the same trajectory as more cognitively able children (Fowler, Gelman, and Gleitman 1994)? An intellectually exciting research program grew out of those meetings in the Gleitman living room, one that was devoted to exploring how children learn language, not just documenting the stages children pass through as they learn language. This question—how do children arrive at language given their inherent endowments and the inputs they receive—has guided my research for two decades.

My research program grew out of the work I began with Lila and Heidi Feldman, exploring learning over ontogenetic time with a focus on tapping the aspects of language learning that are developmentally stable or “resilient.” I have studied the gesture systems created by deaf children whose hearing losses prevent them from learning the spoken language that surrounds them, and whose hearing parents have not yet exposed them to a conventional sign language. The linguistic properties of these gesture systems, by virtue of the fact that they are developed under language-learning circumstances that vary dramatically from the typical, must be robust in children. To the extent that the outcome of the language development process is the same when so many of the input parameters are changed, we learn about how impervious the process is to variations in the environment (“learning without a helping hand”).

In addition, I have over the last ten years begun another research program (linked superficially with the first by virtue of its focus on gesture) that explores learning over much shorter periods of time. Here, in normally hearing children, gesture is not called upon to fulfill the functions of a linguistic system (speech does that admirably well), and indeed gesture does not.
Instead, gesture takes on a different role, reflecting the thoughts, sometimes inexpressible in speech, that learners have as they go from a less adequate to a more adequate understanding of a task. It is, in fact, the mismatch between the information conveyed in gesture and in speech that signals a learner’s readiness to make this transition. Occurring as it does in naturalistic conversation, gesture can serve as an observable index of one’s readiness to learn and can therefore provide an additional medium of instruction and communication for both learners and teachers (“learning with a helping hand”).

Learning without a Helping Hand: Gesture as a Testament to the Resilience of Language

There may be no greater proof of the resilience of language in humans than the fact that, when deprived of a language model entirely, the human child will invent one nonetheless. Deaf children whose severe
hearing losses prevent them from learning spoken language and whose hearing parents have not exposed them to a sign language might be expected to fail to communicate—or to communicate in nonlanguage-like ways. But, in fact, deaf children in these circumstances do communicate with those around them and they use gesture to do so. Although it would certainly be possible to convey information in a mimelike fashion (e.g., elaborately enacting a scene in which the child gets and eats a cookie to request one), the children don’t behave like mimes. Rather, they produce gestures according to a segmented and combinatorial format akin to the format that characterizes all natural languages, be they signed or spoken (e.g., the child points to the cookie and then jabs her hand several times at her mouth, using two gestures in sequence to convey “cookie EAT”1). In the next sections, I briefly describe the properties of the deaf child’s gesture system, properties that constitute resilient properties of language learning.

Resilient Properties of Language Learning

Sentence-level structure
The gesture strings that the deaf children generate can be described in terms of very simple “rules.” The rules predict, on a probabilistic basis, which semantic elements are likely to be gestured and where in the gesture string those elements are likely to be produced (Feldman et al. 1978; Goldin-Meadow and Feldman 1977; Goldin-Meadow and Mylander 1984, 1990). Thus, for example, the children were likely to produce a gesture for the patient, as opposed to the actor, in a sentence about eating (e.g., a gesture for the cheese rather than the mouse) and were likely to place that gesture in the first position of their two-gesture sentences (e.g., “cheese EAT” rather than the reverse). Furthermore, each of the children’s gesture sentences, although frequently incomplete at the surface level, was associated with a complete predicate frame at the underlying level (Goldin-Meadow 1985).
For example, there was evidence in the children’s gestures themselves for a predicate frame consisting of three elements—the actor, the action, and the patient—underlying sentences about eating. Finally, the children’s gesture sentences were characterized by recursion, the concatenation of two or more one-proposition predicate frames into a single, complex sentence (Goldin-Meadow 1982); for example, “GIVE palm EAT,” a sentence requesting that the experimenter put in the child’s palm a toy grape (proposition 1) that could be eaten (proposition 2). In addition, the surface form of the deaf children’s complex sentences was characterized by the systematic reduction of redundant
elements (Goldin-Meadow 1987), devices that are fittingly reminiscent of those described by Lila in her early work on conjunction (Gleitman 1965).

Word-level structure
In addition to structure at the sentence level, each deaf child’s gestures also had structure at the word level (Goldin-Meadow, Mylander, and Butcher 1995). Each gesture was composed of a handshape component (e.g., an O-handshape representing the roundness of a penny) and a motion component (e.g., a short arc motion representing a putting-down action). The meaning of the gesture as a whole was then determined by the meanings of each of these parts in combination (“putting-down-roundness”). Although similar in many respects, the morphological systems of four deaf children studied thus far were sufficiently different to suggest that the children had introduced relatively arbitrary—albeit still iconic—distinctions into their systems (Goldin-Meadow et al. 1995). For example, two children used a C-shaped hand to represent objects two to three inches in width (e.g., a cup or a box), while two other children used the same handshape to represent objects that were slightly smaller, one to two inches in width (e.g., a banana or a toy soldier). The fact that there were differences in the ways the children defined a particular morpheme suggests that there were choices to be made (although all of the choices still were transparent with respect to their referents). Moreover, the precise boundaries of a given child’s morphemes could not be predicted without knowing that child’s individual system. It is in this sense that the deaf children’s gesture systems can be said to be arbitrary.
Grammatical categories
In addition to combining components to create the stem of a gesture, one deaf child also altered the internal parts of his gestures to mark the grammatical function of those gestures (Goldin-Meadow, Butcher, Mylander, and Dodge 1994). In particular, the child tended to abbreviate a form when it played a noun role but not when it played a verb role. In contrast, the child would alter the placement of a form when it played a verb role but not when it played a noun role. For example, when used to mean “jar” (noun), the TWIST form would be produced with a single turn and in neutral space, but when used to mean “twist” (verb), it would be produced with several turns and extended toward the intended patient. In addition to marking grammatical categories morphologically, the same child also marked the categories syntactically (Goldin-Meadow et
al. 1994). The child placed a form in the initial position of a two-gesture sentence when it played a noun role (“TWIST jar,” used to identify the jar) but in second position when it played a verb role (“jar TWIST,” used to request that the jar be opened). Interestingly, as in many natural languages (cf. Thompson 1988), adjectives in this deaf child’s gesture system were marked like nouns morphologically but like verbs syntactically (Goldin-Meadow et al. 1994). For example, when used as an adjective to mean “broken,” the BREAK gesture was produced in neutral space and abbreviated, like a noun (i.e., two fists held side-by-side in the chest area, separated from each other only once) but it was placed in second position, like a verb (i.e., “toy BREAK”).

Language use
The deaf children did not invent this structural complexity to serve a single function. Rather, they used their gestures for a wide variety of functions typically served by language—to convey information about current, past, and future events, and to manipulate the world around them (Butcher, Mylander, and Goldin-Meadow 1991). For example, to describe a visit to Santa Claus, one of the deaf children first pointed at himself, indicated Santa via a LAUGH gesture and a MOUSTACHE gesture, pointed at his own knee to indicate that he sat on Santa’s lap, produced a FIRETRUCK gesture to indicate that he requested this toy from Santa, produced an EAT gesture to indicate that he ate a pretzel, and then finished off the sequence with a palm hand arcing away from his body (his nonpresent marker) and a final point at himself (Morford and Goldin-Meadow 1997). In addition to the major function of communicating with others, one deaf child used gesture when no one was paying attention, as though “talking” to himself (Goldin-Meadow 1993).
Once when the child was trying to copy a configuration of blocks from a model, he made an ARCED gesture in the air, thus indicating the block he needed next; when the experimenter offered a block that fit this description, the child ignored her, making it clear that his gesture was not directed at her but was for his use only. The same child also used gesture to refer to his own gestures (Goldin-Meadow 1993), and to comment on (indeed criticize) the gestures of his hearing sister (Singleton, Morford, and Goldin-Meadow 1993).

The Environmental Conditions That Support the Development of a Gesture System

These then are properties of language that arise when a child develops a communication system without benefit of conventional linguistic input.
What does this list have to do with language learning that takes place under normal circumstances? Perhaps nothing, although it is an accepted fact that children, even when given access to a language model, routinely go beyond that input—at the least, children hear sentences, but learn rules. What is striking about the deaf children is not just that they are creating a language with little environmental support, but that their product has properties in common with the languages learned by children exposed to conventional languages even though they have very different materials to work with. To better understand the relationship between the product and the materials the deaf children have at their disposal, we explored the environmental conditions under which the children developed their gesture systems.

Input from the gestures of hearing individuals
We first observed the spontaneous gestures that the children’s hearing parents produced when they communicated with their children. We found that the structure evident at the sentence and word levels in each of the deaf children’s gesture systems could not be traced back to their mothers’ spontaneous gestures (Goldin-Meadow and Mylander 1983, 1984; Goldin-Meadow et al. 1995), nor could their grammatical categories (Goldin-Meadow et al. 1994) or many of their communicative functions (Butcher et al. 1991; Morford and Goldin-Meadow 1997). Indeed, the gestures the parents produced appeared to be no different from the gestures that any hearing individual uses along with speech (Goldin-Meadow, McNeill, and Singleton 1996) and, as such, are global and synthetic in form, with structure quite different from the structure of natural language (McNeill 1992). The surprising result is that the children’s gestures are structured so much like natural language even though their parents’ gestures, which are likely to serve as input to those gestures, are not.
Parental responsiveness to the deaf child’s gestures
It is possible, however, that the structure in the children’s gesture systems came from other nonlinguistic aspects of their environment. For example, by responding with either comprehension or noncomprehension to their children’s gestures, the hearing parents of the deaf children might have (perhaps inadvertently) shaped the structure of those gestures. However, we found that the mothers responded with comprehension to approximately half of each child’s gesture strings—whether or not those strings followed the child’s preferred orders. In other words, the mothers were just as likely to understand and act on the children’s ill-formed strings as their well-formed strings, suggesting that these particular patterns of parental responsivity did not shape the
orders that the children developed in their gesture systems (Goldin-Meadow and Mylander 1983, 1984). There seems little doubt that comprehensibility determined the form of the deaf children’s gestures at a general level—the children’s gestures were iconic, with gesture forms transparently related to the intended meanings. Indeed, the overall iconicity of the children’s gestures may have contributed to the fact that variations in gesture order had little effect on the parents’ comprehension—a mother could easily figure out that her child was describing apple eating whether the child pointed at the apple before producing an EAT gesture, or produced the EAT gesture before pointing at the apple. Thus, although the children’s gestures were quite comprehensible to the hearing individuals around them, there was no evidence that the structural details of each child’s gesture system were shaped by the way in which the mothers responded to those gestures—we found no evidence that the child was given a helping hand by the mother.

Parent-child interaction and its effect on the deaf child’s gestures: A look across cultures
Nevertheless, there may be other, more subtle ways in which parent-child interaction affects child communication. For example, Bruner (1974/1975) has suggested that the structure of joint activity between mother and child exerts a powerful influence on the structure of the child’s communication. To determine the extent to which the structure in the deaf children’s gestures is a product of the way in which mothers and children jointly interact in their culture—and in so doing, develop a more stringent test of the resilience of the deaf children’s gesture systems—we have begun a study of deaf children of hearing parents in a second culture, a Chinese culture.
The literature on socialization (Miller, Mintz, and Fung 1991; Young 1972), task-oriented activities (Smith and Freedman 1982), and academic achievement (Chen and Uttal 1988; Stevenson, Lee, Chen, Stigler, Hsu, and Kitamura 1990) suggests that patterns of mother-child interaction in Chinese culture differ greatly from those in American culture, and we have replicated these differences in our own studies of interaction between hearing mothers and their deaf children in Chinese and American families (Wang 1992; Wang, Mylander, and Goldin-Meadow 1995). The salient differences between Chinese and American maternal interaction patterns provide us with an excellent opportunity to examine the role that mother-child interaction plays in the development of the gestural communication systems of deaf children. If, as our current work suggests (Goldin-Meadow and Mylander 1998), there are similarities between the spontaneous gestural systems developed by deaf children in Chinese culture and deaf children in American culture, an increasingly powerful argument can be made for the noneffects of mother-child interaction patterns on the development of these gestural systems—that is, we will have increasingly compelling evidence for the resilience of the linguistic properties found in the deaf children’s gestural systems. Conversely, to the extent that the gestural systems of the Chinese deaf children are consistently different from the American deaf children’s gestural systems, an equally compelling argument can be made for the effects of cultural variation—as instantiated in mother-child interaction patterns—on the spontaneous gestural systems of deaf children.

Resilience in the Face of External and Internal Variability: Equifinality

The phenomenon of gesture creation suggests that language development is resilient across environmental conditions that vary dramatically from the typical. However, language is resilient not only in the face of external variation, but also in the face of organic variation. For example, the acquisition of grammar in the earliest stages has been found to proceed in a relatively normal manner and at a normal rate even in the face of unilateral brain injury (Feldman 1994). As a second example, children with Down’s syndrome have numerous intrinsic deficiencies that complicate the process of language acquisition; nevertheless, most of these children acquire some basic language reflecting the fundamental grammatical organization of the language they are exposed to (the amount of language that is acquired is in general proportion to their cognitive capabilities, Rondal 1988; see also Fowler et al. 1994). Thus human language appears to naturally assume a certain form, and that form can be reached through a wide range of developmental paths, some varying from the norm in terms of external factors, some in terms of internal factors.
Susan Goldin-Meadow

In other words, language development is characterized by “equifinality”—a term coined by the embryologist Driesch (1908, as reported in Gottlieb 1995) to describe a process by which a system reaches the same outcome despite widely differing input conditions. Are there any implications for mechanisms of development that we can now draw, having identified language learning as equifinal? At least two types of systems seem possible: (1) A system characterized by equifinality can rely on a single developmental mechanism that not only can make effective use of a wide range of inputs (both external and internal) but will not veer off track in response to that variability; that is, on a mechanism that is not sensitive to large differences in input. The image that comes to mind here is a sausage machine that takes inputs of all sorts and, regardless of the type and quality of that input, creates the same (at least on one level) product. (2) A system characterized by equifinality can rely on multiple developmental mechanisms, each activated by different conditions but constrained in some way to lead to the same end product (cf. Miller, Hicinbothom, and Blaich 1990). The analogy here is to four distinct machines, each one designed to operate only when activated by a particular type of input (e.g., a chicken, pig, cow, or turkey); despite the different processes that characterize the dismembering operations of each machine, the machines result in the same sausage product. At first glance, it may seem improbable that a variety of developmental mechanisms would be constrained to arrive at precisely the same outcome. However, it is relatively easy to imagine that the function served by the mechanisms—a function that all of the developmental trajectories would share, such as communicating via symbols with other humans (cf. Goldin-Meadow et al. 1996)—might have been sufficient, over time, to constrain each of the mechanisms to produce the same product. The findings that we have assembled thus far on the gesture systems created by deaf children do not allow us to distinguish between these two hypothetical mechanisms—not yet. Nevertheless, it is certain that by continuing to compare the process of language learning in typical and atypical circumstances—the strategy born in the Gleitman living room—we will approach a more complete understanding of how children learn language.

Learning with a Helping Hand: Gesture’s Role in the Learning Process

One of the questions that plagued me for quite some time in my work on the deaf children’s gesture systems was this: If the linguistic properties listed above are so resilient, why don’t they appear in the gestures that the deaf children’s hearing parents use?
Learning with and without a Helping Hand

I have since come to realize the answer—the parents’ gestures were not “free” to assume the language-like structure found in their children’s gestures simply because the parents always produced their gestures while talking. Gesture and speech in hearing individuals form a single integrated system—the two modalities work together to convey the speaker’s intended message, with speech assuming the segmented and combinatorial form that characterizes natural languages, and gesture assuming an imagistic and holistic form (McNeill 1992; when speech is absent, as in sign languages and the deaf child’s gesture system, it is the manual modality that assumes the segmented and combinatorial form that characterizes natural languages). My second research program explores the integrated gesture-speech system in its own right, particularly in relation to learning over the short term.

The Relationship between Gesture and Speech as an Index of Readiness-to-Learn

My students and I have examined the gestures that hearing children spontaneously produce when explaining their solutions to a task. We coded gesture and speech independently and made an interesting discovery—at times, the children conveyed one message in speech and another in gesture. For example, a young child asked to solve a liquid quantity conservation task says that the transformed object is different from the original because “this one is taller than this one” but, in the same response, produces a gesture reflecting an awareness of the widths of the objects; specifically, she indicates with her hands the skinny diameter of the original object and the wider diameter of the transformed object, thus revealing knowledge of the widths of the task objects that was not evident in her speech (Church and Goldin-Meadow 1986). We have labeled instances in which gesture and speech convey different information in a problem-solving situation “mismatches.” Some children produce many gesture-speech mismatches on a given task while others produce few. Moreover, children who produce a relatively large number of mismatches in their explanations of a particular task (e.g., a conservation task or a mathematical equivalence task) are more likely to benefit from instruction in that task than children who produce few mismatches (Alibali and Goldin-Meadow 1993; Church and Goldin-Meadow 1986; Goldin-Meadow, Alibali, and Church 1993; Perry, Church, and Goldin-Meadow 1988, 1992). Thus gesture-speech mismatch signals that the child is in a transitional state with respect to a task, and is therefore ready to make progress on that task if given appropriate input. Why does gesture-speech mismatch index readiness-to-learn?
When a child produces a mismatch, that child is, by definition, activating two notions on the same problem—one displayed in speech and one in gesture. We suggest that the activation of two ideas on the same problem is indeed what characterizes the transitional state and what may destabilize the learner so that input can have an effect. If children who produce a large number of mismatches on a task are, in fact, activating two notions every time they solve a task of that type, they should be expending more effort in reaching their incorrect solutions on the task than children who produce few mismatches (who also solve the problems incorrectly but should do so more efficiently). Evidence from a cognitive load
task shows that children who produce many mismatches in their explanations of a math task, when later asked to solve the task but not explain it, expend more effort on the task (as gauged by their performance on a simultaneously performed word recall task) than do children who produce few mismatches (Goldin-Meadow, Nusbaum, Garber, and Church 1993).

Gesture’s Effect on the Learner and the Learning Situation

A difference—or mismatch—between the information conveyed in gesture and the information conveyed in speech appears to reflect the activation of two notions on the same problem and, as a result, to signal readiness for cognitive growth. It is an open question whether the actual production of gesture-speech mismatch contributes to change—that is, does the act of producing two different pieces of information across modalities but within a single communicative act improve the learner’s ability to transpose that knowledge to a new level and thus produce those pieces of information within a single modality? Some evidence suggests that it might (gesturers were somewhat more likely to improve after instruction on a math task than nongesturers; Alibali and Goldin-Meadow 1993) but more work is needed to determine whether the act of producing mismatches itself facilitates transition. We aren’t yet sure whether “sitting on your hands” is something more than a metaphor. Even if it turns out that the production of gesture-speech mismatches does not affect learners themselves and thus does not contribute to the learning process directly, it may still play an indirect role in the process by shaping the learning environment. The information conveyed in the gestured component of a child’s mismatch is frequently not found anywhere in that child’s speech (Alibali and Goldin-Meadow 1993).
For example, a child who indicates the width of the containers in the gestured component of her mismatch is not likely to demonstrate an understanding of width in her speech on any of the liquid quantity trials (interestingly, the reverse does not hold—the information conveyed in the spoken component of a mismatch is quite likely to be conveyed by that child in gesture on other trials; Garber, Alibali, and Goldin-Meadow 1998). In other words, gesture often conveys information about a child’s understanding that is not conveyed anywhere in that child’s spoken repertoire. As such, gesture provides a unique window into the thoughts of the learner. If those who interact with children in learning situations were able to “read” a child’s gestures, they would gain access to the unique information conveyed in gesture. Knowledge about a child’s understanding of
a task gained through that child’s gesture might then affect how the adult interacts with the child on the task. For example, an adult who has “read” a child’s width gesture on a conservation task might then treat that child as though she understood the importance of width in judging quantity, an instructional stance that could promote in the child an explicit understanding of width. We have taken several steps in exploring this hypothesis. We have shown that adults can accurately interpret the gestures that children produce when those gestures are replayed on videotape (Alibali, Flevares, and Goldin-Meadow 1997; Goldin-Meadow, Wein, and Chang 1992) and when they are observed “live” in fleeting real time (Goldin-Meadow and Sandhofer 1999). In fact, even children are able to read the gestures produced by other children (Kelly and Church 1997). Our final step—to discover whether the information adults gain from a child’s gestures affects the way they interact with that child—is still in the “in progress” stage. We have, however, shown that the gestures spontaneously produced in a teaching situation can affect the participants—in this case, the teacher’s gestures affected the learner’s responses (Fernandez, Flevares, Goldin-Meadow, and Kurtzberg 1996). When asked to instruct children individually in a mathematical equivalence problem, each of two teachers produced at least one gesture-speech mismatch with each child (it is not clear what prompted the mismatches; one hopes uncertainty about how best to teach the concept, rather than teacher uncertainty about the concept itself). What was interesting was that the children were much less likely to merely repeat what the teacher said when responding to a teacher mismatch than when responding to a teacher match or no-gesture turn. We have since replicated this pattern with eight additional teachers, each individually instructing a series of children (Goldin-Meadow, Kim, and Singer 1999). 
These findings are the first to suggest that the spontaneous gestures produced by the participants in a learning situation can alter the course of instruction—that learning may be influenced by the helping hand.

Summary

To summarize, I have explored the gestures of children in two very different learning contexts—in deaf children whose only form of effective communication is gesture, and in hearing children whose gestures routinely and naturally accompany speech. My program of research on the deaf children, launched in the Gleitman living room, has shown that not only does gesture in this context take on the functions language serves, it also takes on its forms—and it does so without benefit of an explicit model for those forms, that is, without a helping hand. In contrast, in hearing children, gesture coexists with speech and assumes neither the
functions nor the forms of language. Rather, gesture assumes a format that is less analytic and more imagistic than speech (and than the deaf child’s gestures)—a format that complements the segmented and combinatorial structures found in speech. In this context, gesture is free to express ideas that are not easily incorporated into speech and, as such, has the potential to lend a helping hand to the learning process. In closing, I’d like to say what a privilege it is to have been mentored by Lila and Henry. Even more than learning what good research is and how it should be done, I have learned from Lila and Henry how to be a good teacher. By example and by explicit instruction, I learned how to teach in large classes, in small discussions, and in interactions with individual students, and I learned that good teaching is something to be valued highly and worked toward. I fear that I may have set standards that are just too high when I ask myself to be as superb a mentor to my own students as Lila and Henry were to all of us. But I figure I might as well aim high, and I thank Lila and Henry from the bottom of my heart for showing me how good it can get.

Acknowledgment

The work described here was supported by grants from the March of Dimes, the Spencer Foundation, the National Institute of Child Health and Human Development (R01 HD18617), and the National Institute on Deafness and Other Communication Disorders (R01 DC00491).

Notes

1. “Cookie EAT” is a sentence consisting of two gestures. Deictic pointing gestures are displayed in lower case letters, iconic gestures in capital letters. The boundary of a gesture sentence is determined by motoric criteria. If the hand is relaxed or returned to neutral position (chest level) prior to the onset of the next gesture, each of the two gestures is considered a separate unit. If there is no relaxation of the hand between the two gestures, the two are considered part of a single gesture sentence.
References

Alibali, M. W. and Goldin-Meadow, S. (1993) Gesture-speech mismatch and mechanisms of learning: What the hands reveal about a child’s state of mind. Cognitive Psychology 25:468–523.
Alibali, M., Flevares, L., and Goldin-Meadow, S. (1997) Assessing knowledge conveyed in gesture: Do teachers have the upper hand? Journal of Educational Psychology 89:183–193.
Bruner, J. (1974/1975) From communication to language: A psychological perspective. Cognition 3:255–287.
Butcher, C., Mylander, C., and Goldin-Meadow, S. (1991) Displaced communication in a self-styled gesture system: Pointing at the non-present. Cognitive Development 6:315–342.
Chen, C. and Uttal, D. H. (1988) Cultural values, parents’ beliefs, and children’s achievement in the United States and China. Human Development 31:351–358.
Church, R. B. and Goldin-Meadow, S. (1986) The mismatch between gesture and speech as an index of transitional knowledge. Cognition 23:43–81.
Driesch, H. (1908/1929) The Science and Philosophy of the Organism. London: A. and C. Black.
Feldman, H. M. (1994) Language development after early unilateral brain injury: A replication study. In Constraints on Language Acquisition: Studies of Atypical Children, ed. H. Tager-Flusberg. Hillsdale, NJ: Erlbaum Associates, 75–90.
Feldman, H., Goldin-Meadow, S., and Gleitman, L. (1978) Beyond Herodotus: The creation of a language by linguistically deprived deaf children. In Action, Symbol, and Gesture: The Emergence of Language, ed. A. Lock. New York: Academic Press, 351–414.
Fernandez, E., Flevares, L., Goldin-Meadow, S., and Kurtzberg, T. (1996) The role of the hand in teacher-student interaction. Paper presented at the annual meeting of AERA, New York, April 1996.
Fowler, A. E., Gelman, R., and Gleitman, L. R. (1994) The course of language learning in children with Down’s syndrome: Longitudinal and language level comparisons with young normally developing children. In Constraints on Language Acquisition: Studies of Atypical Children, ed. H. Tager-Flusberg. Hillsdale, NJ: Erlbaum Associates, 91–140.
Garber, P., Alibali, M. W., and Goldin-Meadow, S. (1998) Knowledge conveyed in gesture is not tied to the hands. Child Development 69:75–84.
Gleitman, L. R. (1965) Coordinating conjunctions in English. Language 41:260–293.
Gleitman, L. R. and Newport, E. L. (1995) The invention of language by children: Environmental and biological influences on the acquisition of language. In Language, Volume 1, Invitation to Cognitive Science Series, ed. L. R. Gleitman and M. Liberman. Cambridge, MA: MIT Press, 1–24.
Goldin-Meadow, S.
(1982) The resilience of recursion: A study of a communication system developed without a conventional language model. In Language Acquisition: The State of the Art, ed. E. Wanner and L. R. Gleitman. New York: Cambridge University Press, 51–87.
Goldin-Meadow, S. (1985) Language development under atypical learning conditions: Replication and implications of a study of deaf children of hearing parents. In Children’s Language, Volume 5, ed. K. Nelson. Hillsdale, NJ: Lawrence Erlbaum and Associates, 197–245.
Goldin-Meadow, S. (1987) Underlying redundancy and its reduction in a language developed without a language model: The importance of conventional linguistic input. In Studies in the Acquisition of Anaphora, Volume II: Applying the Constraints, ed. B. Lust. Boston: D. Reidel Publishing Company, 105–133.
Goldin-Meadow, S. (1993) When does gesture become language? A study of gesture used as a primary communication system by deaf children of hearing parents. In Tools, Language, and Cognition in Human Evolution, ed. K. R. Gibson and T. Ingold. New York: Cambridge University Press, 63–85.
Goldin-Meadow, S., Alibali, M. W., and Church, R. B. (1993) Transitions in concept acquisition: Using the hand to read the mind. Psychological Review 100:279–297.
Goldin-Meadow, S., Butcher, C., Mylander, C., and Dodge, M. (1994) Nouns and verbs in a self-styled gesture system: What’s in a name? Cognitive Psychology 27:259–319.
Goldin-Meadow, S. and Feldman, H. (1977) The development of language-like communication without a language model. Science 197:401–403.
Goldin-Meadow, S., Kim, S., and Singer, M. (1999) What the teacher’s hands tell the student’s mind about math. Journal of Educational Psychology 91:720–730.
Goldin-Meadow, S., McNeill, D., and Singleton, J. (1996) Silence is liberating: Removing the handcuffs on grammatical expression in the manual modality. Psychological Review 103:34–55.
Goldin-Meadow, S. and Mylander, C. (1983) Gestural communication in deaf children: Non-effect of parental input on language development. Science 221:372–374.
Goldin-Meadow, S. and Mylander, C. (1984) Gestural communication in deaf children: The effects and non-effects of parental input on early language development. Monographs of the Society for Research in Child Development 49, no. 207.
Goldin-Meadow, S. and Mylander, C. (1990) Beyond the input given: The child’s role in the acquisition of language. Language 66:323–355.
Goldin-Meadow, S. and Mylander, C. (1998) Spontaneous sign systems created by deaf children in two cultures. Nature 391:279–281.
Goldin-Meadow, S., Mylander, C., and Butcher, C. (1995) The resilience of combinatorial structure at the word level: Morphology in self-styled gesture systems. Cognition 56:195–262.
Goldin-Meadow, S., Nusbaum, H., Garber, P., and Church, R. B. (1993) Transitions in learning: Evidence for simultaneously activated strategies. Journal of Experimental Psychology: Human Perception and Performance 19:92–107.
Goldin-Meadow, S. and Sandhofer, C. M. (1999) Gesture conveys substantive information about a child’s thoughts to ordinary listeners. Developmental Science 2:67–74.
Goldin-Meadow, S., Wein, D., and Chang, C. (1992) Assessing knowledge through gesture: Using children’s hands to read their minds. Cognition and Instruction 9:201–219.
Gottlieb, G. (1996) A systems view of psychobiological development. In The Lifespan Development of Individuals: Behavioral, Neurobiological, and Psychosocial Perspectives, ed. D. Magnusson. New York: Cambridge University Press, 76–103.
Kelly, S. D. and Church, R. B. (1997) Children’s ability to detect nonverbal behaviors from other children. Cognition and Instruction 15:107–134.
Landau, B., and Gleitman, L. R.
(1985) Language and Experience: Evidence from the Blind Child. Cambridge, MA: Harvard University Press.
McNeill, D. (1992) Hand and Mind: What Gestures Reveal about Thought. Chicago: University of Chicago Press.
Miller, D. B., Hicinbothom, G., and Blaich, C. F. (1990) Alarm call responsivity of mallard ducklings: Multiple pathways in behavioural development. Animal Behaviour 39:1207–1212.
Miller, P. J., Mintz, J., and Fung, H. (1991) Creating children’s selves: An American and Chinese comparison of mothers’ stories about their toddlers. Paper presented at the biennial meeting of the Society for Psychological Anthropology, October 1991.
Morford, J. P. and Goldin-Meadow, S. (1997) From here to there and now to then: The development of displaced reference in homesign and English. Child Development 68:420–435.
Newport, E. L., Gleitman, H., and Gleitman, L. R. (1977) Mother, I’d rather do it myself: Some effects and non-effects of maternal speech style. In Talking to Children, ed. C. E. Snow and C. A. Ferguson. New York: Cambridge University Press, 109–150.
Perry, M., Church, R. B., and Goldin-Meadow, S. (1988) Transitional knowledge in the acquisition of concepts. Cognitive Development 3:359–400.
Perry, M., Church, R. B., and Goldin-Meadow, S. (1992) Is gesture-speech mismatch a general index of transitional knowledge? Cognitive Development 7:109–122.
Rondal, J. A. (1988) Down’s syndrome. In Language Development in Exceptional Circumstances, ed. D. Bishop and K. Mogford. New York: Churchill Livingstone, 165–176.
Singleton, J. L., Morford, J. P., and Goldin-Meadow, S. (1993) Once is not enough: Standards of well-formedness in manual communication created over three different timespans. Language 69:683–815.
Smith, S. and Freedman, D. (1982) Mother-toddler interaction and maternal perception of child development in two ethnic groups: Chinese-American and European-American. Paper presented at the annual meeting of the Society for Research in Child Development, Detroit.
Stevenson, H. W., Lee, S.-L., Chen, C., Stigler, J. W., Hsu, C.-C., and Kitamura, S. (1990) Contexts of achievement. Monographs of the Society for Research in Child Development 55, no. 221.
Thompson, S. A. (1988) A discourse approach to the cross-linguistic category “adjective.” In Explaining Language Universals, ed. J. A. Hawkins. Cambridge, MA: Basil Blackwell, 167–185.
Wang, X-L. (1992) Resilience and fragility in language acquisition: A comparative study of the gestural communication systems of Chinese and American deaf children. Unpublished Ph.D. dissertation, University of Chicago.
Wang, X-L., Mylander, C., and Goldin-Meadow, S. (1995) The resilience of language: Mother-child interaction and its effect on the gesture systems of Chinese and American deaf children. In Language, Gesture, and Space, ed. K. Emmorey and J. Reilly. Hillsdale, NJ: Erlbaum Associates, 411–433.
Chapter 10
The Detachment Gain: The Advantage of Thinking Out Loud
Daniel Reisberg

The weekly meetings of the Gleitman research seminar were for many of us an extraordinary opportunity to learn psychology—its substance, its methods, and its history. In that seminar, we gained from others’ comments and criticisms of our work, and we also gained simply from the opportunity to articulate our ideas before this sophisticated and demanding audience. Likewise, Henry and Lila Gleitman were (in this seminar as in all other contexts) eloquent and effective, articulate and persuasive. They were, perhaps, merely “thinking out loud,” but it was thinking of a remarkable sort, one that benefited all of us enormously. Two decades later, however, I am led to ask: What does it mean to “think out loud”? Does one merely articulate one’s thoughts, or (as seems more likely) must one “translate” the thoughts into some expressible form? If the latter, what gains and what losses derive from this translation? These questions are tied to a number of research problems on the current scene (Crutcher 1994; Ericsson and Simon 1980; Payne 1994; Schooler, Ohlsson, and Brooks 1993; Wilson 1994), but these questions are of course also tied to a much older theoretical framework: Early in this century, John Watson took a strong stand on the relation between “thinking” and “talking,” arguing that the former was merely a covert version of the latter; “thinking out loud,” therefore, was no trick at all, but merely the overt voicing of representations (for Watson: “response sequences”) already in linguistic form. Watson intended that this argument be understood quite literally. In fact, Henry Gleitman is fond of quoting a quip from Watson, emphasizing this point: What’s the difference, Watson asked, between someone with laryngitis and a congenital moron? As Henry tells it, Watson’s answer was simple: Someone with laryngitis will get better. Modern psychologists tend to scoff at these claims.
Thought is not, we all know, covert speech. But one might still wonder whether there is an element of truth in Watson’s observations. Could there be some intellectual impairment associated with laryngitis? Or to put this more
realistically, what is it that happens when we think out loud (or write out our thoughts, or express our thoughts in sign language)? Is there any functional advantage in putting thoughts into these “externalized” forms—judgments we can make or problems we can solve that we couldn’t solve with silent thought? As a related question, Roger Shepard made this observation some years back: Imagine that you’ve been asked how many windows there are in your home or apartment. Many people tackle this task by closing their eyes and drawing the perimeter of their home in the air with an index finger. Why this peculiar behavior? Is there any advantage to externalizing the representation in this (motoric) fashion? To anticipate the data, it turns out that we do benefit in many tasks from this sort of externalization—be it talking out loud or just subvocalizing, and likewise whether it’s drawing on paper or merely drawing in the air. However, not all tasks show this “externalization benefit,” and this invites the crucial question: Why is it that some tasks can be done perfectly well based only on the information contained in a mental representation, whereas other tasks benefit from externalization? As a way of entering this discussion, consider the following procedure. Subjects were shown a number of letter and number strings, such as D-2-R, or N-C-Q-R. The subjects’ task was to discern what word or phrase these strings would produce, if pronounced aloud. (In the two examples just given, the solutions would be “detour” and “insecure.”) Once subjects had learned to play this game, they were given a series of such strings, presented visually, and asked to write down their responses. In one condition, subjects did this with no further requirements beyond the stipulation that they make no noises while working on the puzzles. They were forced, in other words, to form images of these sounds and to make their judgments based on the images. 
In a second condition, subjects were blocked from talking to themselves while doing the task. This was achieved by requiring subjects to say “ta-ta-ta,” aloud, over and over, while working on the puzzles. This manipulation severely impaired performance. Apparently, subjects need to subvocalize to do this task. In a third condition, subjects did the puzzles in a noisy environment, and this too impaired performance. Thus subjects need to hear themselves thinking to do this task. Finally, in a fourth condition, subjects had to say “ta-ta-ta” aloud and were in a noisy environment. This too was disruptive, but no more so than either of the individual manipulations of covert speech or hearing by themselves. One might worry that these effects simply reflect the distracting effect of noise, such that performance is better in a quiet environment than a noisy one. In two of the conditions just described, the noise is presented by the experimenter; in two other conditions, the noise is produced by
the subjects’ own pronunciation of ta-ta-ta. In either case, the manipulations do produce sound, and this may lead to distraction. To check on this possibility, a further study compared three conditions. The first was a control condition in which subjects simply solved the puzzles already described. In another condition, subjects did this task while saying “ta-ta-ta” aloud, over and over. In the final condition, subjects were asked to solve these puzzles while gritting their teeth together, pushing their tongue up against the roof of their mouths, and pushing their lips together. This “clamping” is silent, but occupies the muscles needed for subvocalization, and also preempts central circuits needed to control this subvocalization. And clamping was indistinguishable in its effects from the ta-ta-ta manipulation; both were reliably worse than the control procedure. It would seem, therefore, that it is the loss of subvocalization, and not simply the presence of noise, that produces these interference effects. A number of other tasks show a similar data pattern. For example, in another study, subjects were given the titles of highly familiar songs (“Happy Birthday to You”; “London Bridge”), and were asked whether the melody of each song rises or falls from the second note to the third (rises for “Birthday,” falls for “Bridge”). Again, subjects did this in one of four conditions: In two of the conditions, covert speech was blocked (by the requirement of concurrent articulation); in two, hearing was blocked (by the presentation of task-irrelevant noises). The data, as before, show good performance in the control condition, and impaired performance with either concurrent articulation or ambient noise. The double-interference condition (concurrent articulation and ambient noise) produced performance levels similar to those in the single-interference conditions.1 These data make it clear that auditory imagery does in some cases depend on subvocalized support.
When this support is denied, subjects are unable to use their images for some purposes. More, this subvocalization seems to create a quasi-perceptual representation: If the channels of audition are unavailable (because of background noise), performance again suffers. Perhaps, then, Watson was not altogether mistaken: Thought sometimes does require enactment. This enactment, even if covert, produces a stimulus to which we are then able to respond. However, we need to put these data into a broader context and, in particular, the context provided by other findings showing that tasks requiring auditory imagery do not always rely on subvocalization. For example, Baddeley and Andrade (1995) simply asked subjects to reflect on the subjective vividness of their auditory imagery while doing the ta-ta-ta task. Their subjects reported a slight decrease in image vividness under these circumstances, but, according to their self-report, they were still able to maintain their (slightly impoverished) images. It would seem, then, that concurrent articulation need not devastate auditory imagery. Put differently, we can sometimes create auditory representations without subvocalized support. In addition, common sense strongly argues that some auditory images do not rely on subvocalization. After all, most of us can easily imagine sounds outside of the human vocal repertoire—the sounds of glass breaking or of brakes squealing. We cannot vocalize these sounds and so presumably we cannot subvocalize them either. Therefore, for these sounds, imagery cannot depend on subvocalization. This last claim, however, like the Baddeley and Andrade result, relies on an introspective observation. Is there some way to corroborate this claim? One line of evidence comes from Crowder and Pitt (1991) and focuses on subjects’ ability to imagine the timbres of various orchestral instruments. The key assumption here is that these timbres are outside of subjects’ vocal repertoires, and so, if subjects can imagine these sounds, this would demonstrate the ability to imagine sounds without subvocalized support. Crowder and Pitt’s data on this issue are instructive, but, in the interests of brevity, I note simply that the procedure they use provides only a coarse portrait of auditory imagery. Concretely, their data seem to show us that subjects’ auditory image of, say, a flute is more like a flute than it is like a tuba. But whether the flute image really “sounds like” a flute is another question. Thus Crowder and Pitt’s findings tell us little about the actual fidelity of these images, and, in particular, tell us little about whether subjects can, to some criterion, accurately imagine these sounds. Concerns such as this led us to seek a new paradigm to explore these issues.
In our studies, subjects were asked to imagine two instrument sounds, say, the sound of a bassoon and the sound of an oboe. They were then asked to assess how similar these images were to each other on a seven-point similarity scale. Then they did the same with two other instruments, say, a clarinet and a violin, and so on for a number of other pairs. Once subjects had compared their images in this way, we were able to do a multidimensional scaling of their data in order to ask what the nature of the similarity space is for describing these various auditory images. We ran a parallel procedure with subjects actually hearing these sounds, rather than imaging them, and making the same pair-wise similarity judgments on these perceived sounds. Thus we can also assess the multidimensional similarity space for perception. Our hope was that the multidimensional space for imagined sounds would match reasonably well with the multidimensional space for perceived sounds. If this came to pass, it would be an indication that subjects’ images preserve all of the pair-wise similarities available in actual perception. And, if the full pattern of pair-wise relationships is preserved, then it seems fair to argue that subjects’ images are reproducing these sounds with some fidelity.

Our data were very much in line with these predictions. Subjects in all conditions first heard “reminders” of these instrument sounds, played via computer; subjects could repeat this exposure sequence as many times as they wished until they felt certain that they knew these sounds well. (All stimuli presented in this experiment were digitized versions of actual instrument sounds.) Subjects in the imagery condition were then given a series of trials in which only the instrument’s name was supplied, and they were instructed to practice forming images of these timbres. Subjects then received the test trials, with two instrument names presented on each trial and subjects making a judgment about the similarity between the sounds of the two instruments named. Subjects in the perception condition ran through roughly the same procedure, but made their judgments based on actual sounds, rather than images. The data were averaged across subjects and then run through a standard multidimensional scaling algorithm, seeking the best reduction of the pair-wise relationships. Figure 10.1 shows the data from our imagery condition. The data are quite orderly, with the y-axis corresponding roughly to rise-time (abrupt starts are high, gradual starts are low) and with the x-axis corresponding to the pattern of harmonics (with brighter sounds to the right and duller sounds to the left). The pattern also agrees reasonably well with that reported by Iverson and Krumhansl (1993), using actually perceived sounds, with the main aberration lying in the flute, which one would expect, based on their data, to be lower and to the left in this plot. Figure 10.2 shows the data from our perception condition, and the match to the imagery data is reasonably good, but not perfect.
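To make the mechanics concrete, the pipeline just described (average the pairwise ratings, then reduce them to a two-dimensional configuration) can be sketched as follows. This is a minimal illustration of classical (Torgerson) multidimensional scaling with invented ratings for four instruments; the instrument names, the numbers, and the particular MDS variant are assumptions made for the example, not the study’s actual data or software.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Classical (Torgerson) multidimensional scaling.

    dissim: symmetric (n x n) matrix of dissimilarities.
    Returns an (n x n_dims) configuration of points whose Euclidean
    distances approximate the input dissimilarities.
    """
    n = dissim.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (dissim ** 2) @ J             # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)         # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:n_dims]   # keep the largest eigenvalues
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale             # scale axes by sqrt(eigenvalue)

# Hypothetical averaged ratings on the seven-point scale (small = similar,
# large = different), so the ratings can double as dissimilarities.
instruments = ["flute", "clarinet", "oboe", "tuba"]
ratings = np.array([
    [0, 2, 3, 6],
    [2, 0, 2, 5],
    [3, 2, 0, 5],
    [6, 5, 5, 0],
], dtype=float)

coords = classical_mds(ratings)
print(coords.shape)  # one 2-D point per instrument
```

In the recovered configuration, instruments rated as similar land near each other and instruments rated as different land far apart; a nonmetric MDS variant would be an equally reasonable choice for rating data.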
Figure 10.1. The two-dimensional space summarizing subjects’ pairwise similarity ratings of their mental images of the sounds produced by various orchestral instruments. The y-axis seems to correspond roughly to rise-time (with abrupt starts high, gradual starts low); the x-axis seems to reflect the pattern of harmonics (with brighter sounds to the right and duller sounds to the left). See Figure 10.2 for corresponding data obtained with perceived sounds.

Figure 10.2. The two-dimensional space summarizing subjects’ pairwise similarity ratings of their perceptions of the sounds of various orchestral instruments. As in Figure 10.1, the y-axis seems to correspond roughly to rise-time and the x-axis seems to reflect the pattern of harmonics (with brighter sounds to the right and duller sounds to the left).

For example, the bells, piano, and vibes cluster together as one group in both plots; in both plots, the cello and tenor saxophone are reasonably close to each other. But contrasts between the two plots are also easy to find: The bassoon, for example, appears in the lower right quadrant for the imagery plot, but in the upper left for the perception data. These visual comparisons of the two plots, however, are difficult to interpret, largely because the axes in this (or any) multidimensional scaling are arbitrary. To compare these two plots, therefore, we need a more sophisticated, axis-independent assessment. One way to do this is suggested by Besag and Diggle (1977). This analysis begins by computing the point-to-point distances for every pair of points in the perception plot, and the same in the imagery plot. Next, we calculate the correlation between these two sets of distances. (That is, one data pair is the flute-to-clarinet distance in the imagery data, and the flute-to-clarinet distance in the perception data. A second data pair is the clarinet-to-oboe distance in the imagery data, and the corresponding distance in the perception data. And so on.) The correlation calculated in this way is 0.455 (p < 0.001), indicating that there is a reasonable concordance between these two sets of distances, and thus reasonable alignment between these two multidimensional plots.²

Let us be careful, though, not to overinterpret these findings. As we have already acknowledged, the correspondence between figures 10.1 and 10.2 is not perfect. Perhaps this merely reflects the difficulty of making comparisons between two multidimensional spaces. Perhaps it reflects the difficulty, for our subjects, of remembering the full set of instrument sounds. (What does an English horn sound like, as opposed to an oboe?) Or perhaps it does reflect some limitations in the quality of subjects’ auditory imagery. Even with these cautions in view, the fact remains that there is a reasonable and highly reliable correspondence in this task between subjects’ perceptual and imagery judgments. This seems to be pushing us, therefore, toward an inelegant conception of auditory imagery, along the lines depicted in figure 10.3. In this figure, there are two different paths through which an auditory image can be created. In some cases, the image is created through subvocalization; this pathway seems crucial, for example, in the D-2-R task. In other cases, the image is created without subvocalization; this seems to be the case in our study of instrument timbres.

Is this model necessary? Do we really need two pathways through which images can be created? Or can the data be explained in more parsimonious terms? One could argue, perhaps, that one’s image of (say) a flute or a bassoon does, in some fashion, depend on subvocalization. In that case, one could maintain the claim that all auditory images depend on subvocalized support, eliminating the top pathway shown in the figure.
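The axis-independent comparison, together with the Monte Carlo logic described in note 2, reduces to a short computation: list every point-to-point distance in each configuration, correlate the two lists, and build a null distribution by randomly re-pairing the distances. The sketch below uses two invented eight-point configurations (the second is a rotated, lightly jittered copy of the first) purely to show that the measure ignores each plot’s arbitrary axes; none of these numbers come from the study.

```python
import numpy as np

def pairwise_distances(coords):
    """All point-to-point distances, in a fixed pair order."""
    n = len(coords)
    return np.array([np.linalg.norm(coords[i] - coords[j])
                     for i in range(n) for j in range(i + 1, n)])

def distance_correlation(coords_a, coords_b, n_perm=2000, seed=0):
    """Correlate the two distance sets; estimate the null distribution by
    randomly re-pairing the distances (cf. the Monte Carlo tests of Besag
    and Diggle 1977)."""
    d_a = pairwise_distances(coords_a)
    d_b = pairwise_distances(coords_b)
    r_obs = np.corrcoef(d_a, d_b)[0, 1]
    rng = np.random.default_rng(seed)
    null = np.array([np.corrcoef(d_a, rng.permutation(d_b))[0, 1]
                     for _ in range(n_perm)])
    return r_obs, null.mean(), null.std()

# Invented configurations: b is a 90-degree-rotated, lightly jittered copy
# of a, so the inter-point distances should agree even though the raw
# coordinates look completely different.
rng = np.random.default_rng(1)
a = rng.normal(size=(8, 2))
rot = np.array([[0.0, -1.0], [1.0, 0.0]])
b = a @ rot + rng.normal(scale=0.05, size=(8, 2))

r_obs, null_mean, null_sd = distance_correlation(a, b)
print(r_obs > 0.9)  # the distance sets align despite the rotated axes
```

With the permutation mean and standard deviation in hand, an observed correlation can be judged against chance, which is how a figure such as r = 0.455 against a null of roughly –0.005 ± 0.093 gets its p-value.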
Figure 10.3. The data suggest that there are two ways to produce an auditory image. The top pathway indicates “pure imagery,” created without subvocalized support. The bottom pathway indicates “enacted” imagery, dependent on covert speech (or covert singing).

There are several ways this claim might be developed and defended (cf. Baddeley and Logie 1992; Smith, Wilson, and Reisberg 1992), but it is a claim, in any case, easily addressed empirically. Two further groups of subjects were run through our instrument timbres procedure. One group of subjects was asked to compare their images of the various musical instruments, but was also blocked from subvocalizing. Thus each trial began with the instruction that subjects repeat “ta-ta-ta” aloud, and then, a moment later, the test instruments for that trial were named, and subjects made their similarity assessment. Subjects were then allowed to cease the articulation until the start of the next trial, when they began again. A second group of subjects did the same, but with actually perceived stimuli rather than imagined ones. Again, the data were averaged across subjects and assessed via a multidimensional scaling algorithm. If subjects’ images in this procedure depend on subvocalized support, then these results should contrast with the multidimensional spaces obtained without the requirement for concurrent articulation. But that is not what the data show. Instead, the multidimensional spaces obtained in this newer study resemble each other quite closely: A comparison of the imagery and perception data yields r = 0.704, p < 0.001. There is also a reasonable concordance between these new imagery data (with concurrent articulation) and the imagery data shown in figure 10.1 (r = 0.679, p < 0.001). There is a weaker correspondence, but still a reliable correlation, between these perception data (with concurrent articulation) and the perception data shown in figure 10.2 (r = 0.227, p < 0.001). Thus we obtain roughly the same results whether covert speech is permitted and available for use, or denied. Apparently, therefore, the auditory images for this task are not being supported through subvocalization, and so we can safely conclude that subjects can imagine sounds, with reasonable accuracy, without subvocalized support. It truly seems, then, that we are driven toward the unparsimonious model shown in figure 10.3. Some images are created and maintained by means of covert speech; I will refer to those as “enacted images.” Other images are created without this enactment; those I will call “pure images.” (The terminology was first suggested by Reisberg, Smith, Baxter, and Sonenshine 1989.)

A broadly similar point can be made based on studies of working-memory rehearsal. It has long been known that this rehearsal is disrupted by concurrent articulation, and this has led many researchers to suggest that this rehearsal relies on subvocalized speech. However, a number of authors have raised questions about the interpretation of the working-memory evidence (e.g., Gupta and MacWhinney 1995). These questions led us to further experiments, and, as we will see, these experiments again highlight the “duplex” nature of auditory images, with some images created via subvocalization and some created in a more direct fashion, with no need for covert speech. To understand the debate over working-memory rehearsal, consider again the two pathways shown in figure 10.3. Both pathways, I have suggested, create a quasi-perceptual representation, a representation that resides in a buffer within the auditory system. Use of this buffer will be denied, however, if there are other noises present during the experimental procedure. These noises will also land in the auditory buffer, perhaps displacing, perhaps simply mixing with, the representation needed for imagery. In either case, the incoming noise will disrupt the representation. In contrast, imagine what will happen if subjects are blocked from subvocalizing during an experimental treatment. Let’s say that subjects are forced to bite firmly on a pencil during the procedure, or to chew a great big wad of gum. These activities produce little noise, and so they will not disrupt the auditory buffer. But these activities do demand the articulators, and so the articulators won’t be available for subvocalization. Thus these sorts of activities will disrupt enacted imagery, but not pure imagery.
They will, in short, disrupt the bottom path in figure 10.3, but not the top path.³ Finally, imagine that subjects are asked to say “ta-ta-ta” aloud during an experimental task. This manipulation takes over the articulators and also produces noise, and therefore should disrupt both pure imagery (because the activity is noisy) and enacted imagery (because the activity is noisy and because it blocks covert speech). The idea, in short, is that concurrent articulation—such as saying “ta-ta-ta” aloud—has a double effect, and this renders ambiguous the effects of this manipulation. Put differently, if a task is disrupted by concurrent articulation, this is not enough, by itself, to allow the conclusion that the task relies on subvocalization. Instead, the task might rely on pure imagery, and be disrupted by the noisiness of saying “ta-ta-ta” aloud. We earlier confronted this same point in considering our D-2-R task, and there we showed that silent clamping of the articulators had the same effect as concurrent articulation. We concluded that the D-2-R puzzles do indeed require subvocalized support. But the same issue, and the same ambiguity, emerges with memory rehearsal. Prior studies make it clear that concurrent articulation disrupts this rehearsal, but why is this? Is it because concurrent articulation preempts covert speech? Or is it because concurrent articulation is noisy? A number of researchers have reported evidence pertinent to this issue, but, for our purposes, the most interesting studies are those that provide side-by-side comparisons between memory rehearsal and various imagery tasks. Several of our own procedures have been designed to yield these comparisons, but the clearest data on this issue come from a study by Dorothy Bishop at the Applied Psychology Unit in Cambridge. In half of the trials in her design, subjects were given a conventional digit-span task (with nine digits as the to-be-remembered material; spoken presentation; written recall). In the other half of the trials, the same subjects were tested for their ability to detect “verbal transformations” with imagined stimuli. Ordinarily, verbal transformations are perceived when an actual (not imagined) auditory stimulus is repeated over and over and over. Thus, for example, the word “stress,” repeated several times aloud, will soon be perceived as (repetitions of) “dress,” or, for some subjects, “rest.” The issue is whether these same transformations will be detected when the repetitions are imagined rather than perceived. The data are shown in table 10.1. In the top row, it is clear that concurrent articulation (“ta-ta-ta” aloud) is disruptive of memory-span performance, as one would expect, dropping performance from 65% correct to 57%. But silent clamping has no effect, and yields performance essentially equivalent to that in the control condition. The pattern is different, however, in the bottom row.
Table 10.1. Summary of results from Bishop (1996)

                                              Interference type
Main task                                     None (control)   Clamp   Concurrent articulation
Memory span (percent correct)                 65.3             65.7    57.2
Verbal transformations (words transformed)    1.97             1.60    1.17

For the verbal transformations, both interference manipulations yield performance reliably worse than that observed in the control condition (also see Reisberg et al. 1989; Smith et al. 1994). It also seems that, in this case, concurrent articulation is more disruptive than clamping—perhaps because the former is noise-producing, perhaps because it is more distracting (a dynamic activity, rather than the static state required for silent clamping). One way or the other, the key point remains: Silent clamping disrupts verbal transformations, and seems not to disrupt memory rehearsal. We do urge some caution in interpreting these data, since other studies in this series have yielded slightly different patterns, and the reasons for this are not yet clear. (For other data on this issue, and other complications, see Bishop and Reisberg 1996; Gupta and MacWhinney 1993; Smith, Wilson, and Reisberg 1996.) Nonetheless, these data do seem to suggest that memory rehearsal relies on the top pathway in figure 10.3, and so is disrupted by the noise produced by concurrent articulation. The imagined verbal transformations, in contrast, rely on the bottom pathway, and so are truly dependent on subvocalization, and disrupted if subvocalization is blocked—even if silently blocked, as in the clamping task. More broadly, these results confirm the claim that there are indeed two different ways to create an auditory image. Both are disrupted by the presence of noise; only one is disrupted by (silently) occupying the articulators. In other words, both pathways involve the auditory system, but only one involves articulation. (For related data, see Bishop and Reisberg 1996; Gupta and MacWhinney 1993; Smith, Wilson, and Reisberg 1996.)

This now forces us to confront a series of questions that have obviously been waiting in the wings from the start: How do these two paths, each forming a mental image, differ? Why do some tasks require one path rather than the other? What is it about the D-2-R task or the imagined verbal transformations that demands enacted imagery (and thus is disrupted by silent clamping) even though another route toward creating auditory representations is available?

Here is a hypothesis: When someone creates a mental image, the image comes into being with certain understandings attached to it—understandings similar to the ones inherent in the Gestalt principles of perception. If the task requires visual imagery, this initial understanding will include some specification of the figure-ground organization of the imaged scene, the orientation of the imaged form, its arrangement in depth, its segmentation, and the like. There is, of course, room for debate about how this organization is implemented—as part of the image itself, or as a series of constraints on how the image is inspected—but, one way or the other, this organization is in place, and so subjects’ images are not unorganized pixel patterns.
Instead, subjects form images of organized perceptual objects, and thus their images reflect their perceptual understanding of that organization. To make this concrete, consider the simple form shown in figure 10.4. This form can be perceived as a black square on a white background, or as an aperture in a white surface, through which a (more distant) black surface can be (partially) viewed. Similarly, the black form can be perceived as a square, or as a rotated diamond. The picture shown in figure 10.4 is thus ambiguous. More precisely, the depiction itself is quite neutral as to interpretation, and is compatible with each of the interpretations just described. The perception of this figure, however, is not ambiguous; the form is perceived in one or another of the fashions just described. The perception, in other words, isn’t neutral, and somehow does, in one way or another, incorporate the perceiver’s understanding of the depicted form. Many other examples can also serve to illustrate this point. The drawing of the Necker cube is neutral as to interpretation, and so open to multiple interpretations. But the Necker cube is perceived in some determinate fashion, and, in fact, is perceived first in one way and then in another. This shift, from one perceptual organization to another, underscores the fact that the percept is indeed organized, and thus already interpreted in a way that pictures are not. My claim is that images are, in this regard, just like percepts, and not like pictures—perceptually organized depictions, and not mere pixel patterns.⁴

So far I have illustrated this point with examples from vision, but the same ideas apply both to perception and to imagery in other modalities. If a task requires auditory imagery, the subject will create an image with a certain perceptual understanding of the imaged form—an understanding that stipulates how the form is to be parsed, how the units group together, and the like. In this case, we should not think about the image as though it were some sort of echo or spectrogram. Instead, the image is an image of a perceptually organized auditory event.
Figure 10.4. A simple ambiguous figure. Is this a black square on a white background, or a square-shaped aperture in a white surface, through which one can see a portion of a black field?

This theoretical sketch leaves a great deal unsaid, but even in this skeletal form the view has clear implications for how imagery functions, and for how subjects can use their imagery to make new discoveries or to reach new insights. In essence, the proposal is this: An image, I have suggested, is more than a mere depiction; instead, the image is an interpreted, organized depiction. Processes leading to image-based discovery, therefore, must take as their starting point this organized depiction, and this puts important constraints on image-based discoveries. To see how this matters, consider two contrasting cases. In one, subjects are seeking to make an image-based judgment or an image-based discovery that is entirely compatible with their initial organization of the target form. In this case, image-based discoveries flow quite readily. If the image is visual, subjects gain little advantage from “reproducing” the image in an actual (out-in-the-world) picture. If the image is auditory, subjects gain little advantage either from overtly voicing the imaged sound, or from subvocalization. In other cases, though, subjects seek to create an image, but then must reanalyze it. Perhaps the discovery depends on a change in one’s assumptions about how the form is segmented, or perhaps the discovery requires that subjects blur together two or more of the imaged elements. These are cases in which the subjects’ task requires them to change their understanding of the form, such that their goal is incompatible with the imaged object as initially organized. Image-based discovery of this sort turns out to be surprisingly difficult and heavily dependent on hints (typically, hints suggesting exactly how the imaged form might be reorganized). This sort of discovery is also much easier with an actual stimulus (auditory or visual) than with a mental image in the corresponding modality. And this sort of discovery also seems, in the auditory case, to require subvocalized support. The D-2-R task provides an obvious example in this category. An image is initially formed with three temporal segments (“dee,” “two,” “are”) and these must be collapsed into two segments, plainly demanding a shift in how the form was initially understood.
This task, therefore, requires subvocalized support. The verbal-transformation effect provides another example, in this case demanding a shift in how the auditory stream is parsed. Other examples show the same pattern with nonlinguistic stimuli (e.g., Smith et al., Experiment 3). In all cases, if a change in parsing is required, or if a memory chunk must be decomposed into its elements, then subvocalization seems to be needed. But why does subvocalization help with these tasks? One plausible suggestion is that, with these tasks, you need to set aside your initial perception and, in essence, reperceive. This creates the possibility that you will arrive, on the “second look,” at a different perceptual product than you did the first time around. To make this possible, you need to jettison your initial view of the imagined event and go back to the “raw material,” the initial data specifying that event. For this, you need access to this raw material—you need, in short, a stimulus. And what subvocalization provides is a stimulus—the raw material, serving as grist for perception, existing independent of your understanding of it. There is no need for this restructuring, no need for a “second look,” if a task depends on a judgment that is compatible with your initial understanding. For such tasks, you do not need to jettison one organization in order to find another. Hence there is no need to return to the raw material, and, indeed, no need for the raw material at all. Therefore, there is no need for subvocalized stimulus support, and no cost if subvocalization is unavailable. To put this in slightly different terms, the idea is that subvocalization allows you to gain some distance and, crucially, some detachment from your own mental products, in order to inspect these products as a stranger might. This allows a “new start” with the information, and makes possible a fresh perspective that (potentially) can lead to fresh discoveries. An impressive range of data fits with these suggestions. As just one example, consider these tasks: Would these strings sound alike if pronounced aloud: HEJ and HEDGE? RANE and REIGN? These judgments can be made about an intact, unaltered representation, and, as it turns out, they can be made with subvocalization blocked, in a noisy environment, or both (Smith et al. 1996). In contrast, would these strings rhyme if pronounced aloud: TAPE and FAIP? GAUZE and PAWS? These judgments do seem to rely on subvocalization, and are compromised if subjects are blocked from covert speech. The rhyme and homophone tasks are obviously quite similar—both require a judgment about how two strings are related; both hinge on a comparison between mental representations of sound (or, at least, representations of phonology). With appropriate stimulus selection, the two tasks can also be well matched in terms of overall difficulty.
Despite these commonalities, however, a crucial difference remains: The homophone judgment rests on an assessment of intact representations—created and then judged as whole units. The rhyme judgment, in contrast, rests on an assessment of representations that must be constructed and then dissected, in order to ignore all-but-the-final sound of the target string. It is this dissection that requires stimulus support—requires a stimulus that subjects can “step away from,” in order to change their thinking about the form’s temporal boundaries. This stimulus support could be provided by an actual overt stimulus—if, for example, the subject chose to read these visual strings aloud. Or it can be provided covertly, via subvocalization. It is this latter path that subjects generally use, and it is this latter path that is blocked by concurrent articulation.
(For other data relevant to these themes, see Reisberg 1996; Reisberg and Logie 1993; Reisberg et al. 1989; Smith et al. 1996.) Let me close this chapter, though, by putting these ideas into a larger context, in order to emphasize the breadth of issues that may be at stake here. First, I have on occasion described this work to psychotherapists, and they mention that it is sometimes useful in therapy to tape-record the client, and then allow the client to listen to himself or herself. Equivalently, the client is sometimes asked to write down an account of this or that issue, and then the client and the therapist together read this account. The idea is that, in these cases, there is a lot to be gained by ripping ideas out of their initial mental context, undoing, in essence, encoding specificity, and then examining these ideas with some detachment. This seems to me an interesting idea, and it says something, I believe, about the processes of discovery, and also about the way our thoughts take some definition from a broader mental context. But, in any event, we can put this example alongside our auditory imagery work, with the suggestion that we may be seeing parallel phenomena here—similar effects in highly different settings.

A second example comes not from the clinic but from the classroom. All instructors know that teaching material, be it Psych 1 or an advanced course, is a terrific way to learn that material. Why is this? Part of the answer, of course, lies in the thought one has to give to the material in order to figure out the best sequence of presentation, the best level of specificity, the right explanatory metaphors, and so on. But there is also, I believe, a gain from the teaching activity itself, from getting the ideas outside of you, into the public arena, so that, again, you can hear those ideas as a stranger might. This, too, seems a parallel phenomenon to the one at stake in our research.
It is also one of the reasons, by the way, why I believe it is particularly important to train graduate students to teach—as a way of making them better teachers, of course, but also as a way of making them better thinkers. And, of course, this is an approach to graduate education strongly associated with the University of Pennsylvania’s psychology department, and, in particular, associated with both Henry and Lila Gleitman. Third, and finally, we still need to ask: So why does Henry think out loud? I’m willing to entertain the idea that this, too, is the same phenomenon. Thinking out loud is, in some ways, just like writing a paper. In both cases, you’re forced to fill in the gaps in your thinking, so that the argument is fluent. In both cases, you’re forced to articulate ideas that had, until that point, been vague intuitions. In both cases, the task of remembering all of one’s points is eased, and this frees up capacity for other chores, including, of course, the evaluation of these points.
But, beyond these fairly obvious suggestions, I also believe there’s something I’ll call a “detachment gain,” a gain from being able to rip the ideas out of their initial context. It’s interesting, for example, that you gain something by writing your ideas down, and you gain even more by then setting the written document aside and returning to it a few days later. Why is this? If the gains came merely from the filling-in of gaps, or from a reduction in memory load, then the full benefit should be realized by writing the ideas down in the first place. Why, therefore, do we observe a further gain from “increasing the distance” between the author and the product? My suggestion is that this sort of “detachment gain” is similar in its origins to the effects I have described in our imagery experiments. In both cases, this “externalization” allows one to remove the idea from the context of understanding in which it was created. And that removal leads to the possibility of new discoveries that might not have been obtained in any other fashion.

This is a substantial gain. It is a gain easily demonstrable in the laboratory. It is a gain that motivates many of us to write. It even motivates some of us to write textbooks. It’s part of what motivates many of us to teach. And it is part, I think, of why Watson was correct in suggesting that some of us, with laryngitis, are impressively reduced in our mental capacities.

Notes

1. Of course, the aspects of covert speech relevant for this task are likely to be different from the aspects needed for the D-2-R task. To judge pitch, subjects need information about their own vocal cords (or at least the planning mechanisms for the vocal cords). For the D-2-R task, they need access to information about the articulators. Thus, although both of these tasks depend on subvocalization, they may rely on different parts of subvocalization.
For discussion of this point, see Smith, Wilson, and Reisberg 1996, especially their discussion of Experiment 1. 2. To compute the reliability of this correlation, one first needs to determine the sampling distribution for this statistic; this can be done via a Monte Carlo technique, asking what the correlation would be if there was no linkage between these two data sets, that is, if there was no tendency for positions to be “preserved” as one moved from one of these plots to the other. This can be calculated by considering the correlations obtained when these two sets of distances—in the imagery data and in the perception data—are randomly paired. This technique indicates that the expected value for these correlations, under the null hypothesis, is r = –0.005, with a standard deviation of 0.093. For further discussion, see Besag and Diggle (1977). 3. To be more precise, these activities block usage of the planning mechanisms needed to coordinate and control subvocalization. Thus the locus of interference is not the articulators themselves, but the more central systems that control the articulators. For discussion of this point, see, for example, Baddeley and Wilson 1985, or Smith et al. 1994. For relevant data, also see the chapter by Jonides, this volume. 4. For related arguments, see Pylyshyn (1981). I might add, however, that my views of imagery have been misconstrued by a number of colleagues, and so a note of clarification seems worthwhile. The claim is not that images are “mere descriptions.” Instead, it seems undeniable that images do depict the represented form—that is, images show
(don’t describe) what the form looks like. The key, however, is that images depict in a special way: Unlike pictures, images are perceptually organized. Thus the historical debate, asking whether mental images are more like “pictures” or more like “propositions,” simply gave us the wrong alternatives. Images (like pictures) depict, but also (like propositions) are structured in a fashion that renders interpretation unambiguous.
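The Monte Carlo logic sketched in note 2 is easy to make concrete. The sketch below is my own illustration, not the authors’ code: the `imagery` and `perception` arrays are hypothetical stand-in distances (45 pairwise distances among 10 items), and the random re-pairing is done by shuffling.

```python
import random
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def permutation_null(dists_a, dists_b, n_iter=2000, seed=0):
    """Null distribution of r when the two sets of inter-item distances
    are randomly re-paired, i.e., when there is no tendency for positions
    to be preserved as one moves from one plot to the other."""
    rng = random.Random(seed)
    b = list(dists_b)
    rs = []
    for _ in range(n_iter):
        rng.shuffle(b)
        rs.append(pearson_r(dists_a, b))
    return statistics.mean(rs), statistics.pstdev(rs)

# Hypothetical stand-in data for illustration only.
rng = random.Random(1)
imagery = [rng.uniform(0.5, 3.0) for _ in range(45)]
perception = [d + rng.gauss(0, 0.3) for d in imagery]

mu, sigma = permutation_null(imagery, perception)
observed = pearson_r(imagery, perception)
# Under the null, mu is near zero; an observed r far outside
# mu +/- 2*sigma indicates reliable preservation of positions.
```

The shuffling step is exactly the “random pairing” the note describes; the mean and standard deviation of the shuffled correlations play the roles of the reported r = –0.005 and 0.093.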
References

Baddeley, A. D. and Andrade, J. (1995) The impact of concurrent articulation on auditory imagery. Unpublished manuscript, Applied Psychology Unit, Medical Research Council, Cambridge, England.
Baddeley, A. and Logie, R. (1992) Auditory imagery and working memory. In D. Reisberg (ed.), Auditory Imagery. Hillsdale, NJ: Erlbaum Associates.
Baddeley, A. and Wilson, B. (1985) Phonological coding and short-term memory in patients without speech. Journal of Memory and Language 24:490–502.
Besag, J. and Diggle, P. J. (1977) Simple Monte Carlo tests for spatial pattern. Applied Statistics 26:327–333.
Bishop, D. and Reisberg, D. (1996) The role of articulation in generating, maintaining, and manipulating speech-based representations. Unpublished manuscript, Applied Psychology Unit, Medical Research Council, Cambridge, England.
Crowder, R. and Pitt, M. (1992) Research in memory/imagery for musical timbre. In D. Reisberg (ed.), Auditory Imagery. Hillsdale, NJ: Erlbaum Associates.
Crutcher, R. (1994) Telling what we know: The use of verbal report methodologies in psychological research. Psychological Science 5:241–244.
Ericsson, K. and Simon, H. (1980) Verbal reports as data. Psychological Review 87:215–251.
Gupta, P. and MacWhinney, B. (1993) Is the phonological loop articulatory or auditory? Journal of Memory and Language 33:1–26.
Iverson, P. and Krumhansl, C. (1993) Isolating the dynamic aspects of musical timbre. Journal of the Acoustical Society of America 94:2595–2603.
Payne, J. (1994) Thinking aloud: Insights into information processing. Psychological Science 5:241–248.
Pylyshyn, Z. (1981) The imagery debate: Analogue media versus tacit knowledge. In N. Block (ed.), Imagery (pp. 151–206). Cambridge, MA: MIT Press.
Reisberg, D. (1996) The non-ambiguity of mental images. In C. Cornoldi, R. Logie, M. Brandimonte, G. Kaufmann, and D. Reisberg (eds.), Stretching the Imagination: Representation and Transformation in Mental Imagery (pp. 119–172). New York: Oxford University Press.
Reisberg, D. and Logie, R. (1993) The ins and outs of working memory. In M. Intons-Peterson, B. Roskos-Ewoldsen, R. Blake, and K. Clayton (eds.), Imagery, Creativity, and Discovery (pp. 39–86). Hillsdale, NJ: Erlbaum Associates.
Reisberg, D., Smith, J. D., Baxter, D. A., and Sonenshine, M. (1989) “Enacted” auditory images are ambiguous; “pure” auditory images are not. Quarterly Journal of Experimental Psychology 41A:619–641.
Schooler, J., Ohlsson, S., and Brooks, K. (1993) Thoughts beyond words: When language overshadows insight. Journal of Experimental Psychology: General 122:166–183.
Smith, J. D., Reisberg, D., and Wilson, M. (1992) The role of inner speech in auditory imagery. In D. Reisberg (ed.), Auditory Imagery (pp. 95–119). Hillsdale, NJ: Erlbaum Associates.
Smith, J. D., Wilson, M., and Reisberg, D. (1996) The role of subvocalization in auditory imagery. Neuropsychologia 33:1433–1454.
Wilson, T. (1994) The proper protocol: Validity and completeness of verbal reports. Psychological Science 5:249–252.
Chapter 11
An Update on Gestalt Psychology

Philip J. Kellman

Long ago, in my first year of graduate school at Pennsylvania, I heard rumors about a mythical gathering of zealous researchers, whose weekly deliberations extended far into the night and disbanded only when exhaustion finally overcame insight. That year, the Gleitman research seminar was only myth, as Henry was ill. His recovery led to the seminar’s revival, and both were sources of joy in the department.

When I joined the seminar, I was just beginning to learn about perception. Henry was and is a great interpreter of all things psychological, but in his heart, I believe, perception holds a special place. No doubt some of his interests in perception were traceable to his time on the faculty at Swarthmore College. He was there during the height of the Gestalt influence, interacting with Köhler, Wallach, and others. As Elizabeth Spelke and I began research on the developmental origins of principles from Gestalt psychology, Henry richly conveyed much of that tradition. He was always encouraging about our chances of answering some very old questions about perceptual organization; his support meant much to our fledgling project. Meanwhile, he kept testing my emerging ecological theoretical leanings with his own empiricist ones. In the seminar, Henry’s insights and those of others improved many a research project. Lila, in particular, had me baffled. I wondered if there were another L. Gleitman who was famous in psycholinguistics, as her comments about perception research showed such depth and wisdom that they could only have come from a specialist in perception.

Although we used them to guide our initial studies of the development of perceptual organization (e.g., Kellman and Spelke 1983), the Gestalt principles of object segregation, which had been applied to occlusion situations by Michotte, Thines, and Crabbe (1964), were vague and a bit numerous.
In time, my own research has come back to these principles, trying to develop from them more precise ideas that could form part of computational and neural models of perception. In this gratifying and challenging enterprise, I have worked closely with Tim Shipley, whose dissertation Henry and I co-supervised in 1988. Much of
what we have accomplished and much of what remains to be done can be characterized as an update of Gestalt psychology. Transforming the Gestalt insights into a detailed understanding of perceptual computations is important for diverse reasons: It advances our understanding of adult human visual perception, sheds light on perceptual development, and informs attempts to make artificial vision systems that could produce descriptions of physical scenes from information in reflected light. In this chapter, I will try to make clear what has become of various Gestalt principles in some current research. The Gestalt principles have been applied in many domains, but their original and most familiar home is the domain of object perception, in particular the problems of visual segmentation and grouping. This is the domain I will consider in discussing the computational legacy of Gestalt ideas. The basic problems in segmentation and grouping are easy to describe and illustrate. In a sheaf of light rays arriving at the eye, no ray of light is physically connected to any others. Some image descriptions likewise preserve information separately for each physical location. A digitally encoded image might list for each location (pixel) intensity and chromatic values. There is no linkage between pixels 384 and 385, for example. What we get perceptually from the light rays coming from a real scene
Figure 11.1. Examples of boundary and surface interpolation. a) Partially occluded object. b) Illusory object. c) Apparent transparency.
or a digitized image is quite different. It is a description of objects and surfaces in a three-dimensional (3-D) space. The objects are seen as detached or separable from adjacent objects and surfaces, and each object unites many visual directions or pixels. The problems of segmentation and grouping involve a mapping from the optic array onto these representations of objects. How do we group together and separate regions to achieve these, usually accurate, representations of the objects in a scene? Perhaps the most vexing part of the problem is that parts of an object often reflect light to the eyes from several spatially separated areas. In figure 11.1a, the black object appears as a single entity whose contours are partly occluded in several places. How can a human viewer or a computer vision system connect separate visible regions and represent their hidden contours, surfaces, and overall shapes? This set of problems will be our focus.

The Identity Hypothesis in Object Completion

There is one idea it will be helpful to introduce and place in the background: what we have called the identity hypothesis in object completion. There are a number of different-looking phenomena in which the visual system accomplishes segmentation and grouping by supplying hidden contours and connecting regions. Some of the phenomena are shown in figure 11.1. Along with the partial occlusion display in figure 11.1a, figure 11.1b shows what is usually called an illusory object or illusory figure, and figure 11.1c shows an apparently transparent (translucent, really) object. The identity hypothesis states that these different-looking perceptual completion phenomena are caused by the same underlying process. In these displays, the same parts of the central figure are defined by luminance edges, and the gaps across which edges are interpolated are the same.
Those are formal similarities, but what I am suggesting is that the same gap-surmounting process is at work in all of these cases as well. The differences in our phenomenology for the various cases have to do not with differences in interpolation processes, but with how the interpolated edges and surfaces are situated relative to other surfaces in the array (especially whether they are in front or behind). The arguments and data suggesting a common interpolation process can be found elsewhere (Kellman and Shipley 1991; Kellman, Yin, and Shipley 1998; Ringach and Shapley 1996). Here I give one example to convey the general idea. In figure 11.2, we see yet another perceptual segmentation and completion phenomenon, called a self-splitting object, or SSO. The particular SSO shown is one constructed by Petter
(1956) and later discussed by Kanizsa (1979). The display has several interesting properties. As noted above, it resolves into two distinct objects—boundaries get constructed through homogeneously colored regions of the display. A second interesting property, and our immediate concern, is the depth relationship between the two objects. At the top of the display, the righthand ring appears to pass in front of the left, whereas at the bottom, the lefthand ring passes in front of the right. These effects of perceptual organization appear to be strong and consistent across observers. At the top, where the righthand ring crosses in front, its contours are classic illusory contours and its surface is said to be modally completed (Michotte et al. 1964), meaning it has a sensory presence. In the same visual direction, the lefthand ring has a partly occluded surface and contours, sometimes called amodal completion. Amodal means that the hidden surfaces are perceived or represented, but they do not have local sensory presence. (You could not answer a question about the presence or absence of a smudge on the occluded surface because, after all, it is occluded.) The phenomenological difference between illusory and occluded contours and surfaces has led many to think that these are phenomena of very different character, the former explainable by sensory mechanisms and the latter involving cognitive processes. On the identity hypothesis, these involve, at least in part, the very same interpolation mechanisms, and the phenomenological difference concerns whether, in the final percept, the interpolated surface forms in front of or behind some other surface in the scene. Here is where Petter’s effect comes into the story. In displays such as the rings in figure 11.2, Petter observed that a simple rule governs which object will be seen as in front, having illusory contours, and which will be seen as going behind. 
The object that must be completed across the smaller gap always ends up in front, and the object that traverses the larger gap ends up behind. From this observation, which appears to be correct, we can make the following logical argument. If the final “illusory” or “occluded” status of a contour depends on some comparison with another interpolated contour, then some mechanism that interpolates contours must operate before the final status as illusory or occluded is determined. For an explicit or implicit comparison to take place, the visual system must recognize both contour completions crossing at that site. In other words, the mechanism that interpolates contours is not “modal” or “amodal.” (For other phenomena and data that converge on the same point, see Kellman, Yin, and Shipley 1998.) The idea of a common underlying mechanism producing phenomena whose subjective appearance differs so greatly is somewhat surprising,
Figure 11.2. Self-splitting Object (SSO) after Petter (1956). Although the display is homogeneous in color, it is perceived as two bounded objects. The ring on the right tends to appear in front of the ring on the left at the top of the display but appears to pass behind the ring on the left at the bottom. (See text.)
and there is residual controversy about what exactly is shared and what must differ in different-looking cases of visual completion. In what follows, the identity hypothesis will not be our focus, but it will allow us to move between experiments and data involving illusory and occluded objects and boundaries without distinguishing these cases.

Gestalt Principles and Unit Formation

Segmentation and grouping, illusory contours, and transparency phenomena all involve issues of unit formation, determining what goes with what. The Gestalt psychologists first inquired into these problems, and Gestalt principles have been applied to all of these phenomena (Kanizsa 1982; Michotte et al. 1964). It is a nice consequence of the identity hypothesis that our “updates” of certain Gestalt principles will apply to all of them as well. We now look at particular principles and examine their legacies in more recent work.

Good Continuation

In his classic (1923) paper “Untersuchungen zur Lehre von der Gestalt” (“Laws of organization in perceptual forms”), Wertheimer gave a number of examples illustrating what he called the “Factor of Direction” or the “Factor of Good Curve.” Despite offering these two formal names, another of Wertheimer’s phrases used in passing—“good continuation”—has stuck as the name of this principle. Figure 11.3 shows some examples redrawn from Wertheimer (1923). Despite the compelling and varied nature of the demonstrations, Wertheimer’s definition of this principle is rather vague. In fact, the displays are meant to convey the following idea, without any formal definition:
Figure 11.3. Examples of the “Factor of Direction” (Good Continuation) from Wertheimer (1923). a) The segments labeled A and D appear to form a unitary object, as do those in C and B. b) Despite the possible appearance of three closed regions, the display is usually seen as containing a unitary curved edge and a square-wave.
On the whole, the reader should find no difficulty in seeing what is meant here. In designing a pattern, for example, one has a feeling how successive parts should follow one another; one knows what a “good” continuation is, how “inner coherence” is to be achieved, etc.; one recognizes a “good Gestalt” simply by its own “inner necessity.” Despite its intuitive importance, it is hard to find in the seventy or so years since Wertheimer any explicit definition of the “good” in good continuation. One obvious candidate is to relate “good” to mathematical descriptions of contours. A function is often called “smooth” in mathematics if it has no discontinuities in the first derivative. A discontinuity would correspond to a sharp corner, that is, a point at which there is no unique slope of the function. But there are other notions of
smoothness, involving higher derivatives. In the design of automobile bodies, for example, smooth might mean differentiable at least two or three times (Prenter 1989). Which notion captures the phenomena of human visual segmentation and grouping? Surprisingly, this issue has been the subject of little empirical investigation. Some of the issues, and some clues to the answers, are illustrated in figure 11.4. In (a), there is first-order continuity between parts A and B, but a second-derivative discontinuity between them. Between A and C there is a first-order or tangent discontinuity (TD). Perceptually, A and B appear unitary whereas C appears separable. In (b), there is a smaller direction change between A and B than between A and C. Direction change might therefore predict that A and B will be linked more than A and C. On the other hand, both B and C have a TD with A. Perceptually, neither B nor C appears to have continuity with A, suggesting the importance of TDs in segmentation. In (c), parts A and B are not distinguishable as separate parts; as parts of a constant curvature arc, they agree in all derivatives. The case is different in (d). Here, a straight segment (B) meets a constant curvature segment (A). The two parts agree in the first derivative, but there is a second-order discontinuity. Nevertheless, the two parts appear smoothly joined. All of these examples suggest that TDs lead to segmentation and that their absence—agreement in the first derivative—facilitates joining. Apart from these considerations about continuous contours, we may ask what relationships between separated contours lead to perceptual connections across gaps, as in partially occluded and illusory objects. Here again, the notion of good continuation has been invoked (e.g., by Michotte et al. 1964), but without any specific definition. Parts (e) and (f) of the figure illustrate the same contour relations as (c) and (d), with the gaps now caused by occlusion.
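The first-derivative test just described can be made concrete numerically. The sketch below is my own illustration, not from the chapter: two segments join smoothly when their tangent directions at the junction agree, and a mismatch is a tangent discontinuity (TD), regardless of what the second derivative does.

```python
def tangent_discontinuity(dir1_deg, dir2_deg, tol_deg=1.0):
    """True if two contour segments meet at a sharp corner: their tangent
    directions (angles in degrees) differ at the junction, i.e., there is
    a first-order (tangent) discontinuity."""
    diff = abs((dir1_deg - dir2_deg + 180.0) % 360.0 - 180.0)
    return diff > tol_deg

# Like figure 11.4c: two arcs of one circle share a tangent -> no TD.
print(tangent_discontinuity(30.0, 30.0))    # False: looks unbroken
# Like figure 11.4d: a straight segment meets a curve with a matching
# tangent; curvature (the second derivative) jumps, but there is no TD.
print(tangent_discontinuity(90.0, 90.0))    # False: still looks smooth
# Like figure 11.4b: a 40-degree direction change is a TD and segments
# the contour perceptually.
print(tangent_discontinuity(90.0, 130.0))   # True
```

The tolerance parameter is a hypothetical stand-in for perceptual precision; the chapter makes no claim about its value.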
Both displays produce the appearance of a unitary contour passing behind the occluder.

Figure 11.4. Examples illustrating the importance of first-order continuity and discontinuity. a) Segments A and B, which are first-order continuous, appear more strongly connected than A and C or B and C. b) A first-order or tangent discontinuity divides A, B, and C. c) Apparently unbroken contour made from segments A and B, where all derivatives agree at the point of connection. d) Apparently unbroken contour made from segments A and B, where the first derivatives of A and B agree at the point of connection, but there is a discontinuity in the second derivative. e) and f) Contours in c) and d) under partial occlusion. (See text.)

Formalizing Good Continuation: Ecological Constraints and Computational Theory

Seventy years after Wertheimer, the intuition behind the principle of good continuation is still important. Making this idea useful in models of human and computer vision requires first of all a precise mathematical specification. It also requires placing continuity in the context of the general problem of scene segmentation and object perception. Instead of starting with particular contours and patterns, we need to pose briefly the question of how objects reflecting light make available information that might be used to segment and group the world into discrete objects and surfaces. These are questions of ecological optics (Gibson 1966, 1979) and of computational analysis (Marr 1982). I consider this context first and then return to good continuation.

Multiple Tasks in Object Perception

Object perception involves multiple computational tasks. The first is edge detection. Different physical objects, having different material composition, will tend to produce reflected light of differing luminance and spectral composition. Accordingly, abrupt changes in luminance and spectral characteristics are likely to indicate locations of object edges. Not all such changes are boundaries of objects, however. Some are shadows; others are textural markings on continuous surfaces, and so on. Some classification process must distinguish surface edges from these other cases. Much of edge classification may be achieved by coordinating information from a luminance map of a scene with a depth map, obtained from stereoscopic information. For a moving observer, there will also be available a motion map, assigning to each location a velocity vector (see, e.g., Lee 1974). Discontinuities in the depth and motion maps will correspond to true surface edges with less ambiguity. The most common type of edge emerging from these initial analyses is the occluding edge. It is a contour that bounds an object or surface on one side. Each occluding edge indicates where something ends in the scene, but each also marks a mystery. If a person is seen standing in front of a car, the image contour separating the visible surfaces of the car and the person is a boundary of the person but not the car. At this contour, the car’s surface disappears behind. The mystery is where it goes. Determining which side bounds the object is called boundary assignment (Koffka 1935). Nakayama, Shimojo, and Silverman (1989) suggested that an image contour be labeled intrinsic to a surface region if it bounds that region and extrinsic if it does not. Boundary assignment may not be implemented as a separate process.
When depth or motion information is available, it is computationally simple to recover edges and the relative depth order of two surfaces at those locations. Because depth order determines boundary assignment (the nearer surface always owns the boundary), boundary assignment and edge classification may occur together. To this point, we have a representation of occluding edges, partially bounding surface regions. Now we are in a position to consider how Gestalt notions of continuity can be implemented in perceptual processing. The story has two parts. The first involves particular locations in images in which edge continuity is disrupted—what I called above tangent discontinuities (TDs). A TD is nothing more than a sharp corner
Figure 11.5. Occlusion display from figure 11.1 with all tangent discontinuities indicated by arrows.
where contours meet.1 At such a point there is no unique slope of the contour. TDs thus characterize all the standard types of contour junctions—“T,” “X,” “Y,” arrow, and so on. To be a junction means to be a TD. In our updated notion of good continuation, a TD is the key concept. TDs in one of the displays from figure 11.1 are marked in figure 11.5. Referring to figure 11.1, it can be seen that all of the interpolated edges, in all displays, begin and end at TDs. The ecological importance of TDs is straightforward. It can be proven (see Kellman and Shipley 1991, appendix B) that every instance in which an object boundary is partly occluded produces a TD at the place where the boundary goes out of sight. Thus TDs are potential loci of occlusion. They also mark the transition points from extrinsic to intrinsic contours of a surface region. For hidden parts of objects, TDs are where we pick up the trail of where their hidden edges might go.

Relatability

Some TDs are merely the visible corners of objects. Not all are loci of occlusion. Moreover, even when a TD is a locus of occlusion, there remains the question of where the occluded part of the boundary goes. More is needed to determine object boundaries. Here we come to the second part of the implementation of the Gestalt idea of good continuation, what we have called relatability (Kellman and Shipley 1991, 1992). Relatability formalizes good continuation. It constrains unit formation based on an assumption that object boundaries tend to be smooth. Specifically, relatability expresses the conditions required to connect two edges by a smooth (at least once differentiable) and monotonic (singly
Figure 11.6. Construction used to define relatability. a) E1 and E2 are surface edges; R and r are perpendiculars to the tips (points of tangent discontinuity) of E1 and E2, assigned so that R > r. θ is the angle between R and r. E1 and E2 are relatable if 0 ≤ R cos θ ≤ r. b) Illustration of relatable edges. c) Illustration of nonrelatable edges. Either a doubly inflected curve or the introduction of tangent discontinuities is required to connect two nonrelatable edges.
inflected) curve that agrees with the tangents of the two edges being connected at the point where each leads into a TD. We will define relatability with edges separated in the optical projection and show that the case of continuous edges (zero gap) is a limiting case. Relatability can be defined using the construction shown in figure 11.6a. E1 and E2 are edges of surfaces. Let R and r be perpendiculars to these edges at the point where they lead into a TD. Let R be the longer of the two perpendiculars, and let θ be the angle of intersection of R and r. Intuitively, when relatability holds, there will always be a smooth, monotonic curve that can be constructed, starting from the endpoint of E1 (and matching the slope of E1 at that point) and proceeding through not more than a 90-degree bend to the endpoint of E2
168
Philip J. Kellman
(and matching the slope of E2 at that point). When R cos θ > r, any connection between E1 and E2 would have to be doubly inflected (if it matched the slopes at E1 and E2) or would have to introduce sharp corners where the interpolated edge meets E1 and E2. (See figure 11.6c.) According to this model, visual boundary interpolation does not occur in such cases. Formally, E1 and E2 are relatable iff:

0 ≤ R cos θ ≤ r

This statement can be unpacked in two steps. The righthand side of the inequality simply states that the projection of R onto r (R cos θ) falls within the extent of r. Whenever the length of r is less than the projection of R onto r, the edges are not relatable. Second, the curve constructed to connect the two edges cannot bend more than 90 degrees. This limitation is expressed by the lefthand side of the inequality, because cos θ will be negative for θ > 90°. Below we will see that relatability should involve all three spatial dimensions, although we have defined it here in terms of two. A good deal of work, however, can be done with 2-D edge relations alone, because the smoothness of objects in the 3-D world has consequences for their 2-D projections. It can be shown using elementary projective geometry that collinear edges, smooth curves, and sharp corners in 3-space always project onto collinear edges, smooth curves, and sharp corners in a 2-D projection (excluding degenerate cases, such as projection of a line to a single point). Thus, much of the information about object smoothness and edge relations is preserved in the optical projections reaching the eyes, even in a static, 2-D image.

Experimental Evidence about Relatability

A variety of experimental evidence supports relatability as a formal description of connections formed by the visual system under occlusion and in illusory contours (Kellman and Shipley 1991; Shipley and Kellman 1992a). Some of the best comes from an elegant paradigm introduced by Field, Hayes, and Hess (1993). Field et al.
used arrays of oriented Gabor patches, small oriented elements consisting of a sinusoidal luminance pattern multiplied by a Gaussian window. A Gabor patch closely approximates the best stimulus for the oriented filters found in simple cells of V1, the first visual cortical area. Displays used by Field et al. contained randomly placed, spatially separated elements varying in orientation. Some displays contained a “path.” A path was constructed by having a sequence of several nearby elements share the same angular relationship; for example, successive elements were collinear, or successive elements differed by 15 degrees, and so on. In the
experiments, subjects on each trial judged which of two successively and briefly presented arrays contained a path. When the positional and angular relations satisfied the relatability criterion, subjects performed very well at this task. When the path consisted of a sequence of elements rotated 90 degrees, so that relatability was violated, performance was much poorer. It appears that certain edge relationships lead to edge connections that become salient, perhaps in parallel across large regions of the visual field. The study also supported the idea that edge connections decline in strength as the angle varies from collinearity, with a cutoff around 90 degrees. Strength of interpolation also depends on the relative extents of the physically specified edges and gaps in a scene. Interpolation strength appears to be a linear function of the “support ratio” (the ratio of physically specified edge length to total edge length, that is, physically given edges plus gap length) over a wide range of display sizes (Shipley and Kellman 1992b; Lesher and Mingolla 1993). This relationship makes precise a version of the Gestalt law of proximity, that nearer elements are more likely to be grouped together.

Relatability in Cases of Minimal Gaps

We have defined and illustrated relatability in the context of occlusion and illusory contours—cases in which the visual system constructs connections across spatial gaps. In the classic Gestalt examples, good continuation was illustrated as determining the breakup of unoccluded displays, without appreciable gaps, into separate objects (as in figures 11.3 and 11.4). Unoccluded displays may be considered as a limiting case of relatability—the case where the gap is zero. (Actually, nearly zero: the contours of the perceived figures do overlap, producing minute occlusions and illusory contours.) In such cases, the “connection” of edges is the continuation of the edge that proceeds smoothly through a junction. We saw relevant examples in figure 11.4.
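Both quantitative ideas above can be computed directly. The sketch below is my own construction, not the authors’ model: it tests 2-D relatability using the extension-intersection form of the criterion (the edges’ linear extensions must meet ahead of both endpoints, with a bend of at most 90 degrees), which matches the R cos θ ≤ r construction in the standard cases, and it computes a support ratio. The edge endpoints and into-the-gap tangent directions in the examples are hypothetical inputs.

```python
def cross(ax, ay, bx, by):
    """Z-component of the 2-D cross product."""
    return ax * by - ay * bx

def relatable(p1, d1, p2, d2, eps=1e-9):
    """2-D relatability test. Each edge is given by the position p of its
    endpoint (the tangent discontinuity) and the unit tangent direction d
    in which the edge would continue into the gap. The extensions must
    meet ahead of both endpoints, and the connecting curve must bend
    through no more than 90 degrees."""
    vx, vy = p2[0] - p1[0], p2[1] - p1[1]
    denom = cross(d1[0], d1[1], d2[0], d2[1])
    if abs(denom) < eps:
        # Parallel tangents: relatable only if collinear and facing.
        collinear = abs(cross(d1[0], d1[1], vx, vy)) < eps
        facing = d1[0] * vx + d1[1] * vy > 0
        return collinear and facing
    s = cross(vx, vy, d2[0], d2[1]) / denom  # distance along extension 1
    u = cross(vx, vy, d1[0], d1[1]) / denom  # distance along extension 2
    meets_ahead = s > eps and u > eps
    bend_ok = d1[0] * d2[0] + d1[1] * d2[1] <= eps  # turn of <= 90 deg
    return meets_ahead and bend_ok

def support_ratio(edge_lengths, gap_lengths):
    """Physically specified edge length over total contour length
    (specified edges plus interpolated gaps)."""
    specified = sum(edge_lengths)
    return specified / (specified + sum(gap_lengths))

# Like figure 11.6b: extensions meet with a 90-degree bend -> relatable.
print(relatable((0, 0), (1, 0), (3, 2), (0, -1)))   # True
# Parallel but offset edges need a doubly inflected connection -> not.
print(relatable((0, 0), (1, 0), (4, 1), (-1, 0)))   # False
# Two 2-unit edges separated by a 1-unit gap: support ratio 0.8.
print(support_ratio([2, 2], [1]))                   # 0.8
```

Nothing here is meant to capture the strength of interpolation, only the all-or-none geometric criterion and the ratio that the experiments found interpolation strength to track.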
These examples fit the definition of relatability in that smoothness resides in the first derivative. Connecting a straight segment (zero curvature) with a segment of positive curvature yields a well-defined first derivative at the point of connection but a discontinuity in the second derivative, yet figure 11.4d appeared to have perceptual continuity. In contrast, the sharp corner in figure 11.4b disrupts continuity of segment A with both B and C. This analysis of relatability at the limit sheds light on typologies of contour junctions in human and artificial vision (Clowes 1971; Waltz 1972). In a “T” junction, the contour that does not change direction indicates the boundary of a surface, whereas the other contour passes beneath. A “Y” junction is different in that no contour continues smoothly; all come to an end at that point in space. It has been suggested that the “Y” provides information for an object corner. Relatability subsumes these observations about contour junctions under a more general principle for connecting and segmenting visual arrays.

3-D Relatability: Depth Information in Object Completion

For convenience, we defined the notion of relatability in a plane. Perception of object unity and boundaries in the 3-D world requires taking into account 3-D relationships of contours, however. Over the years, several demonstrations of 3-D contour completion have been devised. One is shown in figure 11.7. If this display is viewed stereoscopically (free-fuse by crossing or diverging the eyes), it gives rise to a 3-D illusory contour on one side and a 3-D occluded region on the other. Binocular disparity places the inducing edges at particular 3-D orientations, and contour interpolation processes build the connections, smoothly curving through three dimensions, across the gaps. The demonstration suggests that interpolation processes take 3-D positions and relations as their inputs and build connections across all three spatial dimensions.

Figure 11.7. Example of 3-D illusory and occluded contours. (Free-fuse by crossing or diverging the eyes.)

Until recently, these phenomena had not been addressed experimentally. We recently carried out a series of experiments to test 3-D relations in object completion. A full report will appear elsewhere (Kellman, Yin, Shipley, Machado, and Li, in preparation); here I note some of the main results. We used 3-D illusory object stimuli such as those shown in figure 11.8. Such displays appear to produce vivid 3-D illusory contours and surfaces.

Figure 11.8. Stimuli in depth relatability experiments. Each display is a stereo pair. (Free-fuse by crossing the eyes.) Below each stereo pair is a side view of the display with the relation to the observer’s eye shown. a) 3-D relatable display. The top and bottom white areas lie in intersecting planes and appear connected by a 3-D illusory surface. b) Non-relatable display made by depth-shifting one inducing surface in (a) relative to the other. c) 3-D relatable display with top and bottom areas in a common plane. The top and bottom areas appear connected by a planar illusory surface, slanted in depth. d) Non-relatable display made by depth-shifting one inducing surface in (c) relative to the other. (From Kellman, Yin, Shipley, Machado, and Li, in preparation.)

We hypothesized that these occur when the physically given contours satisfy a 3-D criterion of relatability. The extension from the 2-D case is this: Bounding contours are relatable in 3-D when they can be joined by a smooth, monotonic curve. This turns out to be equivalent to the requirement that, within some small tolerance, the edges lie in a common plane (not necessarily a frontoparallel plane), and within that plane, the 2-D relatability criterion applies. Another way of saying the same thing is that the linear extensions of the two edges meet in their extended regions in 3-D space (and form an angle greater than 90 degrees). Three-dimensional relatability can be disrupted by shifting one piece in depth, as shown in figure 11.8b. Another relatable display and a corresponding shifted, nonrelatable display are shown in figures 11.8c and 11.8d. The experimental paradigm used these displays as follows. Subjects were shown a stereoscopic display on each trial. Stereoscopic disparities were produced by outfitting the subject with liquid-crystal-diode
(LCD) shutter glasses, synchronized with alternating computer images. Subjects made a speeded judgment on each trial about the positions of the upper and lower parts of the display. Displays like those in figures 11.8a and 11.8b were to be classified as lying in intersecting or converging planes; those in figures 11.8c and 11.8d as lying in parallel planes (including coplanar). Note that the classification required of the subject on each trial was orthogonal to the display’s status as relatable or nonrelatable. The key predictions were that (1) perception of a unified object would facilitate classification performance, and (2) perceived unity would depend on relatability. The former was expected based on results in 2-D displays showing that object completion produces an advantage in detecting boundary orientation (Shapley and Ringach 1996; Kellman, Yin, and Shipley 1998). Results of the initial experiment (Kellman, Yin, Shipley, Machado, and Li, in preparation) are shown in figure 11.9, which plots discrimination sensitivity (d’) from a signal detection analysis by condition. Two values of depth displacement (used to disrupt relatability) were used, corresponding to 5 cm and 10 cm shifts in depth of one of the pieces from the observer’s viewing distance (100 cm). Results indicate a clear superiority for the relatable displays. (Note that performance on parallel and converging displays is combined in the sensitivity analysis.) Response times reflected the same advantage: both parallel and converging relatable displays produced faster responding. On the surface, these results suggest that object completion produces a performance advantage in this task and that 3-D relatability, to a first approximation, predicts unit formation in these displays. Even the smaller depth shift disrupted performance markedly. As this is a new paradigm with new data, however, there are several alternative explanations to be considered.
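The sensitivity analysis described here can be sketched in a few lines. This is a minimal signal-detection computation, not the study's code, and the trial counts below are hypothetical:

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Signal-detection sensitivity: d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    n_signal = hits + misses
    n_noise = false_alarms + correct_rejections
    hit_rate = hits / n_signal
    fa_rate = false_alarms / n_noise
    # Log-linear correction keeps z() finite when a rate is exactly 0 or 1.
    if hit_rate in (0.0, 1.0) or fa_rate in (0.0, 1.0):
        hit_rate = (hits + 0.5) / (n_signal + 1)
        fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical condition: 84 hits, 16 misses, 16 false alarms, 84 correct rejections.
print(round(d_prime(84, 16, 16, 84), 2))  # → 1.99
```

The correction in the middle is one common convention for perfect scores; other corrections would shift extreme d' values slightly.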
Figure 11.9. Sensitivity as a function of slant in the depth completion experiment. Relatable displays were more accurately and rapidly classified, suggesting that the upper and lower inducing areas were processed as a connected unit. (From Kellman, Yin, Shipley, Machado, and Li, in preparation.)

Some of these alternatives are still occupying us in the lab, but we can relate a couple of important results here. First, it is possible that performance in our task might not really require object completion. Perhaps relatable displays were better processed because their pieces were more nearly at the same distance from the observer; comparing two parts’ orientations might be easier when the parts are equidistant. Our design allowed us to check this hypothesis using a subset of the data. As figure 11.8d illustrates, a subset of parallel displays used a shift away from the canonical (relatable) stimulus that actually made the two parts more nearly equidistant. We compared these displays (which had either 0 or 5 cm depth differences) with relatable parallel displays whose parts differed substantially in depth (10 cm in the largest slant condition). Results showed that relatability, not similarity in depth, produced superior accuracy and speed. More recently we have tested even more subtle alternatives to the idea that our effects are due to object completion. Results support the object completion hypothesis. But are these truly three-dimensional effects? Introducing binocular depth differences involves monocularly misaligning contours in each eye. Perhaps these monocular effects, not true depth effects, cause the performance decrement. It is known that misalignment of parallel or nearly parallel contours disrupts 2-D object completion (Shipley and Kellman 1992a; Kellman, Yin, and Shipley 1998). In designing the original study, we aimed to produce significant depth shifts using misalignments that remained within the tolerances for 2-D completion. It has been estimated that contour completion breaks down at about 15 minutes of misalignment of parallel edges (Shipley and Kellman 1992a); our misalignments were on the order of 10 minutes in the maximum depth shift condition. To check the effect of monocular misalignment, we carried out a separate experiment. In our binocular, depth-shifted displays, each eye had the same misalignment with opposite sign. In this experiment, we used the same displays but gave misalignments of the same sign in both eyes. Thus the amount of monocular misalignment in every display was identical to that in the original experiment. Because both members of each stereo pair had misalignments of the same sign, shifted displays appeared to
be at the same depths as relatable displays, but with some lateral misalignment. Results showed no reliable accuracy or speed differences between shifted and relatable displays in this experiment. This outcome is consistent with the idea that perceived depth relationships affected object completion in the first study; the effects are not explainable by monocular misalignment. This line of research is just beginning, but it suggests that our updated notion of good continuation—contour relatability—applies in three spatial dimensions.

Good Form

The principle of good form (or, more generally, Prägnanz) describes the tendency of perceptual processing to maximize simplicity and/or regularity. Whether perceptual systems act in accordance with such a principle remains controversial. The principle has been difficult to define precisely, in part because it seems to refer to perceptual outcomes rather than stimulus relationships. Some attempts have been made to formalize the notion of overall figural simplicity (e.g., Buffart, Leeuwenberg, and Restle 1981). It is also difficult to separate good form from other factors: common illustrations almost invariably involve edge continuity besides good form. Figure 11.10 shows two illustrations of good form redrawn from a textbook on perception. Both can be explained in terms of edge relatability. In the display in (a), the edges leading into the tangent discontinuities (TDs) are relatable, so that the physically specified plus interpolated edges produce two closed forms—the triangle and the rectangle. The second example involves a case of relatability across minimal gaps. At each contour intersection, edges entering and leaving with no TD in between are classified visually as connected; in contrast, a TD between entering and leaving contours indicates a possible breakpoint. In the figure, the continuity of edges gives the two closed forms shown.
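The relatability test invoked in these explanations can be sketched geometrically, following the criterion stated earlier: the linear extensions of the two edges must meet at positive extension lengths and form an angle of more than 90 degrees. The edge representation used here (an endpoint plus a unit direction pointing into the gap) and the tolerance handling are assumptions of this sketch, not a published algorithm:

```python
def relatable(p1, d1, p2, d2, tol=1e-9):
    """Sketch of the 2-D relatability test.

    Each edge is given by its endpoint p and a unit direction d pointing
    into the gap (the direction in which the edge would be extended).
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]
    det = d2[0] * d1[1] - d1[0] * d2[1]
    if abs(det) < tol:
        # Parallel extensions: relatable only if collinear and facing each other.
        collinear = abs(d1[0] * ry - d1[1] * rx) < tol
        facing = (rx * d1[0] + ry * d1[1]) > 0 and (d1[0] * d2[0] + d1[1] * d2[1]) < 0
        return collinear and facing
    # Solve p1 + t1*d1 = p2 + t2*d2 for the extension lengths t1, t2.
    t1 = (d2[0] * ry - d2[1] * rx) / det
    t2 = (d1[0] * ry - d1[1] * rx) / det
    if t1 <= tol or t2 <= tol:
        return False  # extensions do not meet in front of both edges
    # The angle between the edges at the meeting point exceeds 90 degrees
    # exactly when the two extension directions point against each other.
    return d1[0] * d2[0] + d1[1] * d2[1] < 0

# A straight continuation across a gap is relatable; a 90-degree corner is not.
print(relatable((0, 0), (1, 0), (2, 0), (-1, 0)))   # → True
print(relatable((0, 0), (1, 0), (2, 1), (0, -1)))   # → False
```

An angle of exactly 90 degrees counts as nonrelatable here, matching the strict inequality in the text.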
Figure 11.10. Putative examples of good form or Prägnanz. a) A triangle and a rectangle are seen. b) An ellipsoid and a square are seen. Both outcomes are explainable by relatability with no additional principle of good form or Prägnanz. (Redrawn from Goldstein 1995.)

Kanizsa (1979) argued that global symmetry is a questionable or weak determinant of object completion, using demonstrations that pitted global factors against local edge continuity. Two of these are redrawn in figure 11.11.

Figure 11.11. Kanizsa’s demonstrations pitting local continuity against global symmetry. (Redrawn from Kanizsa 1979.)

The debate about local vs. global determinants of segmentation and completion has persisted, however. Sekuler, Palmer, and Flynn (1994), for example, reported evidence from a priming paradigm suggesting that global completion occurs in displays like the one shown in figure 11.12a. (Global completion entails seeing a fourth articulated part behind the occluder, making the display radially symmetric.)

Figure 11.12. Displays pitting local continuity against global symmetry. a) Occluded object for which local and global completion hypotheses make differing predictions. b) Illusory object version of (a). Although subjects are willing to report a global (symmetric) completion in the occluded version, the symmetric completion is not seen in the illusory object display.

Others have reported evidence for both global and local completions using priming (Sekuler 1994; van Lier, van der Helm, and Leeuwenberg 1995). Van
Lier et al. interpreted their results in terms of dual or multiple representations activated by partly occluded displays. This suggestion is close to our own hypothesis: various experimental effects reflect two distinct categories of processing. One is a bottom-up, relatively local process that produces representations of boundaries according to the relatability criterion. This process is perceptual in that it involves a modular process that takes stimulus relationships as inputs and produces boundaries and forms as outputs. The other process is more top-down, global, and cognitive, coming into play when familiar or symmetric forms can be recognized. For lack of a more concise label, we call it recognition from partial information (RPI). One factor pointing toward such a distinction involves the identity between partly occluded and illusory objects, which we have already described. The identity hypothesis has received considerable support (Kellman, Yin, and Shipley 1998; Ringach and Shapley 1996; Shipley and Kellman 1992a), and certain types of displays, such as the Petter effect considered earlier, suggest that an identity at some point in processing is logically required (Kellman, Yin, and Shipley 1998). If true, the identity hypothesis sheds light on the global-local controversy, for this reason: global completion phenomena are not observed in illusory object displays. Figure 11.12b shows the illusory object display with physically defined edges equivalent to those in figure 11.12a. The reader may observe that there is no appearance of a fourth articulated part in the illusory figure display. If the identity hypothesis is true, why should global completion occur in occluded but not illusory object displays? The answer may be that the displays are the same in terms of the perceptual processes of contour and surface interpolation but different in terms of RPI. An occluded surface is an interpolated surface that is not the nearest to the observer in some visual direction (i.e., there is something in front of it); an illusory surface is nearest to the observer among all surfaces in a certain visual direction. The crucial consequence of this difference is that an observer viewing an occluded display is aware that part of the object is hidden from view. This allows certain kinds of reasoning and responses that are not sensible when no part of an object is occluded. In particular, despite any local completion process, the observer can notice what parts are visible (unoccluded) and whether they are consistent with some familiar or symmetric object. Consider a concrete example. If the tail rotor of a helicopter is seen protruding from behind a building, an observer may easily recognize and report that such a helicopter is present, even though the particular contours and surfaces of the hidden parts are not given perceptually. A stored representation of the helicopter may be activated and a belief about the presence of the helicopter may be formed.
But RPI differs from perceptual processes that actually specify the positions of boundaries and surfaces behind an occluder. This separation of processes might explain conflicting reports about global and local processing. First, the only objective data supporting global outcomes come from priming studies. It is well known that priming occurs at many levels, from the most basic representation of the stimulus to higher conceptual classifications involving the stimulus (e.g., Kawaguchi 1988). Unfortunately, there have been no attempts to distinguish these influences in the priming literature on occlusion. Studies reporting global completion have typically used large numbers of trials with a small set of familiar and/or symmetric figures, such as circles and squares. Even if the subjects start out with little familiarity or do not notice the possibility of symmetry under occlusion, repeated exposure may produce familiarity or symmetry responses.
The Dot Localization Paradigm

Priming may not be suitable for separating perceptual processes of boundary and surface completion from more cognitive influences. To test the possibility of different processes, we developed a new experimental paradigm. We focused on the idea that perceptual boundary completion processes lead to specific perceived boundary locations, whereas RPI will not in general do so, as in our occluded helicopter example. We measured the precision of boundary location by showing an occluded display and briefly flashing a probe dot in front of the occluder. Subjects were instructed to respond on each trial whether the probe dot fell inside or outside the occluded object’s boundaries (i.e., whether the projection of the occluded object to the eye would or would not encompass the dot). We used an adaptive staircase procedure, in which the stimulus value for each trial changes depending on the subject’s responses; systematic changes allow a single point on the subject’s psychometric function to be estimated. For each display, we used both a “two-up, one-down” and a “one-up, two-down” staircase to estimate two points: the 0.707 probability of seeing the dot as outside the boundary and the 0.707 probability of seeing it as inside (= 0.293 probability of outside). We took the difference between these estimates as a measure of the precision of boundary perception, and their mean as an estimate of the perceived location of the boundary. Staircases for several stimulus patterns were interleaved; that is, patterns appeared in a random order, and screen position was varied randomly. We realized that competing perceptual and recognition processes might lead to different strategies across subjects. Therefore, we gave subjects explicit strategy instructions.
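A staircase of the kind described can be sketched as follows. The simulated logistic observer, starting level, step size, and reversal-averaging rule are illustrative assumptions, not the parameters of the actual experiment:

```python
import math
import random

def one_up_two_down(p_yes, x0, step, n_trials=600, seed=7):
    """Transformed up-down staircase (sketch): the level steps down after
    two consecutive "yes" responses and up after every "no" response, so
    it converges near the level where P("yes") = sqrt(0.5), about 0.707.
    p_yes(x) simulates the observer's psychometric function."""
    rng = random.Random(seed)
    x, run, direction, reversals = x0, 0, 0, []
    for _ in range(n_trials):
        if rng.random() < p_yes(x):   # simulated "yes" response
            run += 1
            if run < 2:
                continue              # need two in a row before stepping down
            run, move = 0, -step
        else:
            run, move = 0, +step
        if direction and move * direction < 0:
            reversals.append(x)       # record levels where direction flipped
        direction = move
        x += move
    tail = reversals[len(reversals) // 2:]  # discard early reversals
    return sum(tail) / len(tail)

# Hypothetical observer: logistic function centered on 10; its 70.7% point
# lies near 10 + ln(0.707/0.293), about 10.9, which the staircase approaches.
threshold = one_up_two_down(lambda x: 1 / (1 + math.exp(10 - x)), x0=0.0, step=0.5)
```

Averaging only the later reversals is a common way to discard the staircase's initial approach to threshold.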
In the global instruction condition, we told subjects that they should see the display as symmetric; for the display in figure 11.12a, for example, they were told that there was a fourth protrusion behind the occluder identical to the three visible protrusions around the circle. In the local instruction condition, we told them that we wanted them to see the display as containing a simple curve connecting the two visible edges. In this manner, we sought to find subjects’ best abilities to localize boundaries under a global or local set. A number of interesting findings have emerged (Kellman, Shipley, and Kim 1996). Localization of boundaries in displays where completion is predicted by relatability is extremely precise. This is true for straight (collinear) and curved completions. A very different outcome occurs in cases where completion is predicted to follow global symmetry. Here, the precision (the difference between “out” and “in” thresholds) is an order of magnitude worse: about 15 mm in a display of about 7.0 cm diameter (in visual angle, about 20 arcmin in a display 87 arcmin
in diameter). Moreover, the midpoint of the range is close to 1 cm away from the theoretically predicted location of the boundary. This result has shown up consistently in a range of displays testing symmetry and related global notions of object completion. A number of issues are still under investigation in this new paradigm. What is already clear is that global influences do not lead to specification of precise boundary position in the way local perceptual completion does. These outcomes are consistent with the idea of separate perceptual completion and more cognitive RPI processes.

Similarity

An interesting feature of edge relatability is that it does not seem to be sensitive to similarity of surface quality (e.g., lightness, color, or texture). Figure 11.13 gives two examples.

Figure 11.13. Surface color insensitivity of boundary interpolation. a) A unitary partly occluded object is seen despite differences in lightness of its visible regions. b) Illusory contours form between surfaces of different lightnesses.

In (a) the visible parts are seen as a unified object despite differences in their surface lightness and contrast polarity from the occluding object. In (b) an illusory figure is formed from connections between pieces of very different luminances. Shipley and Kellman (1992a) found that magnitude estimations of object completion under occlusion in a large sample of randomly generated figures showed no reliable differences whether the relatable pieces were the same or different in luminance and chromatic color. The Gestalt principle of similarity thus seems to have little effect on relatability or on the boundary interpolation process in general. Does this mean that there is no role for similarity in object completion? Kellman and Shipley (1991) proposed a surface-spreading process
that complements boundary interpolation (cf. Yarbus 1967; Grossberg and Mingolla 1985). Surface quality spreads within physically specified and interpolated boundaries. In figure 11.14a the circle appears as a spot on a background. In figure 11.14b, the righthand circle still looks the same way, but the lefthand circle may appear as a hole in the occluding surface. This effect appears to depend on similarity between the surface lightness and texture of the circle and the partly occluded ellipse. Because the circle has no TDs, it does not participate in the boundary interpolation process. What connects the circle with the surface behind the occluder appears to be a separate connecting process related to surface similarity. This surface process appears to be confined within the boundaries of the completed partly occluded figure in figure 11.14b. Figure 11.14c suggests, however, that surface spreading also occurs within the extended tangents of the boundaries of a partly occluded area (the half of the ellipse above the occluder), even when they are not relatable to others.

Figure 11.14. Examples illustrating the surface completion process. a) The circle appears as a spot in front of a background. b) The lefthand circle now appears as a hole, due to surface completion based on similarity of lightness and texture. c) Surface completion can occur even without edge relatability. (See text.)

In her dissertation, Carol Yin tested these two hypotheses—that surface quality spreads within relatable edges and also within extended
tangents of nonrelatable edges continuing behind occluding surfaces (Yin, Kellman, and Shipley 1997). In a series of experiments, subjects made a forced choice of whether a circular area appeared to be a hole in a surface or a spot on top of the surface, in a number of displays varying edge and surface similarity relations. In a variant of the method, subjects made forced-choice responses of which of two displays looked more like it contained a hole, for all possible pairs of displays in a particular experiment. These studies confirmed the hypotheses of surface spreading within relatable edges and within tangent extensions. Yin also studied the surface completion process in an objective performance paradigm, pitting the effects of surface completion in making a circle look like a hole or a spot against small amounts of stereoscopic disparity. She found that surface completion interactions reduced sensitivity to stereoscopic depth (Yin, Kellman, and Shipley, in press). Surface similarity and edge relatability seem to play complementary roles in object perception: interpolated edges establish connections under occlusion, and surface qualities (lightness, color, and texture) spread within physically given and interpolated boundaries.

Common Fate

Wertheimer (1921) defined the “Factor of Common Fate” in this way. Suppose one sees a row of dots in which some are closer to others, leading to grouping by proximity. Now suppose some dots are shifted upward while others remain at rest. The shift will seem more disruptive if only dots that were initially grouped together are moved; if the shift involves some dots from different groups, it appears to change the grouping. The principle of common fate received little emphasis in later Gestalt discussions of perceptual organization. In Koffka’s (1935) treatise, for example, the principle is not even mentioned.
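Wertheimer's demonstration can be sketched as a simple grouping rule: dots whose velocity vectors agree are assigned to the same group. The dot data and tolerance below are hypothetical, and grouping by proximity is omitted for brevity:

```python
def group_by_common_fate(dots, tol=1e-6):
    """Group dots by shared velocity ("common fate").

    dots: list of ((x, y), (vx, vy)) pairs. Returns groups of indices
    whose velocity vectors agree within tol.
    """
    groups = []
    for i, (_, v) in enumerate(dots):
        for group in groups:
            gv = dots[group[0]][1]    # velocity of the group's first member
            if abs(v[0] - gv[0]) <= tol and abs(v[1] - gv[1]) <= tol:
                group.append(i)
                break
        else:
            groups.append([i])
    return groups

# A row of six dots; every other dot shifts upward while the rest stay
# at rest, so two interleaved groups emerge, as in Wertheimer's demo.
row = [((float(x), 0.0), (0.0, 1.0) if x % 2 else (0.0, 0.0)) for x in range(6)]
print(group_by_common_fate(row))  # → [[0, 2, 4], [1, 3, 5]]
```

In the actual demonstration, proximity and common fate interact; this sketch isolates only the velocity-based rule.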
In some ways, however, the nugget of insight in the principle of common fate connects to the most important modern developments in understanding perception. Owing in part to the development of ecological analyses of perception (Gibson 1966; Johansson 1968), we know that motion relationships provide a wealth of information about object structure and spatial layout. For perceiving unity under occlusion, there are two distinct types of information (Kellman and Shipley 1991). One, a direct descendant of Wertheimer’s common fate, we have called the edge-insensitive process. Certain motion relationships lead two visible parts to be seen as connected. This connecting principle does not require any particular relationships among the visible edges of the parts for unity to be seen. Computational and psychophysical research has revealed processes
that can determine whether particular 2-D motion patterns are consistent with a rigid 3-D structure and, if so, what that structure is. Wertheimer’s notion of common fate includes at least the stimulus relationships that allow recovery of rigid structure (Ullman 1979; Todd 1981). It may also include many nonrigid motions, such as the jointed motions characteristic of a moving human body and the elastic motions characteristic of organisms or inanimate objects that stretch and contract during movement (Johansson 1975).

Spatiotemporal Relatability of Edges

A complementary process—the edge-sensitive process—does involve edge relationships in information given over time by motion. If a stationary observer looks through dense foliage, she may see only meaningless fragments of color from the scene behind. If the observer moves while looking, however, the objects and spatial layout behind the foliage may be revealed. Sequential projection of parts seems to allow visual perception of complete objects, although this ability has not been much studied. There is evidence that sequential perception of inducing elements can produce illusory contours and figures (Kellman and Cohen 1984; Bruno and Bertamini 1988). Perception under these circumstances requires not only integration of information over time but also interpolation, because some parts of the object never project to the eyes. The situation is one encountered often in ordinary perception. What stimulus relationships in both space and time lead to perception of complete objects? With the extra degree of freedom given by motion, attempting to answer this question might seem daunting. It might be possible, however, to extend the criterion of spatial relatability to account for completion in dynamic scenes. A simple hypothesis about how this might be done is illustrated in figure 11.15. In (a), an opaque panel containing two apertures moves in front of an object.
Suppose one part of the figure becomes visible through an aperture at time t1 and another part becomes visible at time t2. If the position and edge orientation of the part seen at t1 are encoded in a buffer and persist until the part at t2 appears, the standard relatability computation can be performed on the currently visible part and the earlier encoded part. The situation in (b) adds a step. Here the object moves, revealing one part through the bottom aperture at t1 and another through the top aperture at t2. The hypothesis is that when the part appears at t1, the visual system encodes not only its position and edge orientation but also a velocity signal. This velocity signal could be used to update the spatial position of the earlier visible part over time, either in a special-purpose buffer or by triggering a pursuit eye movement.

Figure 11.15. Spatiotemporal relatability. a) A moving occluding panel with two windows passes in front of an object, projecting parts of the object to the eyes at different times. If a trace of the first visible part can be preserved until the second appears, spatial relatability can operate. b) A moving object’s parts are projected at two different times in two different places. If velocity information is available, the position of the initially viewed part can be updated (by an eye movement or in a visual buffer) so that its position relative to the second visible part can be extrapolated. Spatiotemporal relatability applies the spatial relatability computation to the currently visible and previously visible, positionally extrapolated parts. (From Palmer, Kellman, and Shipley, in preparation.)

When the second part
becomes visible, it is combined with the updated position of the first part in the standard spatial relatability computation.

The Dynamic Occlusion Paradigm

Evan Palmer, Tim Shipley, and I recently developed an experimental paradigm to test these ideas (Palmer, Kellman, and Shipley 1997). The paradigm works as follows. On each trial, an object passes behind an occluder with several narrow slits, vertically separated so that some parts of the object never project to the eyes. This feature makes the task a completion or interpolation task, as opposed to only an integration task (in which visible parts are integrated over time). On each trial the object passes once back and forth behind the occluder.

Figure 11.16. Design for studying dynamic object completion. A target array consisting of three visible parts moves behind the occluder, visible only through narrow apertures. After each presentation, the subject makes a forced choice between two displays. a) Relatable display. b) Nonrelatable display. (See text.) (From Palmer, Kellman, and Shipley, in preparation.)

Subjects then make
a forced choice between two test displays, choosing which matched the moving target display. The design is illustrated in figure 11.16. Two display conditions were used. Relatable displays (apart from the shift manipulation; see below) met the criterion of spatiotemporal relatability; the upper test display in figure 11.16 is an example. The other test display differs from the first in having one of the three fragments shifted by some amount. Five different amounts of shift (ranging from 1.67 to 8.33 arcmin of visual angle) were used. The target matched the unshifted test display on half of the trials and the shifted display on the other half. We predicted that relatability would facilitate encoding of the visible parts in the target display. If three parts moving behind slits were grouped into a single, coherent object, this might lead to more economical encoding and memory than for control displays (see below) in which three detached pieces were encoded. For simplicity, I will consider here only the cases in which either a test display or both the target and a test display were relatable. In these cases, it was predicted that the greater ease of encoding a relatable display would lead to better performance. Displays in a second condition were compared to the first. These nonrelatable displays consisted of the identical three pieces as in the relatable condition, but with the top and bottom pieces permuted (see figure 11.16b).

Figure 11.17. Results of the dynamic object completion experiment. Sensitivity is shown as a function of the misalignment difference between the canonical display and the other test choice. Separate plots are given for relatable and nonrelatable displays. (From Palmer, Kellman, and Shipley, in preparation.)

With these nonrelatable displays, it was hypothesized
that visual completion would not occur; each nonrelatable target might have to be encoded as three distinct pieces, which would lead to greater encoding demands and lower sensitivity to the relative spatial positions of the three parts. These experiments are just beginning, but we can present some early results. Figure 11.17 shows accuracy data (discrimination d’) from 16 subjects for relatable and nonrelatable displays as a function of shift. Relatable displays were discriminated far more accurately than displays made of the identical physical parts placed in nonrelatable positions. The results provide tentative support for generalizing the notion of relatability from the spatial to the spatiotemporal domain. A whole range of issues is raised but not yet addressed by these results. For example, we did not control fixation, and it is unclear whether eye movements based on velocity signals from the moving fragments facilitate spatiotemporal object completion. Likewise, we have not yet investigated the effects of a number of other parameters. One of special importance is velocity. We suspect from other research (Shipley and Kellman 1994) that spatiotemporal completion will occur within a restricted temporal window of integration, around 165 msec. So the
results of our initial studies of dynamic occlusion raise more questions than they answer. They do, however, provide some basis for connecting dynamic object perception to previous work with static displays, by means of the extended notion of relatability.

Neural Models

The theoretical ideas about boundary interpolation and surface filling that I have sketched are largely formal or computational in nature. That is, they characterize the stimulus relationships that underlie object completion; they provide only hints about a precise process model or neural realization. I think it is worth concluding by mentioning some clues in these areas that are central to some of our current thinking and work in progress, as well as to some work by others. We defined relatability in edge interpolation as a simple mathematical relationship between edge pairs. A number of considerations are leading us to consider interpolation effects as resultants of excitation fields that arise from individual edges. For example, there is some evidence that the edges and surface of a single region continue behind an occluder even when they do not connect to any other region (Kanizsa 1979; Nakayama and Shimojo 1992). We call this edge continuation, to distinguish it from edge completion or interpolation. In this case, edges seem to continue along linear extensions of edge tangents at the point of occlusion. Surface spreading along such tangent extensions was found in Yin’s research, described above. One way to account for edge continuation and interpolation is to assume that each physically specified edge gives rise, at its endpoint, to a field of excitations at nearby locations. A vector field would assign to each spatial location, at each orientation (perhaps in a 3-D network), a certain excitation. Excitation would decrease with distance and would also depend on the orientation and positional relations specified in the geometry of relatability.
An interpolated boundary in this scheme arises when excitation fields from two separate physically specified edges meet, with a winner-take-all inhibition scheme preventing multiple completions. The temporal component of spatiotemporal relatability could be realized by adding the dimension of time to the vector field. Our research group and others are working on the specifics of this kind of model. For now it may be sufficient to note that this approach is consistent with some other psychophysical work, including that of Field, Hayes, and Hess (1993), Polat and Sagi (1994), Das and Gilbert (1995), and others. Both neurophysiological and psychophysical experiments suggest that cortical cells sensitive to orientation trigger the kinds of spatial interactions that could implement relatability. There is, of course, more
An Update on Gestalt Psychology
187
work to do in pursuing these general ideas. A meaningful theory will build on previously proposed frameworks (Grossberg and Mingolla 1985; Grossberg 1994; Heitger and von der Heydt 1993), but specific quantitative relationships faithful to psychophysical data must be added. New dimensions must also be added. Our research suggests that successful models must incorporate relationships across all three spatial dimensions and relationships in information given over time. As daunting as the theoretical task appears, it may be made tractable by precisely characterizing the grammar of object completion. In particular, we are encouraged by the idea that a simple piece of geometry—the notion of relatability—may provide a common thread knitting together pictorial, 3-D, and spatiotemporal object completion. This unifying idea may provide a platform for precise process modeling and investigations into the underlying neural mechanics.

Conclusion

Understanding perceptual organization—and segmentation and grouping in particular—still poses deep mysteries to researchers in biological and artificial vision. Yet often, when progress is made, we can trace its roots to insights made more than a generation ago by the Gestalt psychologists. It is amazing to realize that not only did the Gestaltists provide some of the clues about how to solve these problems, but they were the first to articulate clearly that these problems existed at all. At the same time, it must be admitted that their principles lacked precision and coherence. That these principles can still be recognized in more recent computational models, however, attests to the robustness of the original insights. In this chapter, I have attempted to make explicit some of these connections between the old and the new. A simple piece of geometry—the relatability criterion—appears to capture much of the grammar of edge interactions that lead to object completion.
With rather simple extensions, relatability can be applied to contour interactions in depth and to dynamic object completion. Underlying this principle—and the Gestalt idea of good continuation—is the idea that object boundaries tend to be smooth. An alternative ecological interpretation might be that objects are not all that smooth, but for making inferences about where objects go under occlusion, smoothness is the best general assumption for a visual processor to use. Relatability might be implemented by simple interactions of units responding to oriented edges. Evidence is beginning to suggest that such interactions occur surprisingly early in cortical visual processing. Complementary to the boundary completion process is the spreading of surface quality within boundaries. Here, the Gestalt principle of similarity lives on. Some other principles, such as an idea of Prägnanz or
global symmetry, may turn out not to be determinants of perceptual representations per se, but may exert their effects more in memory and recognition. Of the original Gestalt principles, it is the notion of good continuation that emerges as having the most important legacy in models of object perception. This is the principle that also stands out when I reflect on the impact of the Gleitmans and the Gleitman Research Seminar. These many years later, Henry’s and Lila’s insight, dedication, and high standards continue to help all of us in our academic endeavors. That we seek to emulate them in our own research and teaching is perhaps the best principle of good continuation.

Acknowledgments

Portions of this research were supported by National Science Foundation grant SBR-9496112. I thank Thomas Shipley, Carol Yin, Sharon Guttman, and Evan Palmer for useful discussions, and John Jonides and Dan Reisberg for helpful comments on an earlier draft of this chapter. Address reprint requests to Philip J. Kellman, Department of Psychology, UCLA, 405 Hilgard Avenue, Los Angeles, CA 90095–1563, or by email to .

Note

1. Even the language we use to describe the idea contains the idea implicitly. We say a TD is a point where “contours meet,” but the presence of the TD is what makes it sensible to say “contours” (plural). Without the TD there is only a single contour.
References

Bruno, N. and Bertamini, M. (1990) Identifying contours from occlusion events. Perception and Psychophysics 48(4):331–342.
Buffart, H., Leeuwenberg, E., and Restle, F. (1981) Coding theory of visual pattern completion. Journal of Experimental Psychology: Human Perception and Performance 7(2):241–274.
Clowes, M. B. (1971) On seeing things. Artificial Intelligence 2:79–112.
Das, A. and Gilbert, C. D. (1995) Long-range horizontal connections and their role in cortical reorganization revealed by optical recording of cat primary visual cortex. Nature 375(6534):780–784.
Field, D., Hayes, A., and Hess, R. F. (1993) Contour integration by the human visual system: Evidence for a local “association field.” Vision Research 33(2):173–193.
Gibson, J. J. (1966) The Senses Considered as Perceptual Systems. Boston: Houghton-Mifflin.
Gibson, J. J. (1979) The Ecological Approach to Visual Perception. Boston: Houghton-Mifflin.
Grossberg, S. (1994) 3-D vision and figure-ground separation by visual cortex. Perception and Psychophysics 55(1):48–120.
Grossberg, S. and Mingolla, E. (1985) Neural dynamics of form perception: Boundary completion, illusory figures, and neon color spreading. Psychological Review 92:173–211.
Heitger, F. and von der Heydt, R. (1993) A computational model of neural contour processing: Figure-ground segregation and illusory contours. Proceedings of the Fourth International Conference on Computer Vision. Los Alamitos, CA: IEEE Computer Society Press, 32–40.
Johansson, G. (1970) On theories for visual space perception: A letter to Gibson. Scandinavian Journal of Psychology 11(2):67–74.
Johansson, G. (1975) Visual motion perception. Scientific American 232(6):76–88.
Kanizsa, G. (1979) Organization in Vision. New York: Praeger.
Kawaguchi, J. (1988) Priming effect as expectation. Japanese Psychological Review, Special Issue: Problems of repetition in memory 31(3):290–304.
Kellman, P. J. and Cohen, M. H. (1984) Kinetic subjective contours. Perception and Psychophysics 35(3):237–244.
Kellman, P. J., Machado, L., Shipley, T. F., and Li, C. C. (1996) 3-D determinants of object completion. Investigative Ophthalmology and Visual Science 37(3):685.
Kellman, P. J. and Shipley, T. F. (1991) A theory of visual interpolation in object perception. Cognitive Psychology 23:141–221.
Kellman, P. J. and Shipley, T. F. (1992) Visual interpolation in object perception. Current Directions in Psychological Science 1(6):193–199.
Kellman, P. J., Shipley, T. F., and Kim, J. (1996) Global and local effects in object completion: Evidence from a boundary localization paradigm. Paper presented at the 32nd Annual Meeting of the Psychonomic Society, St. Louis, MO, November 1996.
Kellman, P. J. and Spelke, E. S. (1983) Perception of partly occluded objects in infancy. Cognitive Psychology 15:483–524.
Kellman, P. J., Yin, C., and Shipley, T. F. (1998) A common mechanism for illusory and occluded object completion. Journal of Experimental Psychology: Human Perception and Performance 24(3):859–869.
Kellman, P. J., Yin, C., Shipley, T. F., Machado, L., and Li, C. C. The 3-D geometry of object completion. Manuscript in preparation.
Koffka, K. (1935) Principles of Gestalt Psychology. New York: Harcourt.
Lee, D. N. (1974) Visual information during locomotion. In R. B. MacLeod and H. L. Pick (eds.), Perception: Essays in Honor of James J. Gibson. Ithaca, NY: Cornell University Press.
Lesher, G. W. and Mingolla, E. (1993) The role of edges and line-ends in illusory contour formation. Vision Research 33(16):2253–2270.
Marr, D. (1982) Vision. San Francisco: Freeman.
Michotte, A., Thines, G., and Crabbe, G. (1964) Les compléments amodaux des structures perceptives. Studia Psychologica. Louvain: Publications Universitaires de Louvain.
Nakayama, K. and Shimojo, S. (1992) Experiencing and perceiving visual surfaces. Science 257(5075):1357–1363.
Nakayama, K., Shimojo, S., and Silverman, G. (1989) Stereoscopic depth: Its relation to image segmentation, grouping and the recognition of occluded objects. Perception 18(1):55–68.
Palmer, E., Kellman, P. J., and Shipley, T. F. (1997) Spatiotemporal relatability in dynamic object completion. Investigative Ophthalmology and Visual Science 38(4):256.
Palmer, E., Kellman, P. J., and Shipley, T. F. Spatiotemporal relatability in dynamic object completion. Manuscript in preparation.
Petter, G. (1956) Nuove ricerche sperimentali sulla totalizzazione percettiva. Rivista di Psicologia 50:213–227.
Polat, U. and Sagi, D. (1994) The architecture of perceptual spatial interactions. Vision Research 34(1):73–78.
Prenter, P. M. (1989) Splines and Variational Methods. New York: Wiley.
Ringach, D. L. and Shapley, R. (1996) Spatial and temporal properties of illusory contours and amodal boundary completion. Vision Research 36:3037–3050.
Sekuler, A. B. (1994) Local and global minima in visual completion: Effects of symmetry and orientation. Perception 23(5):529–545.
Sekuler, A. B., Palmer, S. E., and Flynn, C. (1994) Local and global processes in visual completion. Psychological Science 5(5):260–267.
Shipley, T. F. and Kellman, P. J. (1992a) Perception of partly occluded objects and illusory figures: Evidence for an identity hypothesis. Journal of Experimental Psychology: Human Perception and Performance 18(1):106–120.
Shipley, T. F. and Kellman, P. J. (1992b) Strength of visual interpolation depends on the ratio of physically-specified to total edge length. Perception and Psychophysics 52(1):97–106.
Shipley, T. F. and Kellman, P. J. (1994) Spatiotemporal boundary formation. Journal of Experimental Psychology: General 123(1):3–20.
Todd, J. T. (1982) Visual information about rigid and nonrigid motion: A geometric analysis. Journal of Experimental Psychology: Human Perception and Performance 8(2):238–252.
Ullman, S. (1979) The Interpretation of Visual Motion. Cambridge, MA: MIT Press.
van Lier, R. J., van der Helm, P. A., and Leeuwenberg, E. L. J. (1995) Competing global and local completions in visual occlusion. Journal of Experimental Psychology: Human Perception and Performance 21(3):571–583.
Waltz, D. L. (1972) Generating semantic descriptions from drawings of scenes with shadows (Tech. Rep. AI-TR-271). Cambridge, MA: MIT.
Wertheimer, M. (1921) Laws of organization in perceptual forms. In W. D. Ellis (ed.), Readings from Gestalt Psychology. New York: Harcourt Brace, 1938.
Yarbus, A. L. (1967) Eye Movements and Vision. New York: Plenum Press.
Yin, C., Kellman, P. J., and Shipley, T. F. (1997) Surface completion complements boundary interpolation. Perception, special issue on surface appearance, 26:1459–1479.
Yin, C., Kellman, P. J., and Shipley, T. F. (in press) Surface integration influences depth discrimination. Vision Research.
Chapter 12
Beyond Shipley, Smith, and Gleitman: Young Children’s Comprehension of Bound Morphemes
Katherine Hirsh-Pasek

In the fall of each year, as leaves turn bright against the New England landscape, psycholinguists make their annual pilgrimage to the Boston Language Conference. One of the highlights of the trip to Boston is the Gleitman dinner, a gathering of all those fortunate enough to be Lila and Henry’s intellectual children, grandchildren, and great-grandchildren. As you look around the dining room, you can’t help but be impressed by the large number of scientists who have been touched by the Gleitman tradition, a tradition characterized by outstanding scholarship, first-rate teaching, and personal friendship. There is no match for the scholarship that we witnessed during our graduate years. Lila always understood the big picture of language development, constantly reframing our narrow questions into ones that addressed major issues in the field. I remember marveling at the way in which she made our first-year research projects seem so much more important than we had imagined. (She magically molded my research on young children’s understanding of jokes into a key project on the relationship between metalinguistic processing and reading.) Lila also had (and still has) the insight and common sense to know just where to look to test her account of a developmental story. She has that rare ability to integrate data from linguistic and psychological journals with examples from the TV guide, Star Trek, and a neighborhood two-year-old. While Lila helped us ask the questions, however, it was Henry who would sculpt those questions into psychologically interesting research. The result was a constant stream of papers in child language, each of which fit into a larger program of research, many of which became classics in the field. Their scholarship is unquestioned, yet their style of teaching and advising stands out as the shining light of my graduate years.
When my thirteen-year-old son recently asked Henry what he would describe as his greatest accomplishment in psychology, he answered without hesitation, “My students.” No one who worked with Henry or Lila would be surprised by that answer. The Thursday night cheese seminars at the
192
Kathy Hirsh-Pasek
Gleitman home showed us how much they cared. Every week we met until all hours of the night, learning how to respect each person’s ideas, even when we disagreed. We learned that there were no simple answers, and that every result had alternative explanations. Beyond our weekly meetings, Henry and Lila were always available, never too busy to read our drafts or to look at our preliminary analyses. They worked with us side-by-side to ensure that our papers were of high quality. Then, they graciously offered us first authorships on our collaborative efforts. Perhaps the reason that Lila and Henry were such good mentors, however, is that they were not just academic advisors. They were also good friends. When you became a Gleitman student, you entered into the rich world of the Gleitmans’ life—on the tennis court, at the theater, and at the finest local restaurants. I am counting on a long and continued collaboration and friendship with both Lila and Henry. I have no doubt that they will define the field of psycholinguistics as we move into the next millennium. In this paper, I take the opportunity to demonstrate one way in which their insights continue to shape my research. Using the now classic Shipley, Smith, and Gleitman (1969) as a springboard, my collaborators Roberta Golinkoff, Melissa Schweisguth, and I ask anew, “When are children first sensitive to grammatical morphemes in the input language?” and “Do they show this sensitivity sooner in comprehension than in production?” Almost thirty years after these questions were addressed in the Shipley, Smith, and Gleitman paper, they remain central to the study of grammatical development.

Grammatical Morphemes and Their Role in Language Acquisition

One of the key issues in language development concerns the young child’s ability to discover the building blocks of grammar: the nouns, verbs, and other parts of speech in the ambient language.
Only with these building blocks in hand (or in head) can children come to recognize syntactic patterns in the input and to construct the grammar of their native language. Indeed, every theory of grammar acknowledges that the discovery of grammar is one of the fundamental problems of language acquisition. Throughout the years, a number of proposals have been advanced for how children might go about finding categories like nouns and verbs. Among them are syntactic distributional proposals, phonological proposals, and semantic proposals. In the syntactic distributional view, for example, children can use fairly regular distributional properties of the grammar to begin the process of assigning words to form classes (see Maratsos and Chalkley 1980). For example, nouns generally occur at the ends of sentences in child-directed
Beyond Shipley, Smith, and Gleitman
193
speech (Aslin, Woodward, LaMendola, and Bever 1996). Nouns also generally occur after grammatical morphemes like “the.” Through attention to these structural cues, children might come to create categories of nounlike and verblike words based on distributional regularities in the input. Nouns and verbs also have different prosodic properties. Nouns, for example, are more heavily stressed within sentences than are verbs. They also tend to have more syllables per word, longer durations, more vowels, and more phonemes overall than do verbs. Perchance these statistical regularities assist the child in finding the relevant form classes (Kelly 1992, 1996). Finally, accompanying these structural and prosodic distinctions are semantic differences that can assist children in locating nouns and verbs. Nouns often, though not invariably, refer to persons, places, and things, whereas verbs are more likely to refer to actions. These gross correlations have become the fodder for semantic bootstrapping theories (Pinker 1984; Grimshaw 1981; Bowerman 1973; but see Gleitman and Gillette 1995 for a contrasting view). Such proposals can get the learner started in form class assignment, but semantic “bootstrapping” can only take her so far. A word’s meaning does not define its form class. To use Pinker’s (1994) example, the word interest can be a noun in “Her interest in bowling,” a verb in “Bowling started to interest her,” and an adjective in “She seemed interested in bowling from the start.” All of these instantiations of interest share a similar meaning. Yet, they are not all classified into the same part of speech. In sum, then, learners have a number of partially redundant cues to linguistic form class through syntactic distribution, prosody, and semantics. Undoubtedly, they can capitalize on this redundancy by attending to the coalition of cues to solve the problem of finding the building blocks of grammar (Hirsh-Pasek and Golinkoff 1996; Morgan, Shi, and Allopenna 1996).
Among all of these available cues, however, one set of cues to nouns and verbs stands out as more reliable than the rest; one set of cues is sufficient (though not necessary) for distinguishing between the major form classes: grammatical morphemes. Grammatical morphemes are the closed-class words (such as “the”) and bound morphemes (such as /ing/) associated with particular form classes. Although in English these elements are usually weakly stressed in the input, they are fairly reliable cues for form class assignment. For example, nouns (or noun phrases) follow the grammatical morpheme “the,” and the morphological ending /ing/ tends to signal a verb. Thus, even though many cues operate in tandem to allow children a way to assign words into basic grammatical units of nouns and verbs, grammatical morphemes might well provide the easiest and most reliable cue of all those available.
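The distributional logic just described is simple enough to state as a toy procedure. The sketch below is a deliberate caricature (the function names and the two-rule cue set are my own, not from the chapter): it labels a word noun-like when it follows “the” and verb-like when it carries /ing/, and so will happily mistag words like “king.”

```python
def guess_form_class(word, prev_word=None):
    """Toy tagger using only the two cues discussed in the text:
    the closed-class word "the" signals a following noun, and the
    bound morpheme /ing/ signals a verb. Illustrative only: a real
    learner presumably weighs a coalition of distributional,
    prosodic, and semantic cues, and this rule mistags e.g. "king".
    """
    if prev_word is not None and prev_word.lower() == "the":
        return "noun-like"
    if word.lower().endswith("ing"):
        return "verb-like"
    return "unknown"

def tag_utterance(words):
    """Apply the toy tagger to each word of an utterance."""
    return [guess_form_class(w, words[i - 1] if i > 0 else None)
            for i, w in enumerate(words)]

print(tag_utterance(["throw", "the", "ball"]))
# ['unknown', 'unknown', 'noun-like']
```

Even this caricature shows why grammatical morphemes are such an attractive cue: the rules are local and frequent, and they require no access to a word’s meaning.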
The potential role of grammatical morphemes in syntactic development has not gone unnoticed (Maratsos and Chalkley 1980; Morgan 1986; Morgan and Newport 1981; Morgan, Meier, and Newport 1987). As noted above, Maratsos and Chalkley thought that grammatical morphemes would be central to a distributional view of how children learn grammatical categories. Further, Morgan and his colleagues argued that grammatical morphemes were key to the “prosodic bracketing” that allowed adults to parse artificial grammars into linguistic constituents. These cues are certainly available in the input. Yet, while adults may notice and be able to capitalize on these cues, significant controversy exists as to whether young children could even attend to, let alone mine, these weakly stressed cues in the service of grammatical acquisition. Children do not reliably produce grammatical morphemes until they are about twenty-four months of age (Brown 1973; deVilliers 1973; Valian 1986; P. Bloom 1990). Thus, many think that these cues could not be used by young language learners to assist them in discovering grammar. Pinker (1984) wrote,

In general, it appears to be very common for unstressed closed-class morphemes not to be present in the earliest stages in the acquisition of many languages. Thus, as much as it would suit my purposes to claim that Stage I children have latent control over the morphemes whose presence defines the categorization of certain constituents, it does not seem to be tenable given available evidence. (p. 103)

Though grammatical morphemes would assist children in form class assignment, Pinker is suggesting that children might not be able to use these cues until the grammar is at least partially acquired. This argument is a powerful one. Yet, there is a rub. Pinker’s assertions are based on production data. Shipley, Smith, and Gleitman (1969), however, claim that children could potentially be sensitive to these markers in the language even though they do not produce them.
That is, children might well comprehend grammatical morphemes (and therefore use them in form class assignments) before they can say them. On this account, the lack of grammatical morphemes in children’s speech represents a production constraint rather than a portrait of toddlers’ linguistic competence. It was this insight that Shipley, Smith, and Gleitman (1969) captured in their paper, “A study in the acquisition of language: Free responses to commands.” The authors noted,

It seems clear, however, that the study of spontaneous speech does not provide a sufficient basis for understanding what the child “knows” about language at various stages of development. . . .
[A] study of spontaneous speech, however objective and comprehensive, forms a poor basis even for the study of adult language. (p. 103) It was Shipley, Smith, and Gleitman (1969), then, who set the stage for the study of language comprehension as a metric for emerging language development. In Shipley, Smith, and Gleitman (1969) two questions were posed. First, did infants and toddlers understand more than they could say? Second, were holophrastic and telegraphic listeners—who did not use any grammatical morphemes—sensitive to grammatical morphemes in the input that they heard? Subjects 18 to 33 months of age participated in an “act out” task in which they responded to three simple types of commands. Appropriate commands had obligatory grammatical morphemes, as in “Throw the ball.” Omissions were telegraphic commands that omitted the obligatory morphemes, as in “Throw ball.” Finally, nonsense commands placed nonsense words in places in which the obligatory morphemes belonged, as in “Gor ronta ball.” In answer to the first question, results differed depending on the language level of the children. Those in the holophrastic group carried out more commands when the commands omitted obligatory morphemes than when they included them. Children in the telegraphic group, in contrast, carried out fewer commands when they omitted grammatical morphemes than when they included them. As Shipley et al. wrote, “What is surprising is that just those utterance types they themselves did not use were more effective as commands” (p. 331). These findings suggest something that most researchers did not consider in 1969—that children may be sensitive to grammatical morphemes even when they are not yet producing them in their own speech. In response to the second question on grammatical morphemes, Shipley et al. (1969) made an even more remarkable discovery. 
When they presented telegraphic speakers with requests in which nonsense words replaced the grammatical morphemes, the response pattern was disrupted. This further confirms the finding that the telegraphic speaker is not a telegraphic listener. These children were sensitive to grammatical morphemes in the input that they heard. The Shipley, Smith, and Gleitman (1969) findings opened the door for more investigations that probed young children’s sensitivity to grammatical morphemes and their use of these markers in the construction of grammar. A number of studies followed that confirmed and extended the findings of Shipley, Smith, and Gleitman (1969). By way of example, Katz, Baker, and MacNamara (1974) and Gelman and Taylor (1984) found that infants as young as 17 months of age were sensitive to
distinctions between “the” and “a.” In enactment tasks, these toddlers were more likely to retrieve a particular block when requested to get “the” block than when requested to get “a” block. An even more dramatic example comes from Shafer, Gerken, Shucard, and Shucard (1992), who used an evoked potential procedure to demonstrate that 10- and 11-month-old children could attend to the phonological properties of grammatical morphemes. When normal function morphemes (such as “a” and “of”) were replaced with nonsense functors (such as “gu”), infants noticed the change and paid more attention to the sentences containing nonsense functors. It appears as if “infants are sensitive to enough of the canonical phonological properties of their language to begin to identify function morphemes as a phonological class” (Gerken 1996, p. 417). Finally, Gerken and McIntosh (1993) offer a compelling demonstration of toddler sensitivity to grammatical morphemes in comprehension. Using a picture-pointing task with four choices, toddlers 21 to 28 months of age were requested to (a) Find the dog for me (correct morphology); (b) Find * dog for me (morphology absent); (c) Find was dog for me (ungrammatical morpheme); or (d) Find gub dog for me (nonsense morpheme). Consistently, children performed better in the grammatical than in the ungrammatical conditions. For toddlers with MLUs of under 1.5 hearing the stimuli in female Motherese, the proportions correct were 86% in the correct condition and 75% in the missing condition, with a dramatic drop to 56% and 39% in the ungrammatical and nonsense conditions, respectively. Thus, children who were not producing morphemes in their own speech were nonetheless sensitive to this information in comprehension. Most importantly, children with low MLUs have obviously learned something about particular phonological forms within the input.
If they had not yet noted the particular grammatical morphemes, then all except the absent morpheme should have been treated similarly. If they had just classified the input as prosodically or lexically familiar versus unfamiliar, the ungrammatical “was” condition should have been as good as the grammatical “the” condition. Thus, children are not only sensitive to grammatical morphemes in the input, but seem to know something about their appropriate locations in the sentence—a crucial fact reopening the possibility that they could use different morphological cues to classify different constituents into the correct form classes. In short, the studies that followed Shipley, Smith, and Gleitman (1969) reaffirmed their interpretation that children are sensitive to morphological cues in the input. The studies also confirmed the role that comprehension can play in providing an important window on language development.
Expanding This Literature: A Study of Bound Morpheme Comprehension

The findings about sensitivity to grammatical morphemes are encouraging. Most of the studies performed to date, however, have either been conducted with older toddlers or have used only free morphemes like “the” that signal noun phrases (e.g., Brown 1973; Taylor and Gelman 1988). To make the comprehensive case that toddlers note morphological cues in the input, one must demonstrate that they can attend to the full range of morphological cues. That is, one must demonstrate that children are equally sensitive to bound morphemes like “ing” that mark verb phrases, or “ly” that mark adverbial phrases. It can be argued that bound morphemes should be even more difficult to notice because they are not only weakly stressed, but are affixed to the ends of the words that support them. To address this gap in the literature, Roberta Golinkoff, Melissa Schweisguth, and I tested toddler sensitivity to the bound morpheme “ing” (Golinkoff, Hirsh-Pasek, and Schweisguth, in press). Borrowing directly from the Gerken and McIntosh study, we presented three types of stimuli to the children: grammatical morphemes (“dancing”), ungrammatical morphemes (“dancely”), and nonsense morphemes (“dancelu”). The logic of the design is as follows: If children do not attend to bound morphemes, they might see the three stimuli described above as virtually identical—interpreting each word as the stressed stem dance. If, on the other hand, toddlers make any distinction among the three stimuli, then there is evidence that bound morphemes are detectable in the input. Again, paralleling Gerken and McIntosh, our claim is that if the toddlers have more correct responses in the grammatical condition (“ing”) than in the other two conditions (“lu,” “ly”), there would be evidence that the children are distinguishing among the phonological forms and that they could potentially use the information in categorizing grammatical constituents.
The hypothesis driving this research was the latter one. We predicted that the children would indeed differentiate among the three conditions, performing best in the grammatical condition (“ing”), less well in the ungrammatical (but familiar) condition (“ly”), and not at all well in the nonsense condition (“lu”). In what follows, I present data on this issue, for it not only underscores what Shipley, Smith, and Gleitman (1969) suggested, but again gives us reason to look for unfolding linguistic competence through measures of language comprehension. The subjects for this experiment were 108 toddlers, distributed equally and randomly into the three conditions of “ing,” “ly,” and “lu,” balanced for gender, and ranging in age from 18 to 21 months. All of the children had been screened by phone to ensure that they
comprehended at least 6 of the 8 verbs being used as stimuli. At the time of the visit, the children were also asked if they were producing “ing.” Very few (about 6 of the children) occasionally produced “ing.” Children were tested individually in the intermodal preferential looking paradigm developed to assess language comprehension in toddlers (Golinkoff, Hirsh-Pasek, Cauley, and Gordon 1987; Hirsh-Pasek and Golinkoff 1996a,b). In the intermodal preferential looking paradigm (IPLP), children are seated on their blindfolded parent’s lap midway between two television monitors spaced 2.5 feet apart. Figure 12.1 provides a schematic drawing of the procedure and the set-up. On one screen children might see a woman dancing. On the other screen, and at the same speed of delivery, they would see the same woman waving to the viewer. Through a hidden speaker located between the monitors, children heard the test stimulus, which in this case was either “Where’s dancing? Do you see dancing?” or “Where’s dancely? Do you see dancely?” or “Where’s dancelu? Do you see dancelu?” The logic of this experiment, confirmed in numerous previous experiments (see Hirsh-Pasek and Golinkoff 1996a,b), is that children will look longer at the screen that “matches” or “goes with” the linguistic stimuli than at the nonmatching screen. Thus visual fixation serves as the dependent measure. All dependent data were collected by videotaping the children’s responses so that they could be coded “off-line.” Agreement between coders for these experiments has been consistently high, averaging around 91% (see Hirsh-Pasek and Golinkoff 1996a). Before I describe the design further, it is important to note some of the advantages of this procedure over others for testing the early language comprehension of bound morphemes. The first is that, unlike the picture-pointing tasks, the IPLP allows the experimenter to deliver dynamic stimuli to children in a controlled fashion.
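Since visual fixation is the dependent measure, the analysis ultimately reduces to simple arithmetic over coded looking times. The sketch below is a hypothetical illustration of that computation only (the data format and function name are mine, not the authors’): given per-trial looking times to the matching and nonmatching screens, it returns the proportion of looking directed at the match, where 0.5 is chance.

```python
def proportion_to_match(trials):
    """Aggregate proportion of looking time to the 'matching' screen.

    `trials` is a list of (match_secs, nonmatch_secs) pairs, one per
    test trial, as coded off-line from videotape. Values reliably
    above 0.5 (chance) are taken as evidence of comprehension.
    Hypothetical data format -- a sketch, not the authors' code.
    """
    match = sum(m for m, _ in trials)
    total = sum(m + n for m, n in trials)
    return match / total if total else float("nan")

# e.g. two 6-second trials, mostly looking at the matching screen
print(proportion_to_match([(4.0, 2.0), (3.0, 3.0)]))  # 0.5833333333333334
```

In practice one would compute this per child and test the group mean against 0.5; the aggregation shown here is just the simplest version of the measure.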
The bound morphemes used with young children are often (though not always) attached to action verbs. It was therefore important to be able to test for linguistic comprehension of these forms within the context of dynamically presented stimuli. The second advantage is that while action can be displayed, the procedure does not require any action on the child’s part. Thus, children are not lured into impulsive responses that might test their compliance (or lack thereof) rather than their linguistic competence. A mere looking response indicates the child’s preference for one screen over the other. For both of these reasons, then, the IPLP seems like an ideal way to examine children’s budding knowledge of these linguistic forms.

Figure 12.1. Schematic drawing of the procedure and the set-up.

The layout for these experiments is presented in Table 12.1. The children were exposed to 8 different continuous actions (e.g., dancing, waving, pushing, turning) that appeared in four pairs. Each pair of verbs was represented two times for 6 seconds each. The trials were also separated by intertrial intervals during which a light between the two screens came on to draw children’s attention to the center. Thus, each trial required the child to make a new choice of which screen to attend to. The videotapes were tightly synchronized so that both members of a pair appeared in front of the child at the same time. The stimuli were also balanced for salience so that one member of a pair was not more enticing than the other. Finally, presentation of the actions was counterbalanced such that half of the time “dancing” would appear on the left screen and half of the time it would appear on the right screen. The linguistic stimuli determined which of the screens would be the matching versus the nonmatching screen and also differentiated the three test groups. In a between-subjects design, 24 of the children heard all of the pairs in the “ing” condition, 24 heard the stimuli in the “ly” condition, and 24 heard the stimuli presented in the “lu” condition. Note that the same sentence frames accompanied the words with the exception of the bound morpheme. Table 12.1 contains a sample of the design for the “ing” condition.

Table 12.1 Sample Block of Trials for “ING” Condition

Simultaneous Trials
  Left screen: Black woman drinking from cup
  Linguistic stimuli: “Hey boys and girls! What do you see on TV? What’s going on on those TVs? What are they doing?”
  Right screen: Black woman blowing air at piece of paper

  Left screen: Black woman drinking from cup
  Linguistic stimuli: “Hey! Look up here! What’s going on on those TVs? What are they doing?”
  Right screen: Black woman blowing air at piece of paper

Test Trials
  Left screen: Black woman drinking from cup
  Linguistic stimuli: “Which one is drinking? Can you find drinking? Where’s drinking? Do you see drinking?”
  Right screen: Black woman blowing air at piece of paper

  Left screen: Black woman drinking from cup
  Linguistic stimuli: “Whoa! Find drinking. Look up here again! Which one is drinking?”
  Right screen: Black woman blowing air at piece of paper

Thus the total design for this preliminary experiment had within-subject variables of verb (four pairs) and match (matching vs. nonmatching) and between-subject conditions of linguistic group (grammatical, ungrammatical, nonsense) and gender.

Results

Before reviewing the results, note that there were no stimulus salience problems in the simultaneous trials. That is, when the pairs of actions were presented with a neutral linguistic stimulus, neither verb in a pair was intrinsically more interesting than the other member of that pair. The first important result comes from the grammatical, “ing” condition. A three-way ANOVA with a between-subjects variable of gender and within-subjects variables of verb and match revealed a main effect of match. The children looked at the matching screen (x̄ = 4.01 sec.) significantly more than the nonmatching screen (x̄ = 3.31 sec.). In the “ing” condition, both the boys and the girls responded correctly across all of the verbs. There were no interactions with verb or with gender. This result is critically important because it suggests that children are responding to the stimuli. We do not know from this result alone, however, whether they are just listening to the verb stem (e.g., dance) or whether they actually notice the bound morpheme “ing.”

The “ly” (ungrammatical) condition produced more interesting results. Here, the ANOVA revealed a main effect of match and an interaction between verb and match. The children—both boys and girls—failed to watch the matching screen (x̄ = 3.07 sec.) more than the nonmatching screen (x̄ = 4.21 sec.) for the first verb, and then looked at the matching screen (x̄ = 3.83 sec.) significantly more than the nonmatching screen (x̄ = 3.02 sec.) for the last three verbs. One possibility is that the children recognized “ly” as familiar but were puzzled at first by its placement on a verb. This would suggest that children are sensitive to the ungrammatical use of a familiar morpheme and that this usage is capable of disrupting sentence comprehension, as in the Gerken and McIntosh (1993) “was” condition. Though confused at first, however, they later decide that perhaps the familiar ending could be an ending for the verb. Note here that if the children were only responding to the verb stem (e.g., dance), then no verb by match interaction should be expected, since the verb stems were the same in the “ing” and the “ly” conditions. Thus the pattern of results for the “ly” condition suggests that by 18 months of age children possess more sophistication about grammatical morphemes than we imagined. They appear to be aware not only of which morphemes are found in English but of the type of words on which the morphemes are typically to be found. These data suggest that children may be segmenting a verb into a stem and a morpheme. In the end, however, they let input rule the day and decide that the ending can be placed on the verb.
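The counterbalancing logic described in the design above (four verb pairs, each pair presented twice, with the matching action appearing equally often on the left and right screens) can be sketched as follows. The last two verb pairs, and all names and details, are hypothetical illustrations; the actual stimulus lists and randomization were of course more elaborate:

```python
import random

# Illustrative trial-schedule generator under the counterbalancing described
# in the text. Pairs beyond those named in the chapter are hypothetical.
PAIRS = [("dancing", "waving"), ("pushing", "turning"),
         ("drinking", "blowing"), ("eating", "kissing")]

def make_schedule(seed=0):
    rng = random.Random(seed)
    trials = []
    for match, foil in PAIRS:
        layouts = [(match, foil), (foil, match)]  # match on left, then on right
        rng.shuffle(layouts)                      # order varies across children
        for left, right in layouts:
            trials.append({"left": left, "right": right, "match": match,
                           "audio": f"Where's {match}? Do you see {match}?"})
    return trials

sched = make_schedule()
assert len(sched) == 8                                   # 4 pairs x 2 trials
assert sum(t["left"] == t["match"] for t in sched) == 4  # match on each side half the time
```

The same schedule serves all three between-subjects groups; only the bound morpheme in the audio (“ing,” “ly,” or “lu”) differs.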
Finally, the “lu” (nonsense) condition offers yet a third piece of evidence that children are attending to bound morphology. Here, comprehension is completely disrupted, and neither the match nor the nonmatch is watched to a greater degree throughout the four blocks of trials. Mean visual fixation time across the four blocks of trials is identical across the match and nonmatch conditions at 3.56 seconds. Children were not sure which screen to watch in response to words like “dancelu” and “wavelu.” Again, the only difference among the three linguistic conditions is the difference in the bound morpheme. Thus the bound morpheme “lu” abolished all preferences for the verb stems.

Discussion

These findings are suggestive and parallel those of Gerken and McIntosh (1993). Even in late infancy, children are sensitive to the grammatical morphemes in the input. What we saw in this experiment is that merely changing the weakly stressed, bound morpheme at the end of a sentence frame significantly influenced children’s sentence processing. With “ing” at the end of the main verbs, all children responded appropriately. With the familiar but ungrammatical “ly” at the end of the same verbs, responses were initially confused and then resolved on the matching screen. Finally, with the nonsensical ending “lu,” children’s responses were totally disrupted such that the only consistent trend was from the girls, who preferred the nonmatching screen. What is clear from this pattern of responses is that the children did note grammatical morphemes even in the difficult case of bound morphemes, in which the functor is not only weakly stressed but is also attached to a verb that carries the primary semantic force. These children were not simply relying on the verb stem to determine their choices. If they had been, their responses should have been equivalent across the three test conditions. What is less clear is exactly what the differential pattern of responses does indicate. Below, I consider three possible interpretations of these results. I then conclude by echoing Shipley, Smith, and Gleitman’s (1969) concern that if we are truly to understand the differences between children’s spontaneous speech and their knowledge, we must develop new techniques for the systematic observation of this knowledge. I will argue that in this study as in others, the intermodal preferential looking paradigm is a tool for the systematic observation of comprehension and that the study of early comprehension might provide a crucial way to explore linguistic competence.

Three interpretations

Three possible interpretations could be used to explain these results: the whole word explanation, the particular morpheme explanation, and the familiar morpheme explanation. Let me take each in turn.
The first possibility is that children do not analyze a word into a stem and an accompanying morpheme. For years, psycholinguists have argued about whether words are stored as whole units or as base words plus morphemes (Taft and Forster 1975; Rubin, Becker, and Freeman 1979). If words are stored as whole units, the lexicon would require separate storage of each morphological variant of a word. Thus teach, teacher, teaching, and teaches would each be stored as a separate and independent word. If, on the other hand, a complex word is stored as a base or stem plus the morpheme, then the word teach would be stored in lexical memory as would a set of morphemes that could be affixed to the base word. For example, teach would be stored as would -er, -ing, and -es. Rules would then be required for adjoining base words and bound morphemes. It has been suggested that the whole word option requires more memory storage, but that the stem plus morpheme solution to lexical storage allows for more productivity and increases the processing load. Most of the current evidence from adult lexical decision experiments supports the stem plus morpheme interpretation (Taft and Forster 1975).

It is possible that children who are first learning words might favor a system for storing whole words as units. Without enough words in their repertoire, they might not be able to recognize the patterns of endings that comprise bound morphemes. Indeed, one could hypothesize that children might need to have a critical mass of words before such analysis into stem plus morpheme could take place. A similar argument has been offered by Jones and Smith (1993), who suggest that the shape bias in word learning does not occur until children have enough words in their lexicons to do an internal analysis. Under this interpretation, the children in our experiment might have learned the whole unanalyzed words for “dancing,” “pushing,” and “waving” and thus would perform better when these words were used as stimuli than when unfamiliar words like “dancely” or “dancelu” served as stimuli.

This interpretation, however, is not entirely supported by the data. The children did do better in the “ing” condition than they did in the “lu” condition; yet, they gave mixed results in the “ly” condition. Had they been using a whole word strategy, the “ly” condition should have elicited the same responses as did the “lu” condition. Both are equally unfamiliar. Yet that was not the case. Hence we can tentatively reject the whole word alternative in favor of one of the two base-plus-morpheme explanations.

The particular morpheme hypothesis holds that the child has already learned something about particular morphemes in the input and thus knows, to a certain extent, that “ing” signals verbs and that “ly” signals adverbs.
Of the three explanations, this alternative gives the child the most sophisticated knowledge, suggesting that children use bound morphemes to label constituent phrases. Again, the fact that the children watched the correct verb in the “ing” condition and watched both verbs in the “lu” condition supports this alternative. Again, however, the findings from the “ly” condition make this hypothesis less likely. One could argue that the children in the “ly” condition were faced with a forced-choice alternative and that no adverbial alternative was available. Choosing the lesser of two evils, they favored the “ing” alternative—thus explaining the results in total. Yet there is another reason to question this explanation. Children who are just beginning to learn grammar must be open to the full range of bound morphemes that they will encounter. If they fully restricted the class of morphemes to those that they currently knew, they would not be able to master new bound morphemes. Thus, instead of supporting this alternative, we turn to a slightly more flexible explanation offered by a third position.

The final hypothesis, the familiar morpheme hypothesis, explains the results and also leaves room for further learning. On this scenario, children know that certain phonological forms heard in the input serve as bound morphemes. That is, the children store acoustic information that has no meaning for them as yet, but that has been repeated with some statistical frequency. Several recent experiments attest to the fact that infants as young as 8 months of age can perform this kind of statistical acoustic analysis (see, e.g., Saffran, Aslin, and Newport 1996). Once stored as acoustic templates, some of these sound sequences could then become associated with specific form classes and come to have more particular meanings. Familiar phonological patterns like “ing” may be so frequently encountered that they become associated with particular stems that the children have heard before. The morpheme “ly,” however, may sit longer in this undifferentiated phonological class until enough information becomes available to classify it reliably (see Gerken 1996 for a similar proposal). On this scenario, the highly familiar “ing” and the familiar “ly” would pattern in somewhat the same way, while the unknown “lu” ending would pattern in quite a different way. That is, children might have learned that “ing” can occur with verbs. They might also know that “ly” is a familiar ending, but not know its function. Thus, after some hesitation, they may trust the input and assign the interpretation to the verb stem. The “lu” condition, in contrast, presents an unfamiliar morpheme to the child. Since it is not in the familiar phonological or undifferentiated class, the children may assume that it is not an ending attached to the base form and in fact choose the nonmatching alternative.
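The statistical acoustic analysis attributed to infants above (cf. Saffran, Aslin, and Newport 1996) can be given a toy illustration: over a syllabified corpus, compute how predictable each syllable is from the one before it. Recurring endings such as “ing” surface as high transitional probabilities. The corpus and the syllabifications here are invented for illustration only:

```python
from collections import Counter

# Toy syllabified "corpus"; real infant-directed speech and real
# syllabification would be far messier. Invented for illustration.
corpus = "dan cing wa ving pu shing dan cing tur ning wa ving".split()

pair_counts = Counter(zip(corpus, corpus[1:]))
first_counts = Counter(corpus[:-1])

def transitional_probability(a, b):
    """P(b | a): how often syllable a is immediately followed by syllable b."""
    return pair_counts[(a, b)] / first_counts[a]

print(transitional_probability("dan", "cing"))  # 1.0: "cing" always follows "dan"
print(transitional_probability("cing", "wa"))   # 0.5: boundaries are less predictable
```

High within-word and low between-word transitional probabilities are exactly the asymmetry that would let a learner pull recurring endings out of the stream before knowing what they mean.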
To borrow from other work that Roberta and I have done, the children might see the “lu” form as so different that they (or at least the more sophisticated children) apply the lexical strategy of novel-name-nameless-category and choose the nonmatching picture for the linguistic stimulus (Golinkoff, Mervis, and Hirsh-Pasek 1994). As this alternative permits the learning of new bound forms from the input, we favor that interpretation here and are preparing further studies with less familiar bound morphemes such as -ness to assess this hypothesis.

In sum, the intriguing pattern of results presented above allows us to say with some conviction that children who are just beginning to use two-word sentences can detect (and perhaps use) bound morphology to assist them in constructing the grammar of their language. To learn grammar children must (1) be sensitive to these cues for constituent structure; (2) be able to use these cues among others to label the constituents of grammar; and (3) be able to figure out how these constituent structures pattern in their own native tongue. Over the last several years, we have begun to make advances on the first two of these levels. The results presented here are yet another step in this progress.

What these results also highlight is the critical role that comprehension data can play in our understanding of language acquisition. As Golinkoff and I noted (Hirsh-Pasek and Golinkoff 1996a):

There can be little doubt that studies on young children’s language production in the past 25 years have provided a rich source for language acquisition theories. Language production, the observable half of the child’s language performance, however, is only part of the story. Just as astronomers were not satisfied to study only the light side of the moon, so researchers in language acquisition have long recognized that access from the “dark” side of their topic—namely, language comprehension—illuminates the language acquisition process far more than the study of production alone. (p. 54)

As can be seen in the analysis of the bound morpheme data, a number of advantages can be obtained by looking at comprehension data. First, these data can be used to falsify theoretical assertions about the young child’s linguistic competence. In this case, data from language production have suggested that grammatical morphemes could not be used to assist the Stage I child in the learning of grammar (Pinker 1984). Data from comprehension present a different picture, suggesting that children are sensitive to both free and bound morphemes in the input and that they might in fact be able to use this information to segment and perhaps identify grammatical constituents. Second, comprehension data allow a clearer picture of the processes of language acquisition. By the time children are producing a structure, they have already acquired that structure. The steps leading up to mastery of the structure may be masked. Comprehension data, however, allow us to examine this process.
If our hypothesis is correct and children do store familiar phonological information in an undifferentiated state before associating it with particular form classes, such storage might only be visible in comprehension tasks. Finally, comprehension studies allow for methodological control that is often not possible in tests of production. With the exception of elicited production tasks (Crain and Thornton 1991), those who examine production data are often in a position of “wait and see,” in which they must wait for the child to produce something in the hope that they will see the full repertoire of what the child can produce. Taking the bound morpheme data as an example, comprehension allows us to look specifically at bound morphemes before they are produced.
The research presented in this chapter, then, both replicates and expands some of the classic findings of Shipley, Smith, and Gleitman (1969). Children are sensitive to grammatical morphemes in the input that they hear. They are even sensitive to what is arguably the most difficult class of grammatical morphemes—bound morphemes. Further, as noted in Shipley, Smith, and Gleitman’s original study, comprehension does indeed precede production, and systematic examination of language comprehension can provide a more accurate measure of the child’s developing language. To fully understand what children bring to the language-learning task, how they can mine the input for cues to grammatical structure, and how they utilize a coalition of these cues to find the building blocks of grammar, we will need to conduct extensive and focused studies of their language comprehension.

Conclusions

The now classic Shipley, Smith, and Gleitman (1969) paper represents one area in which Lila set the stage for language research to come. She and Henry continue to be architects for our field. They not only frame research questions that must be addressed if we are to understand how young children acquire their native tongue, but they also point us in the direction of new methodologies that can address these questions. Lila and Henry will continue to influence psycholinguistic research for years to come. The field is indebted to them, and I feel honored to be among those at the Gleitman dinner who have been touched by their brilliance.

Acknowledgments

The data reported here are the product of collaborative research with Roberta Golinkoff of the University of Delaware and Melissa Schweisguth, now of the University of California at San Diego. We gratefully acknowledge the support of the University of Delaware’s Honors Psychology Program, through which Melissa Schweisguth helped to design the project and to collect the data.
This research was also supported by an NSF grant (#SDBR9601306) awarded to Hirsh-Pasek and Golinkoff and by an NICHD grant (#HD25455-07). Finally, we thank Rebecca Brand and He Len Chung for their able assistance in the data collection and Elissa Newport for her thoughtful comments on this chapter.

References

Aslin, R., Woodward, J., LaMendola, N., and Bever, T. (1996) Models of word segmentation in fluent maternal speech to infants. In Signal to Syntax, ed. J. Morgan and K. Demuth. Cambridge, MA: MIT Press, pp. 117–135.
Bloom, P. (1990) Syntactic distinctions in child language. Journal of Child Language 17:343–356.
Bowerman, M. (1973) Structural relationships in children’s early utterances: Syntactic or semantic? In Cognitive Development and the Acquisition of Language, ed. T. E. Moore. New York: Academic Press.
Brown, R. (1973) A First Language. Cambridge, MA: Harvard University Press.
Crain, S. and Thornton, R. (1991) Recharting the course of language acquisition. In Biological and Behavioral Determinants of Language Development, ed. N. A. Krasnagor, D. M. Rumbaugh, R. L. Schiefelbusch, and M. Studdert-Kennedy. Hillsdale, NJ: Erlbaum.
de Villiers, J. and de Villiers, P. (1973) A cross-sectional study of the acquisition of grammatical morphemes in child speech. Journal of Psycholinguistic Research 2:267–278.
Gerken, L. (1996) Phonological and distributional information in syntax acquisition. In Signal to Syntax, ed. J. Morgan and K. Demuth. Cambridge, MA: MIT Press, pp. 411–427.
Gerken, L. and McIntosh, B. J. (1993) The interplay of function morphemes in young children’s speech perception and production. Developmental Psychology 27:448–457.
Gleitman, L. and Gillette, J. (1995) The role of syntax in verb learning. In The Handbook of Child Language, ed. P. Fletcher and B. MacWhinney. Oxford: Blackwell, pp. 413–429.
Golinkoff, R., Hirsh-Pasek, K., Cauley, K. M., and Gordon, L. (1987) The eyes have it: Lexical and syntactic comprehension in a new paradigm. Journal of Child Language 14:23–46.
Golinkoff, R. M., Mervis, C., and Hirsh-Pasek, K. (1994) Early object labels: The case for a developmental lexical principles framework. Journal of Child Language 21:125–155.
Golinkoff, R., Hirsh-Pasek, K., and Schweisguth, M. A. (in press) A reappraisal of young children’s knowledge of grammatical morphemes. In Approaches to Bootstrapping: Phonological, Syntactic, and Neurophysiological Aspects of Early Language Acquisition, ed. J. Weissenborn and B. Hoehle. Amsterdam and Philadelphia: John Benjamins.
Grimshaw, J. (1981) Form, function, and the language acquisition device. In The Logical Problem of Language Acquisition, ed. C. L. Baker and J. McCarthy. Cambridge, MA: MIT Press, pp. 163–182.
Hirsh-Pasek, K. and Golinkoff, R. (1996a) The Origins of Grammar. Cambridge, MA: MIT Press.
Hirsh-Pasek, K. and Golinkoff, R. M. (1996b) The intermodal preferential looking paradigm reveals emergent language comprehension. In Methods for Assessing Children’s Syntax, ed. D. McDaniel, C. McKee, and H. Cairns. Cambridge, MA: MIT Press.
Jones, S. and Smith, L. (1993) The place of perception in children’s concepts. Cognitive Development 62:499–516.
Katz, N., Baker, E., and MacNamara, J. (1974) What’s in a name? A study of how children learn common and proper names. Child Development 45:469–473.
Kelly, M. (1992) Using sound to solve syntactic problems: The role of phonology in category assignments. Psychological Review 99:349–364.
Kelly, M. (1996) The role of phonology in grammatical category assignments. In Signal to Syntax, ed. J. Morgan and K. Demuth. Cambridge, MA: MIT Press, pp. 249–263.
Morgan, J., Meyer, R. P., and Newport, E. L. (1987) Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language. Cognitive Psychology 19:498–550.
Morgan, J., Shi, R., and Allopena, P. (1996) Perceptual bases of rudimentary grammatical categories: Toward a broader conceptualization of bootstrapping. In Signal to Syntax, ed. J. Morgan and K. Demuth. Cambridge, MA: MIT Press, pp. 263–287.
Pinker, S. (1984) Language Learnability and Language Development. Cambridge, MA: Harvard University Press.
Pinker, S. (1994) The Language Instinct. New York: William Morrow.
Saffran, J. R., Aslin, R. N., and Newport, E. L. (1996) Statistical learning by 8-month-old infants. Science 274:1926–1928.
Shafer, V. L., Gerken, L. A., Shucard, J., and Shucard, D. (1995) An electrophysiological study of infants’ sensitivity to English function morphemes. Unpublished manuscript, State University of New York, Buffalo.
Shipley, E., Smith, C., and Gleitman, L. (1969) A study in the acquisition of language: Free responses to commands. Language 45:322–342.
Taylor, M. and Gelman, S. (1988) Adjectives and nouns: Children’s strategies for learning new words. Child Development 59:411–419.
Valian, V. (1986) Syntactic categories in the speech of young children. Developmental Psychology 22:562–579.
Chapter 13
Language and Space
Barbara Landau

While preparing for the event to celebrate Henry and Lila, I looked through the many papers I had saved from my graduate studies at Penn. Among these were a draft of a manuscript that would become my first published paper, some notes from one of my first “seminar” presentations, and the penultimate version of the manuscript written by Landau and Gleitman on the subject of language learning by children who were born blind. These three artifacts remain, for me, palpable evidence of the impact that Lila and Henry have had on my professional life.

Looking at the draft manuscript of what would eventually become the “grandmother” paper, I can recall bringing the raw data to Lila and Henry during my first year of graduate school. I had actually collected the data as part of a Master’s thesis at Rutgers, directed by Adele Abrahamsen (who had introduced me to Lila the year before). When I first described the study to Lila, she listened patiently, then explained to me why the data were important and what they actually meant. She then recommended that I show the data to Henry, who spent the next several months with me explaining how to conceptualize, analyze, and present the data to make a convincing argument. Following this, I wrote a first draft, which was then edited line-by-line by both Henry and Lila. The result was a wonderful paper, and I longed for them to be the coauthors they deserved to be. But when I suggested this, they declined, telling me “this is really your work.” Nothing could have been further from the truth, but this event reflects the first lesson I learned about Lila and Henry: They are great teachers, not only for their gift in educating their students, but for their intellectual and personal generosity.

Looking at my seminar notes, I recall some of my earliest experiences there.
These were lengthy seminars held in the Gleitmans’ living room, sometimes running formally until close to midnight, and then continuing in the Gleitmans’ kitchen until people dropped from exhaustion. (Lila and Henry always were the last to succumb.) But they were the most exhilarating intellectual experiences I had ever had, and I rarely left without feeling privileged to have been a part of them. During one of my first presentations, I told the seminar that I was interested in how blind children learn language. Henry’s immediate question was “Why?”—a question that stunned me, as it seemed self-evident that the blind would provide an interesting symmetry to the recently published work by Feldman, Goldin-Meadow, and Gleitman on language learning by linguistically deprived deaf children. But he was right to ask that question; and it became immediately apparent that the real answer would require thinking in depth about underlying assumptions, competing theories, the connections between data and theory, and what the ultimate meaning would be of different empirical outcomes. This set the stage for my education under the Gleitmans’ watch.

What also stunned me was Henry’s lengthy sequel to his own question—one of many times in which he would use the student’s fledgling idea to teach. He set about brilliantly laying out the (il)logic of a question about language in the blind, followed by the logic of asking about spatial knowledge in the blind. Lila disagreed, rearticulating the question about language learning, and brilliantly reformulating it as she went. Several other seminar members joined in, and the debate continued all evening, and for many evenings thereafter. At some point during this lengthy process, my research questions became clearly formulated and I became capable of defending them to the most penetrating critic. This, I came to learn, was the format for the seminar: A student would present an ill-formed research question, Lila and Henry would rearticulate and refine it (making it sound like the student was a genius along the way), and ultimately, that reformulation would become the student’s own. This was the second lesson I learned about Lila and Henry: They are great scholars, not only for their brilliance, but for their dedication to fostering great work in others.
The final item I found was the penultimate version of the manuscript written by Landau and Gleitman on the subject of language learning by children who were born blind. On these proofs were copious comments in Henry’s hand, which reminded me vividly of the intense debates we had had for the five years that we had worked on studies of the blind child. The debates revolved around the question of whether the study of the blind child was really about language, or really about space. Henry argued that the work was really about space, for if one could only understand how spatial knowledge was constructed in the absence of visual experience, it would follow trivially that language could be learned. Lila argued that the work was really about language, for although it was fascinating to learn how spatial knowledge could be built upon nonvisual experience, it was impossible to understand how certain aspects of language could be acquired unless one considered the principles of language itself as they interacted with experience.
Was the study really about language, or was it about space? The question found itself perfectly poised within the larger group at Penn, which included two critical members of the psychology department (Liz Spelke and Randy Gallistel) as well as other members of the Sloan Group (a group of linguists, psychologists, and computer scientists at Penn dedicated to the emergence of cognitive science). Within this context, I think we all finally concluded that it was truly about both—that one could not understand how the blind child learned language unless one understood how any child could come to represent the spatial world, come to represent the formal system of human language, and, most critically, come to map these two together. But we only came to this conclusion, I think, after years of debate, during which I learned to present ideas, to defend ideas, to criticize ideas, and to admire ideas, all in the context of early morning coffees, late-night meetings, and perennial support, both personal and professional. Thus I learned my third lesson about Henry and Lila: They are great mentors, for they give to their students intellectual direction for life. The set of profound and difficult issues that were laid out under Henry and Lila’s guidance during these years formed the subject matter of Language and Experience (Landau and Gleitman 1985), and have continued to guide me since that time.

1.0 Initial Findings and Promissory Notes

In trying to understand how spatial experience is used during language learning, we began with the simple hypothesis of John Locke (1690):

If we will observe how children learn languages, we shall find that, to make them understand what the names of simple ideas or substances stand for, people ordinarily show them the thing whereof they would have them have the idea; and then repeat to them the name that stands for it, as ‘white’, ‘sweet’, ‘milk’, ‘cat’, ‘dog’.
(Book 3.IX.9)

In our empirical studies of the blind child, however, we made some rather surprising discoveries that could not be explained by Locke’s hypothesis: The blind child developed a normal vocabulary, complete with rich representations of visual terms—spatial terms, color terms, and visual verbs such as look and see, which clearly could have had no basis in “showing” things and “repeating the name.” At the end of Language and Experience, we concluded with a much more complex hypothesis about word learning:

To explain how lexical learning based on different introducing circumstances in some domains yields up categories whose
Barbara Landau
substance and boundaries are much alike (e.g. see to blind and sighted children), we have argued that humans are endowed with richly specified perceptual and conceptual principles that highlight certain construals of experience and suppress others; endowed with linguistic principles about which discriminations among millions of salient ones are lexicalizable; endowed with principles for manipulating the speech presented to the ear in certain ways, but not in many other potentially available ways; and endowed with principles for pairing the perceptual-conceptual discriminanda with the lexical items. (p. 202)

Simply put, we proposed that there are universal principles that guide the acquisition of new words despite very different kinds of experience. At the same time, we proposed a very specific role for experience: Regardless of how richly structured a child’s innate knowledge, some information from the environment must also be used to determine the meaning of any word. This is because any given word might be compatible with an infinite number of possible meanings, but the child cannot know in advance just which meaning is the one that the speaker intends. For this reason, information from the environment—together with the learner’s natural predispositions in interpreting words—can help serve as a “mental pointer” to the correct intended meaning. In the case of visual terms such as look and see, we proposed that the blind child could have used the syntactic contexts in which the verbs occurred together with the nonlinguistic spatial contexts in which the word was used—the contexts of haptic exploration, in which she could truthfully exclaim, “Let me see camera!” As we knew at the time, the work on the blind child left many promissory notes.
Fleshing out our hypothesis and testing its truth would depend on detailed studies of those richly specified perceptual and conceptual principles that highlight certain construals of experience and suppress others; the kinds of linguistic principles that specify which discriminations are relevant to the lexicon; the kinds of principles used in manipulating speech; and the kinds of principles that exist for pairing the two. A substantial amount of progress has been made in each of these areas under Lila and Henry’s guiding hands and those of their descendants (see Gleitman and Gleitman 1997; and chapters by Fisher, Hirsh-Pasek, Goldin-Meadow, Naigles, and Newport, this volume). Over the past twelve years, I have directed my attention to specific aspects of these problems, focusing on the acquisition of words in two different ontological domains—objects and places. In both cases, I have spent a fair amount of time puzzling about the kinds of perceptual, conceptual, and linguistic principles that could in fact be brought to bear on
the indeterminacy problem. What kinds of initial biases are there in the ways in which learners represent objects, places, paths, and events? How do languages encode these notions? What skeletal conceptual and perceptual structures might map these onto various formal linguistic devices, and thereby serve as an engine for further learning? How do learners use their spatial and linguistic knowledge to learn the meanings of new words? It turns out that the investigation of two domains (object, place) is more than twice as complex as the investigation of one domain, and this has necessitated a kind of breadth that is the foundation for cognitive science today, but that served as a cornerstone of the Gleitman research group long before it became fashionable. For example, the fundamental organizing principles of what is “salient” to the learner are qualitatively different in the two cases of object and place. In the case of objects, we must consider how objects are represented by the learner, how different individual objects are grouped into categories based on different kinds of similarity, what kinds of categories deserve to be lexicalized, how these different kinds of categories are formally encoded in languages of the world, and how learners then actually learn the names for specific categories. None of these is simple. In the case of places, we must consider how a learner represents places geometrically, what kinds of geometric and force-dynamic relations deserve to be lexicalized, how these relations are formally encoded, and how learners acquire these place terms. The geometric representation of “place” appears to be quite tightly constrained in humans and other species (Gallistel 1990; Hermer and Spelke 1997; Landau, Spelke, and Gleitman 1984). 
Moreover, it is substantially different from the representation of objects qua objects, even though objects occupy locations and languages very often encode an object’s location with reference to other objects (Landau and Jackendoff 1993). Further, these linguistic terms appear to encode both more and less than the geometric properties engaged for navigation, and may constitute a distinct kind of semantic category specialized for talking about location. Add to this substantial cross-linguistic variability: In natural languages, spatial relationships are universally encoded as predicates—formal expressions of relationships—but their specific linguistic form class may be the verb, preposition, postposition, or even various nominal markers (such as terms for “head-of” or “foot-of”; see Levinson 1992). Across these forms, there is a fair amount of cross-linguistic variability in the kinds of spatial relationships that are encoded (e.g., English on covers a broader group of cases than German an or auf; see Bowerman 1996), although these differences may reflect featural choices based on universal semantic properties (Landau and Jackendoff 1993; Landau 1996).
In the remainder of this chapter, I will confine myself to work on how objects are encoded for the purpose of naming. In this work, I have tried to fill at least some of the promissory notes left by Language and Experience. Even within this domain, things are quite complex.

2.0 Objects Named

One of the most important findings of the blind study is one that is not cited very often: Blind children develop a vocabulary of object names that is virtually indistinguishable from that of sighted children of the same ages. Thus, with or without visual experience, children acquire roughly the same names for roughly the same kinds of objects, and these names are generalized appropriately with little explicit tutoring. What is the basis for this learning? It is commonplace to assume that generalization of object names is based on the child’s understanding that object names are cover terms for “object kinds”—objects that are considered by the linguistic community to be relevantly similar to each other (Markman 1989). Much debate has revolved around the nature of these similarities—whether their foundation is innate knowledge of basic ontological kinds (object, substance, etc.) or whether the similarities are learned through language (for different views, see Keil 1979; Quine 1960; Soja, Carey, and Spelke 1991); whether the similarities holding among “natural kind” objects are qualitatively different from those holding among manmade objects (Kripke 1977; Putnam 1977; Malt and Johnson 1992); whether the similarities are specific to lexicalized entities, or are general similarities that are prepotent in all kinds of similarity tasks (Landau, Smith, and Jones 1988; Markman and Hutchinson 1984; Smith, Jones, and Landau 1996; Waxman and Markow 1995). To some, the very notion of similarity as a theoretical construct is misguided, too slippery to ever play a significant role in theory construction (Goodman 1972).
But things that fall under the same object name are similar to each other in some sets of ways and not in others; and if we are to understand how it is that blind and sighted children can easily learn to assign a name to only certain objects (and not others), we must ask what kinds of similarity do matter in object naming, and what kinds do not. The question thus is not whether similarity matters, but what kinds of similarity matter, and how these differ for different domains, for different tasks, and for different developmental moments. Quine (1969) proposed two quite different kinds of similarity: One is “intuitive” similarity, present in many species and rooted in the sensory and perceptual systems—for example, similarities among colors—which are a strict function of the neural structures and psychological/computational mechanisms that determine color perception. A second
kind is “theoretical” similarity; this allows us to construct and observe similarities that go beyond the perceivable properties of objects. Theoretical similarities are especially useful in explaining why things fall into the same named category and might include similarities based on feeding or reproductive behavior, social behavior, evolutionary considerations, or highly specific goals guiding categorization (Medin and Coley 1998; E. Shipley, this volume). Quine says:

A crude example [of theoretical similarity] is the modification of the notion of fish by excluding whales and porpoises. Another taxonomic example is the grouping of kangaroos, opossums, and marsupial mice in a single kind, marsupials, while excluding ordinary mice. By primitive standards the marsupial mouse is the more similar to the ordinary mouse than to the kangaroo; by theoretical standards the reverse is true. (p. 167)

Clearly, our mature knowledge of object categories must engage similarities that are not necessarily perceptual. People regularly make decisions about kinship on the basis of true blood relationships rather than appearance: A grandmother is the mother of a parent; she may or may not have gray hair and wrinkles, even though four-year-old children may indiscriminately call all gray-haired, wrinkled women “grandmas” (Landau 1982). Analogously, when scientists decide how to classify objects in nature, they rely on properties deemed to be important in current scientific understanding of the nature of different kinds, for example, an animal’s digestive or reproductive system, its lineage, its ecological niche. Recent work in cognitive development has shown that, from the age of around four, children also make judgments based on similarities other than the perceptual (Gelman and Wellman 1992; Keil 1989). Yet acknowledging that such bases for classification are possible does not mean that perceptual—intuitive—similarity is unimportant, nor even that it is less important.
Consider the recent headline in the New York Times Science Section (Sept. 19, 1995; see fig. 13.1):

Strange Bird Must Think It’s a Cow

The article describes the work of Alejandro Grajal, an ornithologist who studied the hoatzin, a tropical bird with a digestive system called “foregut fermentation” similar to that of cows as well as colobine monkeys, kangaroos, and tree sloths. This discovery clearly is important in understanding the nature of the species; but it is unlikely to cause a change in what the thing is called—bird. Presumably, this animal was originally named at a time when the nature of the animal’s digestive system was unknown; this “original dubbing ceremony”
Figure 13.1. Strange Bird Must Think It’s a Cow (Reproduced with permission of Dr. Grajal)
(Kripke 1977) may have been conducted on the basis of perceptual similarities. What was known about the animal and what therefore may have determined its name was its appearance and, probably, how it behaved. Similarly, in Quine’s example, the marsupial mouse is similar to ordinary mice in its appearance, hence we call it “mouse,” despite its closer theoretical similarity to the kangaroo. These uncontroversial facts raise an important question: Why is intuitive similarity sometimes a better predictor of an object’s name than theoretical similarity? The answer may have to do with learning by young children. First, even if theoretical similarities do play an important role in early development, such similarities can be hard for young learners to discover. Even when they are relatively easy to discover, there must be some mechanism for them to be linked with a relatively quick and reliable “identification” function—the function that gives us our first hypotheses about which of the objects in the world do belong to a given category (see, e.g., Armstrong, Gleitman, and Gleitman 1983; Landau 1982; Smith and Medin 1981; Keil 1994). For it to be a useful function for learning by infants and young children, the mechanism should select a property or properties that are easily picked up by learners and highly predictive of category membership. If so, then the learner will not go astray even in the earliest stages of word learning.

2.1.0 Object shape: A privileged kind of similarity for early object naming

What kinds of similarities could serve this function? Abundant research
in visual perception tells us that three-dimensional object shape is critical to object recognition in adults (Marr 1982; Biederman 1987). In some current theories, basic-level objects—airplanes, cups, cars—are recognized through decomposition into parts, either by analysis of contour minima (Hoffman and Richards 1984) or by specific arrangements of volumetric primitives (Biederman 1987). Perhaps not coincidentally, the basic level seems to provide an easy entry point in object naming (Brown 1957; Rosch, Mervis, Gray, Johnson, and Boyes-Braem 1976; Waxman and Markow 1995). Two-dimensional outline drawings can engage representations of objects as well, producing rapid and error-free object recognition and identification at the basic level in adults. Surface color appears to be much less important in the process of identification (Biederman and Ju 1988). Could representations of object shape underlie object recognition by infants as well—and hence be a plausible candidate representation to be engaged during early learning of object names? Recent results show that four-month-old infants can recognize the similarity among chairs, compared with couches or tables, even when the objects in question have quite complex configurations (Behl-Chadha 1996), thus suggesting that complex perceptual similarities are computed well before the child learns names for things. This idea is consistent with the classic findings of Hochberg and Brooks (1962): These investigators prevented their own infant from observing any two-dimensional representations of objects over the first year and a half of life. At the end of the period, the child had acquired a reasonable vocabulary of object names, presumably on the basis of observing real three-dimensional objects and hearing them named. His parents now showed him line drawings of familiar objects, and asked him to name them—which he did.
Results such as these suggest that object shape may provide a privileged kind of similarity in the early acquisition of object names. Although there is in principle an infinite number of possible interpretations of a novel word (one manifestation of the indeterminacy problem for language learning), learners who entertained each of these interpretations would be lost in the wilderness while trying to learn the word “dog.” Fortunately, this does not appear to happen.

2.1.1 The shape bias

In a number of studies, my collaborators and I have shown that young children do show a preference for generalizing the names of novel objects on the basis of shape (Landau, Smith, and Jones 1988). The task here is simple: Two- and three-year-olds are shown a novel artifact-type object and hear it named, for example, “See this? This is a dax.” Then they are shown a series of test objects, one at a time, and asked each time, “Is this a dax?” When a test object is the same shape as the original
object but differs from it in size, color, or surface texture, subjects as young as two years of age and as old as adults will accept the object as an instance of “a dax” (Landau et al. 1988; Landau et al. 1992; Smith, Jones, and Landau 1992). However, when a test object has a different shape from the original—even if it is just the same in size, color, or texture—children and adults alike tend to reject it, saying it’s “not a dax.” It is important to note that shape is not equally salient across all contexts, but rather appears to be especially salient in the context of object naming. For example, it is not the preferred pattern of generalization when the task is converted to a similarity task that does not involve a word: If just asked whether a test item “matches” or “goes with” or is “the same as” a standard object, children are much more likely to show preference patterns that are based on overall stimulus salience (such as brightness or surface texture; Smith et al. 1992), thematic sorting preferences (Markman and Hutchinson 1984), or perhaps the overall framework of stimulus choices and context, which serves as a mental pointer to the “relevant” dimension of similarity for adults (Medin, Goldstone, and Gentner 1990). The preference for shape in object naming shows up not only in experimental contexts, but in very many naturalistic contexts that reflect the use of our mental representations of objects for goals other than language. One need not be a scholar of art to recognize the important role that shape plays in explicit representations of objects. Two sculptures by Claes Oldenburg are excellent examples: One is his monumental metal “The Clothespin” in downtown Philadelphia. Sixty feet tall and sculpted of steel, The Clothespin looks just like one.
A second is his “Bicyclette Ensevelie” (Buried Bicycle)—sections of handlebar, wheel, and seat arranged on the ground over the span of a large outdoor park in Paris, yet immediately recognizable as a representation of a bicycle protruding from the ground. Our eager adoption and quick understanding of these names as they label explicit external representations of objects suggests a profound importance for object shape in the task of object naming.

2.1.2 The critical role of syntax

The results on shape and naming begin to tell us about preferred object representations as the entry point for language learners, but they also tell us about constraints on the linguistic side: The preference for shape is specific to syntactic contexts that are appropriate for object naming. Landau and Gleitman (1985) and Gleitman (1990) proposed that syntactic contexts are critical for establishing the meanings of verbs. Our recent work on object naming extends work started by Brown (1958),
showing that count nouns, mass nouns, and adjectives also serve as mental pointers to different basic aspects of meaning. In English, the count noun context is appropriate for object naming; in these contexts, a noun is combined with determiners such as “a” and “an” and quantifiers such as numerals. This syntactic context marks the fact that the named entity is discrete and countable; such entities range over concrete and abstract objects (such as “dog” and “belief,” respectively), and might best be characterized as “individuated entities” (Bloom 1996b). For entities that are not countable—such as substances—English uses the mass noun context, in which the noun is introduced by determiners and quantifiers such as “some” and “more.” Mass nouns can also be quantified by classifiers such as “a piece of,” “a pile of,” “a hunk of” (granite, sand, chocolate). Adjectives name properties, including object properties such as specific shape, texture, and color. Young children are quite sensitive to the syntactic context in which a word occurs, and their generalization differs accordingly. Although children generalize on the basis of shape in the count noun context, they generalize on the basis of surface texture or coloration in the context of adjectives (e.g., “This is a daxy one,” Landau et al. 1992; Smith et al. 1992), on the basis of material substance in the context of mass nouns (e.g., “This is some dax,” Subrahmanyam, Landau, and Gelman 1997), and on the basis of the object’s location in the context of prepositions (e.g., “This is adax the box,” using a novel form that is morphologically similar to known prepositions such as “across” or “adjacent” [Landau and Stecker 1990]). In each of these contexts, children’s attention to particular properties is strongly modulated by syntactic context. Thus the syntactic context serves as a critical mental pointer to different fundamental ontological categories—object, property, substance, and place.
Many results now indicate that the influence of syntactic context in constructing meaning grows over development, allowing children to move beyond their initial biases for representing objects and events. Consider objects. If a speaker wishes to refer to the object itself, and does so by describing it with the sentence “This is a dax” or “What a nice dax,” the child’s preferred representation—one engaging the object’s shape—will often be sufficient for the learner to attend to just what the speaker has in mind, namely, the object itself. But suppose the speaker wishes to talk about the material of which the object is made, rather than the object itself. If object shape is a preferred representation, then other properties—including material—might be ranked lower, and therefore it might be more difficult to switch attention from one’s preferred representation to the one that the speaker actually has in mind.
The argument here is not that material, color, surface texture, or location cannot be represented, nor that they are somehow less “natural” than object shape. Rather, understanding what someone is saying requires that the listener direct his or her attention to just that interpretation intended by the speaker. If object shape is highly salient under a variety of conditions, then it should be relatively difficult for the young learner to pry his or her attention away from shape toward some other property, until syntax plays a strong enough role. Recent results from Subrahmanyam et al. (1999) have shown significant growth in children’s ability to use syntactic context to modulate their interpretations of the speaker’s intended meaning. At three years of age, children who observe a rigid three-dimensional object are biased to generalize a novel noun on the basis of shape whether they hear the noun in a count noun or a mass noun context. By five years of age, however, children who observe such an object will generalize on the basis of shape when they hear a count noun, but on the basis of material substance when they hear a mass noun. Adults do the same, strongly and absolutely. A very similar developmental course has been found in the domain of events. Fisher, Hall, Rakowitz, and Gleitman (1994) found that three-year-olds show a strong “agency” bias in interpreting the meanings of novel action verbs that are presented without syntactic context. For example, if children observe a scene in which one toy animal hands a ball to another toy animal, and they hear “Look! Ziffing!” the children assume that “ziffing” means “giving” rather than “taking,” even though both verbs are plausible descriptors of the scene. This bias to encode the verb as one that focuses on a causal agent is present among adults as well.
However, when the novel verb is presented in a syntactic context, this agency bias is overridden: Subjects at all ages interpret the verb as “give” if they hear “The elephant is ziffing the ball to the bunny” or “take” if they hear “The bunny is ziffing the ball from the elephant.” Importantly, Fisher et al. found that the role of syntax grows over development, starting out by modulating children’s interpretations only probabilistically, but ending by modulating adults’ interpretations strongly and absolutely. Thus, for object names and for action verbs, it appears that the developmental course begins with young children interpreting new words in concert with their perceptual and conceptual biases—especially when these correspond to the interpretation offered by the syntactic context. The developmental course ends with older children (and adults) depending quite strongly on syntactic context, overruling preferred perceptual interpretations. The great genius of language, of course, is to carry us beyond our perceptual biases. At the same time, the great genius of perceptual biases may be to allow learners a wedge into the linguistic system at all.

3.0 Objections and Responses

In the Gleitman research seminar, empirical findings were always treated with respect but also tempered with a healthy dose of skepticism. With a scowl, Henry might ask: “Let us suppose that shape is the preferred dimension of generalization for object naming. Still, I am worried. . . . What could this mean?” Indeed, since the earliest publication of this work on shape and naming, it has met with many objections and challenges. These are the most critical:

1. The objects used in most of the shape studies are novel artifacts invented for the purposes of experimentation—a poor representation of the artifacts that actually exist in the world. Even as artifacts, their simple geometric design suggests no plausible function. Because they do not belong to any existing natural category, they force a preference for shape in the absence of any information suggesting alternatives (p.c., audience for virtually every colloquium in which I have presented these findings).

2. In any case, the true representations underlying an object’s name are representations of its “kind.” Young children do not seek to put together objects of the “same shape,” but rather, objects of the “same kind” (Soja, Carey, and Spelke 1992). In the case of artifacts, our true criterion for membership in the same kind is neither apparent function nor appearance, but rather the creator’s intention (Bloom 1996a).

These objections call for empirical and theoretical response. First, the idea that lack of functional information may lead to a default reliance on shape calls for a direct empirical test: We can provide learners with additional, richer information and find out whether their patterns of generalization change.
Second, the idea that young children are really seeking to name objects of the same kind with the same name calls for more explicit theoretical discussion of the possible links among same shape, same kind, and same name.

3.1 The Role of Function and General World Knowledge

Recently, we have investigated the role of functional information in challenging the shape bias (Landau, Smith, and Jones 1997; Smith, Jones, and Landau 1996). Simply put, we have asked whether providing clear functional information about an object will lead children and adults to generalize its name on the basis of properties that can support its function. If so, this would suggest that people’s reliance on shape
occurs only in circumstances in which they have relatively impoverished information about other characteristics of the object. If function does not enter into naming, however, this would suggest a rather strong hypothesis about the nature of naming and the role of shape perception, specifically, that naming might be cut off from more thoughtful, reflective processes that act to store and manipulate our general knowledge about objects. To anticipate, we have gone to lengths to make functional information salient—even using familiar functions—but have found that young children are quite resistant to naming on the basis of functional properties. This is despite the fact that the same young children are perfectly capable of using functional information to make other (nonnaming) judgments about objects. In contrast to the pattern found among young children, functional information does readily enter into adults’ naming judgments, suggesting dramatic and important developmental changes in the kinds of information that enter into the learning of object names. In one set of experiments, we studied two-, three-, and five-year-olds’ and adults’ naming patterns with and without functional information (Landau et al. 1997). Subjects in the Function condition were provided with information about function while they heard the objects named, whereas subjects in the No Function condition only heard the name and thus knew nothing about the objects’ intended functions. Subjects in the Function condition were told very explicitly what the objects were for, for example: “This is a dax. Daxes are made by a special company just so they can mop up water”; or “This is a rif. And this is what I do with it. I use it to pull toys from across the table.” After hearing the standard object named (and, in the Function condition, observing the function and hearing it described), all subjects were asked to generalize the object’s name to new objects.
Some test objects were the same shape as the standard, but could not carry out its designated function; others were a different shape from the standard, but could carry out the function. In addition to asking subjects whether each of these objects was “a dax,” we also independently checked on how much subjects knew about the objects’ functions by asking them directly which of the objects could carry out specific functions. Over three experiments, the objects varied in shape, material, and other properties relevant to function. For example, in one experiment, we used objects with simple geometric shapes, composed of materials that could support specific functions (cork for one set, with the function of holding stick pins; sponge for a second set, with the function of mopping up water). In another experiment, we used objects similar to known familiar objects with easily understood functions: novel containers (used to carry water) and canes (used to retrieve toys from
across a table). In a third experiment, we used bona fide artifacts: combs and clothespins. The results across the three experiments were remarkably consistent. Two- and three-year-olds, whether instructed about function or not, generalized the standard’s name on the basis of shape. This was true across the range of objects, from “nonsense” objects with simple geometric shapes to well-known objects such as the comb or clothespin. In the latter case, this meant that children were more willing to generalize the name “comb” to a paper cut-out having the identical shape and size of the original comb than to objects that could carry out the function (but were clearly different shapes from the standard comb). Similarly, the children were more likely to generalize a novel name “dax” or “rif” to a container that was the same shape as the standard, even if it had holes in the bottom and so could not carry out the designated function of carrying water. Thus the pattern of naming among young children was consistent with the critical role of shape in early object naming. The pattern among adults was quite different, however. Adults who saw novel objects and were not told about their functions generalized freely on the basis of shape—just as they did in earlier shape studies. Adults who saw novel objects and were told about the objects’ functions, however, generalized the name on the basis of the objects’ functional properties—either material and substance or global object properties such as length and rigidity that were critical to the demonstrated functions. Finally, adults who observed real, familiar objects (the comb, clothespin) also generalized on the basis of shape, but importantly, they were very conservative, rejecting many same-shape objects as well as same-function objects. For example, adults were willing to call a paper cut-out a “comb,” but they did so only with reluctance, as shown by their lower rates of acceptance.
Those who rejected these items tended to spontaneously add comments such as “Well, yes, you could call it a comb, but it’s really a piece of paper in the shape of a comb.” Five-year-olds showed a pattern similar to that of adults, though somewhat weaker. Thus the developmental picture is complex. For young children, naming seems to be governed by shape similarity (in these contexts), and functional information is unlikely to enter into naming decisions (but see Kemler-Nelson 1995 for a different set of findings). For older children and especially adults, the importance of shape similarity seems to be strongly modulated by a variety of factors: whether the object is familiar, how much functional information is known, and general world knowledge (see also Malt and Johnson 1992). Adults appear to have a rather complex metric for deciding whether form or function matters the most, but children’s decisions appear to be considerably simpler.
It is important to note that when children were directly queried about function, their responses were quite different. They did not generalize on the basis of shape, and they often made the correct judgments about function—especially when the objects were familiar. This is consistent with research showing that appreciation for object functions begins in infancy (Brown 1990; Kolstad and Baillargeon 1991). So children clearly understood something about the objects’ functions. Note, however, they were not perfect; and there was pronounced development in how the children articulated their knowledge. For example, five-year-olds in our studies were quite knowledgeable, able to tell whether each of the objects could carry out specific functions whether they had been instructed with the standard or not. But two- and three-year-olds’ knowledge was much spottier: Three-year-olds could determine whether an object could carry water or retrieve a toy, and whether something would work to comb the hair or hang clothes. They were less good at determining which objects could mop up water (sponge) or hold a stick pin (corkboard). Two-year-olds’ knowledge was even shallower: They could tell which objects would work to comb the hair or hang clothes, but only by actually trying out the objects in question (on a model head of hair or a toy clothesline). Furthermore, their reasoning as they decided what would “work to comb your hair” revealed immature knowledge of what it would take: In many cases, merely contacting the hair with the comb appeared to suffice for the judgment that “it worked” to “comb hair,” confirming what every parent knows. To summarize, shape and function appear to play different roles in object naming for children than for adults. Functional information does not appear to enter into object naming among young children even when—in other tasks—the same age children can show that they do understand some aspects of the objects’ functions. 
A variety of studies, including ours, have found that functional information begins to be firmly integrated into object-naming judgments from about age four or five on (Gentner 1978; Merriman, Scott, and Marazita 1993; Landau et al. 1997; Smith et al. 1996; but see Kemler-Nelson 1995 for a different timetable). This suggests that early object naming may be cut off from the influences of many kinds of general world knowledge. Furthermore, children’s understanding of functions undergoes considerable enrichment between the ages of two and five years, only beginning to approximate adult knowledge well past the time when the object vocabulary is first learned (see also Keleman 1995; Matan 1996). In contrast, functional information does appear to play an important role in object naming among adults, particularly when one must decide whether and how to extend a novel object name. For familiar objects, shape and function appear to cohabit adult mental representations, with each
dominating the other under different circumstances (Malt and Johnson 1992).

3.2 How Are Same Shape, Same Kind, and Same Name Related in Language Learning and Mature Naming?

For a shape bias to make sense, it is necessary to link it to the notion of “same kind.” Names for categories of things are names for things of the same kind: We call “cups” those things that are grouped together in virtue of their membership in some kind. This kind of observation has led some to argue that the “true” basis for object naming in young learners and in adults is “same kind,” not “same shape” (Soja, Carey, and Spelke 1992). That is, children seek to name objects in accord with their kind, not in accord with their shape. Assuming that this is true for learners as well as adults, it is still incomplete. Although it postulates a link between the notion of object kind and object name, it does not solve the problem of how the child can tell which objects do, in fact, belong to the same kind. Because there are large differences between what young children know and what their caregivers know, it is likely that the “true” criteria for selecting what belongs in the same kind will change over development. We have seen an example of this with function: Although young children do not appear to consider object function in their judgments of what a thing is called, adults do. The same is undoubtedly true for natural kind objects: Because children’s knowledge changes dramatically over development (Carey 1985), it is likely that older children and adults will use different properties—and possibly even different principles—to categorize members of these categories. Recent evidence shows that, even among adults with different backgrounds, there are large differences across groups in their criteria for categorizing different kinds of plants (Medin et al. 1997). Why, then, does young children’s naming of objects seem to match—more or less—the naming of adults around them?
It seems likely that shape similarity plays an important role here: If young children generalize on the basis of similarity in object shape, then their object naming will often match that of adults. Thus communication can proceed because of a relatively straightforward commonality in the mind of the child learner and the adult. Whether or not young children are truly searching for same kind, an initial reliance on shape similarity will set them on the road to acquiring an object name vocabulary. I say “initial” because it is clear that same shape is neither necessary nor sufficient in mature judgments of same kind objects (see Bloom 1996a for some controversies and examples that push our own intuitions).
However, shape is an excellent beginning, because it correlates quite strongly with same kind: Objects of the same shape very often are members of the same kind. Objects of the same shape and same kind are often called by the same name by the child’s linguistic community. This means that any child who is sensitive to the correlations between same shape, same kind, and same name will often generalize object names correctly, in agreement with those around them. Of course, objects of the same kind do not always share the same shape, and this is why a shape bias can only provide an initial, though crucial, bootstrap into the object-naming process. What happens to push children toward more complex information in considering which objects should have the same name? In some cases, children will have the opportunity to hear the same object name applied to two objects having very different shapes. Elizabeth Shipley and I have investigated what kind of generalization occurs in such cases, and we have found that young children are likely to “fill in” the intermediate space, generalizing the name to all objects along the similarity line that fits between the two standards (Landau and Shipley 1996). In contrast, if children hear two different object names applied to the same two objects, they will generalize as if two separate categories exist. This latter finding is consistent with the fact that, across languages, children will have to create somewhat different distinctions (Imai and Gentner 1997). Thus, even starting with a bias to generalize on the basis of shape, young children are still free to modulate this bias in accord with the distribution of names for objects in their language. The importance of a shape bias is that it provides an initial guide to same kind, and thus same category, through a completely natural link: The same representational system that underlies object recognition is linked to the system that underlies object naming.
Note that this guide may be quite strong: Our initial studies of object function suggest that the tendency to map same shape to same name may be separated, cut off from the influences of many other kinds of knowledge. If so, this would prove beneficial as well—a simple hypothesis, which is often correct, may be better for young learners than a complex one that requires sustained, thoughtful reflection. Without such a constraint, the learner might choose any similarity as the basis for belonging in the same kind—and the chances would be that different learners would have different conjectures, different also from those of the language community around them. By engaging the simple hypothesis that same shape licenses same name, the learner is provided guidance—in advance of knowing the category—to “highlight certain construals of experience and suppress others.” Such a conjecture, one that naturally
links same shape with same kind, is necessary; for no matter how much innate knowledge the child has, she will require a means to map this knowledge to things in the world. Thus the shape bias can serve as a mechanism for getting learners started. Once a common vocabulary is established, it is possible to communicate other kinds of information about objects—artifact functions, the intentions of those who design artifacts, animal behaviors, mechanisms of respiration in plants, etc. Without a common vocabulary, however, none of this is possible.

4.0 Conclusions

In some respects, work on object naming and its link to the human object recognition system might seem a far cry from research designed to determine how the blind child learns language. However, the principles that are revealed in the two cases are surprisingly similar: In both cases, it has seemed important to specify not just what the child’s innate knowledge might be like, but also, how he or she might use that knowledge to learn the words for objects and events. In both cases, the role of spatial representation is prominent: We cannot answer the question of how one learns words for objects and events without understanding the representational systems that underlie our nonlinguistic knowledge of these. In both cases, the role of linguistic representation is prominent: We cannot know how learning proceeds without understanding how formal linguistic devices are used by learners to “point” toward different aspects of meaning. My own need to understand both systems of knowledge and their interaction stems directly from the questions raised by Henry and Lila early in my career: Is this work really about language, or is it really about space? The ensuing framework has provided me with direction over the years since work on the blind child—direction which continually reminds me how deep, complex, and mysterious it is that any child, blind or sighted, can learn to talk about what she perceives.
For this, I thank them.

Acknowledgments

Preparation of this paper was supported by grants R01 HD-28675 from NICHD, R01 MH-55240 from NIMH, and a General University Research Grant from the University of Delaware. I wish to thank Cynthia Fisher, Elissa Newport, and Elizabeth Shipley for helpful comments on the paper.
References

Armstrong, S., Gleitman, L. R., and Gleitman, H. (1983) What some concepts might not be. Cognition 13:263–308.
Behl-Chadha, G. (1996) Basic-level and superordinate-like categorical representations in early infancy. Cognition 60:105–141.
Biederman, I. (1987) Recognition-by-components: A theory of human image understanding. Psychological Review 94:115–147.
Biederman, I. and Ju, G. (1988) Surface vs. edge-based determinants of visual recognition. Cognitive Psychology 20:38–64.
Bloom, P. (1996a) Intention, history, and artifact concepts. Cognition 60:1–29.
Bloom, P. (1996b) Controversies in language acquisition: Word learning and the part of speech. In Perceptual and Cognitive Development, ed. R. Gelman and T. Kit-Fong Au. San Diego: Academic Press.
Bowerman, M. (1996) Learning to structure space for language: A cross-linguistic perspective. In Language and Space, ed. P. Bloom, M. A. Peterson, L. Nadel, and M. F. Garrett. Cambridge, MA: MIT Press.
Brown, A. (1990) Domain-specific principles affect learning and transfer in children. Cognitive Science 14:107–133.
Brown, R. (1958) Words and Things: An Introduction to Language. New York: Free Press.
Carey, S. (1985) Conceptual Change in Childhood. Cambridge, MA: Bradford Books/MIT Press.
Fisher, C. (2000) Partial sentence structure as an early constraint on language acquisition. (Chapter 16 in this volume.)
Fisher, C., Hall, G., Rakowitz, S., and Gleitman, L. R. (1994) When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. In The Acquisition of the Lexicon, ed. L. R. Gleitman and B. Landau. Cambridge, MA: MIT Press.
Gallistel, C. R. (1990) The Organization of Learning. Cambridge, MA: MIT Press.
Gelman, S. and Wellman, H. (1991) Insides and essences: Early understandings of the non-obvious. Cognition 38:213–244.
Gentner, D. (1978) What looks like a jiggy but acts like a zimbo? A study of early word meaning using artificial objects.
Papers and Reports on Child Language Development 15:1–6.
Gleitman, L. R. (1990) The structural sources of verb meanings. Language Acquisition 1:3–55.
Gleitman, L. R., and Gleitman, H. (1997) What is a language made out of? Lingua 99:1–27.
Goldin-Meadow, S. (2000) Learning with and without a helping hand. (Chapter 9 in this volume.)
Goodman, N. (1972) Problems and Projects. Indianapolis: Bobbs-Merrill.
Hermer, L. and Spelke, E. (1994) A geometric process for spatial reorientation in young children. Nature 370:57–59.
Hirsh-Pasek, K. (2000) Beyond Shipley, Smith, and Gleitman: Young children’s comprehension of bound morphemes. (Chapter 12 in this volume.)
Hochberg, J. and Brooks, V. (1962) Pictorial recognition as an unlearned ability: A study of one child’s performance. American Journal of Psychology 75:624–628.
Hoffman, D. and Richards, W. (1984) Parts of recognition. Cognition 18:65–96.
Imai, M. and Gentner, D. (1997) A cross-linguistic study of early word meaning: Universal ontology and linguistic influence. Cognition 62(2):169–200.
Keil, F. (1979) Semantic and Conceptual Development: An Ontological Perspective. Cambridge, MA: Harvard University Press.
Keil, F. (1989) Concepts, Kinds, and Cognitive Development. Cambridge, MA: MIT Press.
Keil, F. (1994) Explanation, association, and the acquisition of word meaning. In The Acquisition of the Lexicon, ed. L. R. Gleitman and B. Landau. Cambridge, MA: MIT Press, 169–198.
Keleman, D. (1995) The Teleological Stance. Ph.D. thesis, University of Arizona.
Kemler-Nelson, D. (1995) Principle-based inferences in young children’s categorization: Revisiting the impact of function on the naming of artifacts. Cognitive Development 10:347–354.
Kolstad, V. and Baillargeon, R. (1991) Appearance- and knowledge-based responses to containers in infants. Unpublished manuscript.
Kripke, S. (1977) Identity and necessity. In Naming, Necessity, and Natural Kinds, ed. S. Schwartz. Ithaca: Cornell University Press.
Landau, B. (1982) Will the real grandmother please stand up? The psychological reality of dual meaning representations. Journal of Psycholinguistic Research 11(1):47–62.
Landau, B. (1996) Multiple geometric representations of objects in languages and language learners. In Language and Space, ed. P. Bloom, M. A. Peterson, L. Nadel, and M. F. Garrett. Cambridge, MA: MIT Press.
Landau, B. and Gleitman, L. R. (1985) Language and Experience. Cambridge, MA: Harvard University Press.
Landau, B. and Jackendoff, R. (1993) “What” and “where” in spatial language and spatial cognition. Behavioral and Brain Sciences 16:217–265.
Landau, B. and Shipley, E. (1996) Object naming and category boundaries. In Proceedings of the Boston University Conference on Language Development, ed. A. Stringfellow. Brookline, MA: Cascadilla Press.
Landau, B., Smith, L., and Jones, S. (1988) The importance of shape in early lexical learning. Cognitive Development 3:299–321.
Landau, B., Smith, L., and Jones, S. (1992) Syntactic context and the shape bias in children’s and adults’ lexical learning. Journal of Memory and Language 31:807–825.
Landau, B., Smith, L., and Jones, S.
(1997) Object shape, object function, and object name. Journal of Memory and Language 36(1):1–27.
Landau, B., Spelke, E., and Gleitman, H. (1984) Spatial knowledge in a young blind child. Cognition 16:225–260.
Landau, B. and Stecker, D. (1990) Objects and places: Syntactic and geometric representations in early lexical learning. Cognitive Development 5:287–312.
Levinson, S. (1992) Vision, shape, and linguistic description: Tzeltal body-part terminology and object description. Working paper no. 12, Cognitive Anthropology Research Group, Max Planck Institute for Psycholinguistics.
Locke, J. (1964) An Essay Concerning Human Understanding. Ed. A. D. Woozley. Cleveland: Meridian Books.
Malt, B. and Johnson, E. C. (1992) Do artifact concepts have cores? Journal of Memory and Language 31:195–217.
Markman, E. (1989) Categorization and Naming in Children: Problems of Induction. Cambridge, MA: Bradford Books/MIT Press.
Markman, E., and Hutchinson, J. (1984) Children’s sensitivity to constraints on word meaning: Taxonomic versus thematic relations. Cognitive Psychology 16:1–27.
Marr, D. (1982) Vision. New York: Freeman.
Matan, A. (1996) Knowledge of function in young children. Ph.D. thesis, MIT.
Medin, D. and Coley, J. (1998) Concepts and categorization. In Handbook of Perception and Cognition. Perception and Cognition at Century’s End: History, Philosophy, Theory, ed. J. Hochberg and J. E. Cutting. San Diego: Academic Press.
Medin, D., Goldstone, R., and Gentner, D. (1993) Respects for similarity. Psychological Review 100(2):254–278.
Medin, D., Lynch, E., and Coley, J. (1997) Categorization and reasoning among tree experts: Do all roads lead to Rome? Cognitive Psychology 32:49–96.
Merriman, W., Scott, P., and Marazita, J. (1993) An appearance-function shift in children’s object naming. Journal of Child Language 20:101–118.
Naigles, L. (2000) Manipulating the input: Studies in mental verb acquisition. (Chapter 15 in this volume.)
Newport, E. (2000) Biological bases of language learning. (Chapter 8 in this volume.)
Putnam, H. (1977) Is semantics possible? In Naming, Necessity, and Natural Kinds, ed. S. Schwartz. Ithaca: Cornell University Press.
Quine, W. V. (1960) Word and Object. Cambridge, MA: MIT Press.
Quine, W. V. (1969) Natural kinds. Reprinted in Naming, Necessity, and Natural Kinds, ed. S. Schwartz. Ithaca: Cornell University Press, 1977.
Rosch, E., Mervis, C., Gray, W., Johnson, D., and Boyes-Braem, P. (1976) Basic objects in natural categories. Cognitive Psychology 8:382–439.
Shipley, E. (2000) Children’s categorization of objects: The relevance of behavior, surface appearance, and insides. (Chapter 6 in this volume.)
Smith, E. and Medin, D. (1981) Categories and Concepts. Cambridge, MA: Harvard University Press.
Smith, L. B., Jones, S., and Landau, B. (1992) Count nouns, adjectives, and perceptual properties in children’s novel word interpretations. Developmental Psychology 28:273–286.
Smith, L. B., Jones, S., and Landau, B. (1996) Naming in young children: A dumb attentional mechanism? Cognition 60(2):143–171.
Soja, N., Carey, S., and Spelke, E. (1992) Perception, ontology, and word meaning. Cognition 45:101–107.
Subrahmanyam, K., Landau, B., and Gelman, R. (1999) Shape, material, and syntax: Interacting forces in children’s learning of novel words for objects and substances. Language and Cognitive Processes 14(3):249–281.
Waxman, S. R., and Markow, D.
(1995) Words as invitations to form categories: Evidence from 12- to 13-month-old infants. Cognitive Psychology 29(3):257–302.
Chapter 14
The Psychologist of Avon: Emotion in Elizabethan Psychology and the Plays of Shakespeare
W. Gerrod Parrott

I Introduction

When considering what I should write for this collection of essays, I knew right away that I wanted somehow to pay tribute to Henry’s interest in drama. For, as everyone knows, part of what makes Henry Henry is his interest and skill in acting and directing. And part of what guides Henry’s approach to psychology is intuitions springing from his appreciation of the themes of great drama, from his appreciation of the psychological complexity inherent in an actor’s ability to convey character and emotion, and in an audience’s ability to comprehend and vicariously experience a character’s situation all within the framework of “as-if.” Part of what makes Henry’s textbook special is its use of drama to illustrate psychological principles. And, most important of all for me, Henry’s interest in drama led to my becoming his student. After spending my first year at Penn researching a purely cognitive topic, I found myself much more interested in human emotion, and I found Henry interested in talking about it. It turned out that Henry’s interest in drama and mine in emotion overlapped nicely in an area we called “the quiet emotions,” which included aesthetic emotions, humor, and play. After some preliminary experiments on humor, we began the research that formed my dissertation, investigating the infant’s game of “peek-a-boo.” In its simple structure of “appearance-disappearance-reappearance,” Henry and I saw a prototype of the sort of structure typical of adults’ dramatic narratives: a suspenseful conflict that is then resolved. Perhaps, we thought, we might understand the developmental roots of drama in this simple game.
Henry and I published our peek-a-boo findings in the journal Cognition and Emotion, and even that developed into something nice for me: I published other research there, became one of the associate editors, and three years ago I took over as the editor. So, clearly, I thought I should try to pay tribute to Henry’s interest in drama. But, how to do so? My interests have continued to be focused on human emotion, but not on peek-a-boo or drama per se, and, try as I
might, I really cannot relate any of my empirical research to drama. For example, I am interested in the emotion of embarrassment, and my approach is rooted in the theory of Erving Goffman, which emphasizes dramaturgy, but that is too much of a stretch. I also have some research on mood and memory, showing that people sometimes recall sad memories when happy and happy memories when sad; that is the reverse of the usual finding and hence is possibly dramatic, but it is not about drama. And I have lately been content-analyzing people’s reports of intense jealousy, and these accounts are often quite melodramatic, but that is not right either. So I cannot pay tribute to Henry by describing research about drama per se, but I can bring drama into this essay another way. Lately I have begun to study folk psychologies of emotion, examining the historical development of ideas about emotion in Western cultures, and trying to see how contemporary American conceptions of emotion evolved from them. One can track the development of Western conceptions of emotion through a multitude of sources, from legal traditions to works of fiction, from medical beliefs to academic philosophy, and I have been looking to some of these to learn the history of everyday ideas about emotion. One of the periods I have found particularly interesting is Elizabethan England, and one of the best sources of examples of Elizabethan ideas about emotion is the dramatic works of William Shakespeare. It is by describing this aspect of my research that I would like to pay tribute to Henry’s interest in drama. My topic for this essay will be Elizabethan ideas about psychology, particularly about emotion, particularly as they are evidenced in the plays of Shakespeare. In honor of Henry, I shall focus on the Shakespeare part, but I shall also indicate some of the ways in which Shakespeare is relevant to the contemporary psychology of emotion.
II Psychology in the English Renaissance

In the plays of Shakespeare one finds many expressions of ideas about psychology that were current during the English Renaissance. These expressions reflected shared, everyday conceptions about people’s behavior and mental activities, and could be called the Elizabethan folk psychology. The best documentation of these conceptions is found in the moral and scholarly writings of the time, which were widely read by Shakespeare’s patrons and audiences. A number of Spanish and French works from the sixteenth century had become available in English translation by Shakespeare’s time. These books were the Renaissance equivalent of Henry’s introductory psychology text: influential, profound, and selling in large numbers for their day.
One was Juan Luis Vives’s 1520 Introduction to Wisdom, a spiritual and educational treatise that was translated into English in 1540 (Fantazzi 1979). A more physiological approach could be found in a book by the Spanish physician Juan de Huarte Navarro, whose Examen de Ingenios, written in 1578, was translated into English by Richard Carew in 1594, going into its fourth edition by 1616 (Newbold 1986). Huarte’s popular book proposed an innate basis for humors and temperament that made certain passions and careers more suitable for some individuals than for others. Pierre de la Primaudaye’s The French Academie, written in 1586, was first translated into English by Thomas Bowes in 1594. A moral work, it discussed the psychology and self-control of emotions in great detail and with insight. Just to give the flavor of the book, I shall share one quotation, retaining original spelling:

Now against the passion of euill Hatred, amongst a great number of remedies which may very well be applied thereunto, we haue two principall ones that are very good and profitable. The first remedy is, the example of the loue of God. . . . The second remedy is, the contempt of all earthly things. . . . For if we shall set light by all mortall and corruptible things, and lift vp our hearts to higher things, we shall very easily breake off all hatred and enmity, neither will wee take any thing greatly to heart, but when we see God offended. (La Primaudaye 1618, p. 500)

Another moral work from France was Pierre Charron’s Of Wisdome, written in 1608 and first translated into English by Samson Lennard about five years later. This work discusses the causes and effects of a wide range of passions, including envy, jealousy, revenge, fear, sorrow, compassion, choler, and hatred, and the work’s final part considers the virtue of temperance and methods of controlling the passions (Charron 1608).
About this same time, English authors were writing books catering to the Renaissance interest in psychology and ethics. Through these books, ideas about psychology and emotion from the works of Aristotle and Plato, Hippocrates and Galen, Cicero, Augustine, and Aquinas were distilled and mixed with English folk notions. Sir Thomas Elyot’s The Castel of Helthe (1541/1937) was a popular medical book for laymen that appeared in nearly twenty editions between 1539 and 1610, and it contained sections on “affectes and passions of the mynde” (Tannenbaum 1937). In 1586 Timothy Bright’s A Treatise of Melancholie announced a deliberate choice to publish in English rather than in Latin so that “the benefit might be more common”; the success of this rambling, medically oriented book attests to the popular interest in psychology in general and melancholy in particular (Newbold 1986). Thomas
Wright’s book, The Passions of the Minde in Generall, was first published in 1601 and is a much more impressive work. A former Jesuit, Wright wrote in English so that the wisdom he gleaned from classical sources, both psychological and moral, might help the English people to practice virtue, achieve self-knowledge and self-control, and use their passions for good purposes. The greatest psychological book of the English Renaissance was surely Robert Burton’s Anatomy of Melancholy, but its publication in 1621 makes it a bit late to have influenced Shakespeare directly. Masterful though it was, it was based in large part on the works already mentioned, and thus illustrates the sorts of ideas that were in the air during the time that Shakespeare wrote his plays. These ideas are certainly what we would call psychology, but they are not only psychology: they are intertwined with what we would now call ethics, religion, medicine, philosophy, and even astrology. These ideas have partly shaped our present culture, and one way they did so was by infiltrating the plays of the Bard of Avon.1

III Emotion in Elizabethan Psychology and in Shakespeare’s Plays

The psychological writings of the Elizabethan period addressed many aspects of human emotion, from physiology to mental bias and self-regulation. The texts of Shakespeare’s plays suggest that Shakespeare and his audiences knew this psychology well and took it for granted, for the plays contain a wealth of allusions to it. In this section I shall present a sampling of the Elizabethan psychology of emotion and illustrate its presence in Shakespeare’s plays. In Elizabethan psychology there still persisted the Aristotelian idea that there were three types of soul, hierarchically nested: the vegetable (life), the sensible (consisting of life plus feeling, which includes perception, common sense, imagination, instinct, and memory), and the rational (consisting of life and feeling, plus reason).
The treatises circulating in Elizabethan England maintained that the rational soul of humans operated primarily via three organs of the body, each of which was specialized for activities corresponding to the three types of soul: the liver for the vegetal, the heart for the sensible, and the brain for the rational. The liver and heart were therefore associated with basic biology and the emotions, whereas the brain served rational thought and the will. Thus, in the opening scene of Twelfth Night, we see the Duke of Illyria ask:

[Orsino]: How will she love when the rich golden shaft
Hath killed the flock of all affections else
That live in her when liver, brain, and heart,
These sovereign thrones, are all supplied, and filled
Her sweet perfections with one self king!
(Twelfth Night, I, i, 34–38)2

In Renaissance psychology, the liver, spleen, and gall were all thought to be related to the emotions. The liver, when supplied with blood, produced courage and love; the gall produced wrath and bitterness; the spleen purged melancholy and thus was linked to mirth. Knowing this physiology is a great help in understanding otherwise cryptic passages in Shakespeare. For example, to cite some passages culled from Anderson (1927/1966), in Macbeth, the title character speaks to a fearful servant (earlier described as a “cream-faced loon”) who announces the approach of ten thousand soldiers:

Go prick thy face and over-red thy fear,
Thou lily-livered boy. What soldiers, patch?
Death of thy soul, those linen cheeks of thine
Are counsellors to fear. What soldiers, whey-face?
(Macbeth, V, iii, 16–19)

Or in Henry V:

Grey: Those that were your father’s enemies
Have steeped their galls in honey, and do serve you
With hearts created of duty and of zeal.
(Henry V, II, ii, 29–31)

Or in Measure for Measure:

Isabella: His glassy essence, like an angry ape
Plays such fantastic tricks before high heaven
As makes the angels weep, who, with our spleens,
Would all themselves laugh mortal.
(Measure for Measure, II, ii, 123–126)

The Elizabethans’ psychophysiology has not fared particularly well in light of modern biology, but their insights about the more mental aspects of emotion have fared considerably better. Regarding the expression of emotion and the possibility of deception about one’s emotions, Elizabethan psychology asserted that there should be a correspondence between the appearance of the body and the state of the soul, an idea that had its origins in Plato. The ability to conceal emotions was believed to be quite limited, so when a person did not seem to be moved by matters that normally cause shame or guilt or regret, it was assumed
W. Gerrod Parrott
that the person had learned not to have the emotion, not that the emotion was present but not expressed (Anderson 1927/1966). Yet not to have one of these moral emotions is to become an immoral person, and there are wonderful passages in Shakespeare expressing this idea. In 3 Henry VI, York says to Queen Margaret:
But that thy face is visor-like, unchanging,
Made impudent with use of evil deeds,
I would essay, proud Queen, to make thee blush.
To tell thee whence thou cam’st, of whom derived,
Were shame enough to shame thee wert thou not shameless. (3 Henry VI, I, iv, 117–121)
Even better for demonstrating the process of character alteration in perfecting deception is Macbeth. Early in the play Lady Macbeth begins coaching her husband:
Your face, my thane, is as a book where men
May read strange matters. To beguile the time,
Look like the time; bear welcome in your eye,
Your hand, your tongue; look like the innocent flower,
But be the serpent under’t. (Macbeth, I, v, 61–65)
And Macbeth resolves to do it:
False face must hide what the false heart doth know. (Macbeth, I, vii, 82)
By act 5, Macbeth no longer betrays his purposes with his emotions, yet it is not by hiding the emotions that he succeeds, but by no longer having them:
I have almost forgot the taste of fears.
The time has been my senses would have cooled
To hear a night-shriek, and my fell of hair
Would at a dismal treatise rouse and stir
As life were in’t. I have supped full with horrors.
Direness, familiar to my slaughterous thoughts,
Cannot once start me. (Macbeth, V, v, 9–15)
In some respects the Elizabethan view of deception is similar to prevalent contemporary views, maintaining that emotional deception is possible but imperfect (e.g., Ekman 1985). In one respect it is notably different, however, because the education of the emotions and its moral implications are not emphasized in contemporary psychology.
Another tenet of the Elizabethan psychology of emotion was that concrete objects and events can stir passion and action more readily than can less vivid stimuli. This tenet was a special case of a more general belief in the dependence of reason and the imagination on information supplied by the senses (Anderson 1927/1966). Disrupt the input, and the whole system veers off course. Thus we have Oberon streaking Titania’s eyes with juice to alter her perceptions, and in numerous plays characters are bound and placed in darkness to aid recovery of their wits. The role of vivid stimuli in producing angry aggression is illustrated in King John:
King John: Witness against us to damnation!
How oft the sight of means to do ill deeds
Make deeds ill done! Hadst not thou been by,
A fellow by the hand of nature marked,
Quoted, and signed to do a deed of shame,
This murder had not come into my mind. (King John, IV, iii, 220–224)
And, for the emotion of fear, there is the reaction of Macbeth, who was terrified by Banquo’s ghost but calmed immediately after his disappearance:
Macbeth: Take any shape but that, and my firm nerves
Shall never tremble. . . . Unreal mock’ry, hence!
Exit Ghost
Why so, being gone, I am a man again. (Macbeth, III, iv, 101–102, 106–107)
The Elizabethans’ point about the effectiveness of concrete perceptions in arousing emotions seems quite consistent with some modern ideas about cognition and action (e.g., production systems), but about emotion in particular modern academic psychology is oddly quiet. Perhaps this is a case where the Elizabethan writings identify a phenomenon underappreciated in our time. The Elizabethan psychologists drew on Aquinas for a sense that emotions can perform useful functions by guiding people toward their goals and motivating them to overcome frustration or to resign themselves to irrevocable loss.
An emotion such as grief was believed to be expressible either as angry frustration or sad resignation, and thus there was an element of choice concerning the direction of emotional impulses. Shakespeare, in 3 Henry VI, has Richard resolve not to weep away his grief but to vent it in revenge:
I cannot weep, for all my body’s moisture
Scarce serves to quench my furnace-burning heart;
...
To weep is to make less the depth of grief;
Tears, then, for babes; blows and revenge for me! (3 Henry VI, II, i, 79–80, 85–86)
King Lear likewise vows not to weep, but he is unable to obtain revenge. And, on some interpretations at least, Hamlet’s failure to redirect his grief may be understood as contributing to his inability to seek revenge (Anderson 1927/1966). This aspect of the Elizabethan psychology of emotion is particularly consonant with modern psychology, which ever since Darwin has emphasized the adaptive function of emotions. The psychology textbooks of the Renaissance observed that when passions become too intense they can bias thinking. In Shakespeare we see this phenomenon in an exchange between Bushy and the Queen in Richard II.
Bushy: Madam, your majesty is too much sad. ...
Queen: . . . Yet I know no cause
Why I should welcome such a guest as grief. ...
Bushy: Each substance of a grief hath twenty shadows
Which shows like grief itself but is not so.
For sorrow’s eye, glazèd with blinding tears,
Divides one thing entire to many objects
Like perspectives, which, rightly gazed upon,
Show nothing but confusion; eyed awry,
Distinguish form. So your sweet majesty,
Looking awry upon your lord’s departure,
Find shapes of grief more than himself to wail,
Which, looked on as it is, is naught but shadows
Of what it is not. Then, thrice-gracious Queen,
More than your lord’s departure weep not: more is not seen,
Or if it be, ’tis with false sorrow’s eye,
Which for things true weeps things imaginary. (Richard II, II, ii, 1, 6–8, 14–27)
That passage nicely depicts the bias of sadness; for that of jealousy, there are Iago’s descriptions of the force of inflamed suspicion:
I will in Cassio’s lodging lose this napkin,
And let him find it. Trifles light as air
Are to the jealous confirmations strong
As proofs of holy Writ. (Othello, III, iii, 325–328)
And later:
As he [Cassio] shall smile, Othello shall go mad;
And his unbookish jealousy must conster
Poor Cassio’s smiles, gestures, and light behaviours
Quite in the wrong. (Othello, IV, i, 99–102)
Emotion’s ability to bias thought has been rediscovered recently. Influential researchers such as Gordon Bower (1981) have reintroduced the phenomenon after long neglect, and it is now an important part of theories of affective disorders, decision making, and memory (Teasdale and Barnard 1993). To have the passions control reason, to have the body directing the mind, is to upset one of the most important Renaissance ideas about the proper order of nature: Reason should govern the body as the king governs the kingdom and God’s laws govern the universe. The need to prevent such disorder leads to the final Renaissance topic I would like to consider: self-control. Self-control appeared as a virtue in Greek writings as early as the sixth century B.C., and by the time of Aeschylus was well established among the cardinal virtues; it was considered in depth by Plato and continued to evolve with the Stoics of later Greek and Roman culture, when it was incorporated into early Christian doctrine (North 1966). Called sophrosyne by the Greeks and temperantia by the Romans and early Christians, this virtue can be translated variously as self-control, moderation, temperance, or self-knowledge.
Shakespeare’s Hamlet praised his friend Horatio for possessing just this Stoic virtue:
Since my dear soul was mistress of her choice
And could of men distinguish her election,
S’hath sealed thee for herself, for thou hast been
As one, in suff’ring all, that suffers nothing,
A man that Fortune’s buffets and rewards
Hast ta’en with equal thanks; and blest are those
Whose blood and judgment are so well commeddled
That they are not a pipe for Fortune’s finger
To sound what stop she please. Give me that man
That is not passions’ slave, and I will wear him
In my heart’s core, ay, in my heart of heart,
As I do thee. (Hamlet, III, ii, 61–82)
Later in the same scene Shakespeare has the Player King describe the character of one who has not developed this virtue:
Purpose is but the slave to memory,
Of violent birth, but poor validity,
Which now like fruit unripe sticks on the tree,
But fall unshaken when they mellow be.
Most necessary ’tis that we forget
To pay ourselves what to ourselves is debt.
What to ourselves in passion we propose,
The passion ending, doth the purpose lose.
The violence of either grief or joy
Their own enactures with themselves destroy:
Where joy most revels, grief doth most lament;
Grief joys, joy grieves, on slender accident. (Hamlet, III, ii, 179–190)
The modern psychology of emotion strays from this Stoic conception of self-regulation as virtue. More typical of modern psychology is a less moralistic, more hedonistic approach that focuses on the “maintenance” of positive emotions and the “repair” of negative emotions (Parrott 1993a).
IV The Relevance of Folk Psychologies for the Psychology of Emotion
I think there are valid reasons for a twentieth-century psychologist to consider Renaissance folk psychology and literature. Folk psychologies can play a valuable role in guiding and evaluating academic psychologies, although probably more with respect to general concepts than to detailed explanations (Fletcher 1995; Greenwood 1991). Regardless of the accuracy of the explanations, folk psychologies and literature help establish the meaning of everyday concepts. The basic concepts of psychology are all folk concepts: memory, attention, perception, emotion, and so on. Academic psychologists may establish new concepts, distinguished from the everyday concepts named by the same word, but they must make clear that their findings are not intended to address the everyday concept.
An example might be autonomic arousal, which does not quite correspond to any of the everyday meanings of “arousal.” I would propose that establishing the everyday meaning of “emotion,” “emotional,” and the like is an important thing to do. Academic psychologists have, I believe, developed their own conceptions of emotion, conceptions that have begun to stray from everyday conceptions
in important ways. As yet, however, the differences have not been made explicit or precise. At this point in the development of the psychology of emotion it would be good to note which aspects of the everyday conception are being retained and which abandoned, to clarify the benefits and costs of the new conceptual framework, and to remind ourselves that some aspects of the everyday conception are not addressed by contemporary research. Shakespeare’s plays and Renaissance folk psychologies can be used to evaluate the scope of modern academic theories. By this measure, the success of contemporary emotion research is rather mixed. For certain aspects of Elizabethan psychology contemporary research does an excellent job. Renaissance psychologies clearly included physiological reactions as part of emotion, and modern neuroscience is a distinct improvement on Renaissance efforts here (see Gray 1995). Similarly, the Renaissance psychologies delved into the ways in which emotion alters thought, and contemporary research on memory, perception, and judgment, and contemporary journals such as Cognition and Emotion, show a corresponding modern interest in just these phenomena (see Fox 1996; Nasby 1996). The Renaissance interest in the purpose of emotions shows an awareness that emotions can function in adaptive ways, and this interest is nicely reflected in modern treatments of functionalism (see Oatley 1992; Parrott and Schulkin 1993). For other aspects of Renaissance psychology, however, contemporary research does not fare so well. It was common in Renaissance psychology to distinguish between passion and reason, yet there is no corresponding distinction in academic approaches to emotion, although there is in contemporary folk theory. 
When I had my students ask acquaintances to recall an example of “being emotional” and to explain what made it so, we found that the most common qualities cited were “being irrational” and “being out of control.” Virtually all respondents conveyed a sense that “being emotional” carried a negative connotation (Parrott 1995). In contrast, modern academic theories of emotion, although distinguishing between emotional appraisal and unemotional deliberation, tend not to take into account the rationality or social appropriateness of emotional thinking in making these classifications (Parrott 1993b). The Renaissance psychologies clearly linked emotion to ethics and virtue in ways not considered appropriate for a modern science, yet it is these connections that made emotion so important to the understanding of human nature, and so central to Shakespeare’s plays. The role of emotion in the development of the character of Macbeth consists not so much in his concealing or extinguishing emotion as it does in his becoming evil. The point of emotion for Shakespeare’s King John is not so much an isolated psychological event as it is a part of the moral event
of temptation. Shakespeare was concerned with emotional dysfunction as well as emotional function, as when Othello’s jealousy overwhelms his reason. All in all, one finds the emphasis not so much on the nature of emotion per se as on what emotion reveals about character, including its role in how people come to do wrong: Consider how Iago’s reasoning is warped by his resentment and jealousy, or how Macbeth begins to do wrong only opportunistically but, through repetition, makes evil part of his character. In sum, the Renaissance psychology books and the plays of Shakespeare contrast with modern academic psychology in precisely that quality said to be most characteristic of the Renaissance, its interest in the entire person. By contrast, modern academic psychology appears excessively modular and mechanical, paying insufficient attention to the social and moral aspects of emotion. Thus my motive in investigating Shakespeare’s plays and Renaissance folk psychologies is to remind myself and others of issues that help make important a topic such as emotion. Now, where did I learn to do that? I have come to think that some of the most important lessons were learned at the research seminar Lila and Henry generously conducted in their home, which over the years benefited so many of us contributing to this volume. One point always emphasized to students by both Henry and Lila was that good research always maintains its connection to the issues that initially established its importance. In the research seminar Henry and Lila reminded their students to keep in mind the larger framework of their research, and discouraged them from pursuing laboratory phenomena for their own sake or because they were in vogue. In this brief tour of emotion in Shakespeare I hope to have both demonstrated this lesson and expressed my gratitude for it.
Acknowledgment
I am grateful to John Sabini for his helpful comments on a previous draft of this essay.
Notes
1.
I ought to make clear that there are two ways of researching a topic such as this. One can become an authority on Elizabethan culture and on the works of Shakespeare, or one can avail oneself of the many wonderful expositions of these topics that are available, and my method, in case it is not obvious, is necessarily the latter! So, before going on, I would like to acknowledge my indebtedness to the scholars whose writings have made this material accessible to the likes of me. In particular, I am indebted to books on Elizabethan culture and Shakespeare’s plays by such scholars as E. M. W. Tillyard (1944), Theodore Spencer (1949), William Webster Newbold (1986), and especially Ruth Leila Anderson (1927/1966).
2. All Shakespearian quotations and line references are drawn from the edition of Shakespeare’s complete works by Wells and Taylor (1986).
References
Anderson, R. L. (1966) Elizabethan Psychology and Shakespeare’s Plays. New York: Russell and Russell. (First published in 1927.)
Bower, G. H. (1981) Mood and memory. American Psychologist 36:129–148.
Charron, P. (1608) Of Wisdome (Samson Lennard, trans.). London: E. Blount and W. Aspley.
Ekman, P. (1985) Telling Lies: Clues to Deceit in the Marketplace, Politics, and Marriage. New York: Norton.
Elyot, T. (1937) The Castel of Helthe. New York: Scholars’ Facsimiles and Reprints. (Original work published 1541.)
Fantazzi, C. (1979) In Pseudodialecticos: A Critical Edition. Leiden: Brill.
Fletcher, G. (1995) The Scientific Credibility of Folk Psychology. Mahwah, NJ: Erlbaum.
Fox, E. (1996) Selective processing of threatening words in anxiety: The role of awareness. Cognition and Emotion 10:449–480.
Gray, J. A. (1995) A model of the limbic system and basal ganglia: Applications to anxiety and schizophrenia. In M. S. Gazzaniga (ed.), The Cognitive Neurosciences (pp. 1165–1176). Cambridge, MA: MIT Press.
Greenwood, J. D. (1991) The Future of Folk Psychology: Intentionality and Cognitive Science. Cambridge: Cambridge University Press.
La Primaudaye, P. de (1618) The French Academie (T. Bowes, trans.). London: Thomas Adams.
Nasby, W. (1996) Moderators of mood-congruent encoding and judgement: Evidence that elated and depressed moods implicate distinct processes. Cognition and Emotion 10:361–377.
Newbold, W. W. (1986) General introduction. In W. W. Newbold (ed.), The Passions of the Mind in General (pp. 1–50). New York: Garland Publishing.
North, H. (1966) Sophrosyne: Self-Knowledge and Self-Restraint in Greek Literature. Ithaca, NY: Cornell University Press.
Oatley, K. (1992) Best-Laid Schemes: The Psychology of Emotions. New York: Cambridge University Press.
Parrott, W. G. (1993a) Beyond hedonism: Motives for inhibiting good moods and for maintaining bad moods. In D. M. Wegner and J. W. Pennebaker (eds.), Handbook of Mental Control (pp. 278–305). Englewood Cliffs, NJ: Prentice-Hall.
Parrott, W. G. (1993b) On the scientific study of angry organisms. In R. S. Wyer, Jr. and T. K. Srull (eds.), Perspectives on Anger and Emotion: Advances in Social Cognition (vol. 6, pp. 167–177). Hillsdale, NJ: Erlbaum.
Parrott, W. G. (1995) The heart and the head: Everyday conceptions of being emotional. In J. A. Russell, J.-M. Fernández-Dols, A. S. R. Manstead, and J. C. Wellenkamp (eds.), Everyday Conceptions of Emotions: An Introduction to the Psychology, Anthropology, and Linguistics of Emotion (pp. 73–84). Dordrecht: Kluwer.
Parrott, W. G. and Schulkin, J. (1993) Psychophysiology and the cognitive nature of the emotions. Cognition and Emotion 7:43–59.
Spencer, T. (1949) Shakespeare and the Nature of Man. New York: Macmillan.
Tannenbaum, S. A. (1937) Introduction. In T. Elyot, The Castel of Helthe (pp. iii–xi). New York: Scholars’ Facsimiles and Reprints.
Teasdale, J. D. and Barnard, P. J. (1993) Affect, Cognition, and Change: Re-Modelling Depressive Thought. Hove: Lawrence Erlbaum Associates.
Tillyard, E. M. W. (1944) The Elizabethan World Picture. New York: Macmillan.
Wells, S. and Taylor, G. (eds.). (1986) William Shakespeare: The Complete Works. Oxford: Clarendon.
Chapter 15
Manipulating the Input: Studies in Mental Verb Acquisition
Letitia R. Naigles
“It is a truism, or ought to be, that language acquisition depends crucially on species-specific endowments . . . and at the same time is the strict outcome of specific learning opportunities” (L. Gleitman and H. Gleitman, 1997:29). It was in the service of figuring out just what role the endowments played and what role the learning opportunities played that Lila and Henry Gleitman developed the deprivation paradigm for the study of language acquisition. That is, if you think some aspect of input (linguistic, perceptual) or some aspect of human physiology (hearing, chromosomes) is important for children’s development of some linguistic structure or class or knowledge, then take that aspect away and see how the relevant language has been affected. As this is certainly not a paradigm that can ethically be imposed on any human population, the Gleitmans and their collaborators have relied on cases where the relevant deprivations occurred naturally, and have generated tremendous findings showing, for example, the resiliency of the sociable human to create his or her own language even in the absence of linguistic input, and the resiliency of the language to emerge even in humans missing some critical biological components (e.g., Feldman, Goldin-Meadow, and Gleitman 1975; Landau, Gleitman, and Spelke 1981; Landau and Gleitman 1985; Fowler, Gelman, and Gleitman 1994; see also Newport, Goldin-Meadow, Landau, this volume). These studies, performed by the Gleitmans with my predecessors at Penn, were the “milk” of my graduate school days. Among many other things, they showed me how much could be learned about language and language acquisition from such innovative manipulations of input or endowment.
Recently, I have embarked upon a new line of research, concerning children’s acquisition of mental state verbs, which has seemed tailor-made for an application of the deprivation paradigm (albeit in a less dramatic fashion than these earlier studies), and I have thus begun to do some input manipulation of my own. It seems only
fitting, then, that the first report of this research should be the topic of my chapter honoring the enormous contribution of Henry and Lila Gleitman to my scholarly life. In some ways, early research on mental verb acquisition was the antithesis of the early research on syntax acquisition that the Gleitmans and their colleagues were departing from. That is, one school held that early syntax acquisition was directed via the input of “Motherese” (e.g., Snow and Ferguson 1977); the Gleitmans’ work showed how much of this was actually directed by the children themselves (e.g., Newport, Gleitman, and Gleitman 1977). In contrast, input has played an astonishingly small role in theorizing about children’s developing mental states and mental verb understanding (but see Moore et al. 1994). Most theories have targeted aspects of children’s cognitive or emotional maturation as the prime instigating factors (e.g., Leslie 1991; Olson and Astington 1988; Wellman 1990). But too much emphasis on maturational change in children’s development of mental language and states may ultimately be just as obscuring as too much emphasis on Motherese input had been in the acquisition of syntax. My argument in this chapter will be that input has indeed a critical role to play at a critical transition point in children’s mental verb acquisition. But first, some background.
A. Why Care about Mental State Verbs? Let Me Count the Ways
Research on mental state verbs (MSVs) and their acquisition has grown exponentially over the past thirty years or so. Rationales for this research vary widely, depending at least in part on whether researchers come from a linguistic tradition or a cognitive psychological one, and whether the focus is on adults’ knowledge of mental terms or on children’s acquisition of them. With my bent toward language and cognitive development, I have found the following four rationales most compelling.
(a) Mental verbs epitomize Quine’s (1960) problem of radical translation for the child learner, as thinking and knowing, for example, are never ostensively available.
(b) Mental verbs are notoriously polysemous, in that each is associated with multiple senses. Consider, for example, that different senses of know are conveyed by the following sentences:
(1) I know that song. (recognize)
(2) I don’t know if that’s gonna come out too well. (conjecture)
(3) I don’t know what you are saying. (understand)
(4) I know you like that book. (believe)
(5) You let me know if you want Mom to help you. (tell)
(6) You know, the keys are over there. (shared information)
(c) Mental verbs experience a long and gradual period of development during child language acquisition, apparently unlike that which occurs with concrete nouns and verbs.
(d) Mental verbs provide insight into human cognition, as they can reveal our access to our own internal states and our notions about the internal states of others (i.e., a folk theory of mind).
In what follows, I discuss these four rationales in terms of my ultimate goal, which is determining how mental verbs in general, and think, know, and guess in particular, are acquired.
1. A theoretical issue in the acquisition of mental verbs
Even though all lexical words are subject to the Quinean problem of radical translation in child language acquisition, mental verbs such as think and know must be especially challenging in this respect. Whereas the meanings of verbs such as jump and cry must be at least sometimes manifested in the ostensive context (i.e., jumping and crying are sometimes going on when “jump” and “cry” are uttered), it is hard to imagine how the meanings of think and know could ever be ostensively available. These verbs refer to mental states and processes, which are by definition abstract and removed from purely sensory experience (see Scholnick 1987; Gleitman 1990 for more discussion). Even in the most explicit cases, thinking is just not perceivable. Imagine a child observing her mother standing in the middle of the living room, eyes darting back and forth, head wagging to and fro. Such a child might be moved to ask, “watcha doin’ mom?” only to be told “I’m thinking about where to put the new couch when it arrives.” Does this child conjecture that think means rapid eye and head movements?
What if the mother was also pointing to various places around the room—would these points also be incorporated into the child’s meaning of think? No, somehow she figures out that think refers to the internal process that prompted the points and the rapid eye and head movements. Gleitman (1990; Landau and Gleitman 1985) has suggested that the presence of sentence complements with MSVs (in the above sentence, where to put the new couch when it arrives) provides children with an important clue that the verb refers to mental states or processes. In essence, the sentence complement instantiates the proposition to which the MSV pertains (see also Fisher, Gleitman, and Gleitman 1991). Moreover, studies of maternal speech to young children have shown
that mental verbs appear with sentence complements more often than motion verbs do, and that sentence complements are more likely to follow mental verbs than motion verbs (Naigles and Hoff-Ginsberg 1995). At this point, there is little experimental evidence that children actually use this structural clue in determining that a verb refers to a mental state rather than a physically available one. However, learning that think (and know, guess, wonder, believe, etc.) refers to a mental state or process is actually only half the battle. As discussed in more detail below, MSVs also have to be distinguished from each other (what if the mother above had said “I now know where to put the couch when it arrives”?) and sometimes even from themselves, as none of the most common MSVs has a single, unitary meaning (e.g., know, (1) to (6) above). Clearly, ostension is of even less use for this part of the process. One of my goals in this chapter is to suggest another source of information for children’s acquisition of distinctions between MSVs.
2. The polysemy of mental verbs
Mental verbs are acknowledged polysemists. Moreover, the senses associated with the verbs think, know, and guess are particularly numerous and varied. For example, most uses of these three verbs involve mental states or processes, but some appear to serve primarily conversational purposes (e.g., Y’know what? I think it’s time for your nap and the rhetorical Guess who I saw today!). Within the mental domain, the three verbs differ on a variety of dimensions, both continuous (e.g., certainty) and discrete (e.g., factivity, process/product). In what follows, I discuss first the linguistic-theoretic and then cognitive-psychological experimental approaches to understanding the complexity of MSV meaning and representation.
These approaches have tended to operate in ignorance of each other’s work; my hope in presenting both of them is that, following Lila and Henry’s example, it can be seen that combining the linguistic and psychological traditions is an enormously fruitful endeavor. In the linguistics tradition, the polysemy of mental verbs has been partially captured by the participation (or lack thereof) of each verb in differing grammatical or discourse structures. That is, the structural differences are often used as diagnostics for semantic or pragmatic differences. Here, we briefly discuss three cases in which distinct semantic or pragmatic aspects of individual verbs are illuminated by consideration of structural differences between verbs. For example, take the factivity dimension. Know is considered to be a factive because it presupposes the truth of its complement, whereas think is nonfactive and allows no such presupposition. The syntactic phenomenon of “neg-raising”¹ has been proposed as one diagnostic of the absence of factivity, and indeed,
think allows neg-raising much more freely than know. That is, the sentence B doesn’t think it is raining outside is equivalent in many ways to B thinks it isn’t raining outside: It’s not B’s thinking that is being negated, but the conditions outside. In contrast, G doesn’t know it is raining outside is not equivalent to G knows it isn’t raining outside: In fact, given know’s factivity it must be raining outside (for more discussion, see Kiparsky and Kiparsky 1970; Kempson 1975; Horn 1978; Hooper 1975). Similarly, a process/product-type dimension of MSVs has been associated with distinct structures, both morphological and syntactic. For example, the morphological inflection “-ing,” which can appear freely with think but only in restricted contexts with know (Beverly is thinking about/*knowing the animal puzzle), has been linked to the processing, cogitating sense of think. And appearance with a direct object, which is possible with know but not think (Gregory knows/*thinks that song!), has been related to the product sense of know, that which captures the accomplishment of knowledge (Dixon 1990; Wierzbicka 1988). Finally, MSVs that appear in the discourse structure of parentheticals (e.g., Beverly went to the store, I think and Your house, I know, is very old) gain the additional sense of indicating the speaker’s attitude toward the statement in the subordinate clause. The three verbs under consideration each specify different modulations about that statement (cf. Shatz et al. 1983): Think indicates a rather uncertain attitude, a belief founded on relatively weak evidence, whereas know signals a certain attitude and a well-founded belief, and guess refers to a highly uncertain attitude and a belief with little if any foundation (Urmson 1963; Hooper 1975; Lysvag 1975; Moore and Furrow 1991). 
In sum, the structural differences between think, know, and guess each reveal distinct components of meaning, thus shedding light on the polysemy of each verb: Know includes the notions of factivity, accomplishment of knowledge, and certainty of attitude. Think and guess include the notions of nonfactivity and processing or accessing of information, and also implicate varying degrees of uncertainty of speaker attitude. The cognitive psychological tradition has approached the polysemy of MSVs somewhat differently. Here, the methodological emphasis has been empirical rather than analytic, and the theoretical focus more on contexts than on forms of use. For example, Hall and his colleagues (Hall, Nagy, and Linn 1984; Hall and Nagy 1987; Hall, Scholnick, and Hughes 1987; Frank and Hall 1991; Booth and Hall 1995) have postulated distinctions both within and between MSVs along a continuum of internal processing. For example, they distinguish between knowing and thinking as perceptual experiences (e.g., I know his shirt is red/I think it burst), as cognitive products (e.g., I know that tune/I thought of the number; I know why he did that/I thought of how to do it), and as metacognitive
Letitia R. Naigles
or evaluative processes (e.g., I know that Charlie is happier now/I think this idea is better; I would like to know more than I do/Thinking can be hard work) (examples from Frank and Hall 1991, pp. 531–532). Evidence for the continuum is primarily developmental in nature, as children have been shown to understand the perceptual and cognitive aspects of know before its evaluative and metacognitive aspects (Booth and Hall 1995; see also Richards 1982). Moreover, Frank and Hall (1991) found that adults’ spontaneous speech emphasized different aspects of the meanings of think and know: Think was most often used in an evaluative sense, whereas know’s usage was best captured by the perceptual and cognitive senses. The studies of Schwanenflugel and her colleagues (Schwanenflugel, Fabricius, Noyes, Bigler, and Alexander 1994, 1996) provide a nice example of how empirical studies in the cognitive psychological tradition can illuminate the same distinctions highlighted by linguists’ analyses. Schwanenflugel et al. (1994) gave adult subjects an intensional task, in which they were to judge the similarity of meanings of pairs of mental verbs, and an extensional task, in which they were given scenarios and asked to select any number of mental verbs that could apply to them. The judgments and selections were subjected to multidimensional scaling and hierarchical clustering analyses, and several orthogonal dimensions emerged. One dimension appeared to reflect the degree of certainty of the verb, as know and guess appeared as polar opposites with think situated in between them. Another dimension appeared to reflect the creativity of the mental process, in that discover and invent (highly creative processes) and guess and hear (minimally creative processes) were maximally distinguished. Think, know, and guess did not differ among themselves on this dimension.
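The geometry just described—know and guess as polar opposites with think in between—can be illustrated with a classical (Torgerson) multidimensional scaling of a toy dissimilarity matrix. The matrix values below are invented for illustration only; they are not Schwanenflugel et al.'s data:

```python
import numpy as np

# Hypothetical dissimilarities among three mental verbs (invented
# values): know and guess are maximally dissimilar, with think
# equidistant from both.
verbs = ["know", "think", "guess"]
D = np.array([[0.0, 1.0, 2.0],
              [1.0, 0.0, 1.0],
              [2.0, 1.0, 0.0]])

# Classical MDS: double-center the squared dissimilarities.
n = len(verbs)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J

# The leading eigenvector of B gives each verb's coordinate on the
# first recovered dimension (here interpretable as certainty).
vals, vecs = np.linalg.eigh(B)          # eigenvalues in ascending order
dim1 = vecs[:, -1] * np.sqrt(vals[-1])  # scale by sqrt of top eigenvalue

coords = dict(zip(verbs, dim1))
print(coords)  # know and guess land at opposite ends; think near the midpoint
```

With this toy input the recovered dimension places think exactly between the two poles, mirroring the ordering the scaling analyses revealed for the certainty dimension.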
Furthermore, a complex information-processing dimension also emerged, in which perceptual verbs first were contrasted with conceptual ones, and think, know, and guess all clustered together as conceptual verbs. The more detailed hierarchical analysis yielded a hint of how know might differ from think and guess on information processing: Know, learn, and understand were grouped together as part of a hypothesized memory component, and think, guess, reason, and estimate emerged as a cluster related to a constructive processing component. In sum, this review of the linguistic and cognitive psychological traditions concerning MSVs leads to the prediction that children’s sources of information for the meanings of these verbs are to be found in both the forms and the contexts of MSV use. Moreover, some components of meaning—degree of certainty and type of information processing (process vs. product)—seem more central than others, insofar as they have emerged in both traditions with their very different methods and
Manipulating the Input
purposes. It is not surprising, then, that these dimensions, and particularly certainty, have been the primary focus of questions concerning MSV acquisition.

3. The developmental trajectory of mental verb understanding

Given the rampant polysemy described above, it is perhaps not surprising that children’s acquisition of mental verbs encompasses such a long period of development, extending from age two until well into the elementary school years. In brief, children’s understanding of MSVs appears to begin with conversational senses, then extends to mental senses that are relatively undifferentiated, and then progresses to the more sophisticated senses distinguishing between the MSVs. For example, think, know, and guess typically begin to appear in children’s spontaneous speech between two and three years of age (Shatz, Wellman, and Silber 1983; Limber 1973; Bretherton and Beeghly 1982); however, these early uses seem more limited than adult uses. Shatz et al. (1983) tracked children’s uses of mental state verbs over time, and found that the first uses, early in the third year, typically served conversational functions rather than mental functions (e.g., “know what?” or “I don’t know”). By three years of age, mental state uses of these verbs (e.g., “She doesn’t know all this”) become more frequent, as do verb uses that contrast real and mental states (e.g., “I was teasing you; I was pretending ’cept you didn’t know that”). Analysis of three-year-olds’ production of verbs such as think and know, then, indicates that they have acquired the mental aspects of these verbs, and have distinguished them from verbs that refer to physical or affective states (see also Wellman and Estes 1986, 1987). However, three-year-olds have yet to learn how the different MSVs are distinguished among themselves, both semantically and pragmatically.
As numerous researchers have demonstrated, three-year-olds do not distinguish think, know, and guess according to differences on either the factivity dimension (Johnson and Maratsos 1977; Miscione et al. 1978; Hopmann and Maratsos 1978; Abbeduto and Rosenberg 1985) or the certainty dimension (Moore et al. 1989; Moore and Davidge 1989).2 In contrast, it seems that four-year-olds are beginning to make these distinctions. They typically perform above chance, although not errorlessly, on tasks that ask them to distinguish between knowing that something must be true and thinking that it might be true or it might be false (e.g., Moore et al. 1989; Abbeduto and Rosenberg 1985). Furthermore, Frank and Hall (1991) have found that four-year-olds typically use the verbs think and know distinctively in their spontaneous speech. For example, their modal use of think is evaluative (e.g., I think this idea
is better) whereas their modal use of know is perceptual (e.g., I know his shirt is red). Young grade-schoolers appear to be close to mastery on distinguishing the most common MSVs on the certainty and/or factivity dimensions (see note 2); for example, they consistently restrict uses of guess to instances where no evidence was provided about the location of a hidden object, and rely on the clues of a puppet who said he knew where objects were hidden over the clues of puppets who say they guessed or thought where objects were hidden (Miscione et al. 1978; Moore et al. 1989). Very recently, Schwanenflugel et al. (1996) have provided evidence that nine-year-olds include both the certainty and information-processing dimensions in their organization of MSVs, and Booth and Hall (1995) have demonstrated that grade-schoolers have begun to distinguish between some of the polysemous meanings of know (i.e., between knowing that a tree house wall is broken, knowing what the wall used to look like, and knowing how to fix it). Such a long developmental trajectory for children’s acquisition of think and know may be contrasted with the much shorter trajectory associated with such equally common but more concrete verbs as jump and cry, which hold apparently adultlike meanings in the lexicons of three-year-olds (Clark 1993). In this chapter, I will argue that the longer developmental trajectory of think and know results not only because these verbs are more abstract and more polysemous than jump and cry, but also because the input provided for think and know is more confusing—at least initially—than that provided for the concrete action verbs. However, before turning attention to children’s input, one more rationale for the study of mental verbs must be discussed.

4. Mental verbs and theories of mind

Mental verbs have also been of interest because they can provide clues to people’s mental activity and to their conceptual and logical representations.
Historically, developmental psychologists were the first to study mental verbs from this perspective. Their concern was to discover when children could differentiate opinion from fact, when children’s egocentrism had receded sufficiently to allow distinctions between their mental state and another’s, and when children began to have access to their internal processes or psychological experiences (Johnson and Wellman 1980; Miscione et al. 1978; Shatz et al. 1983; Wellman and Estes 1987). More recently, MSVs have been studied in the context of children’s developing theory of mind (TOM). The classic definition of a TOM is the notion that other people have minds and intentions and, crucially,
that the contents of these other minds and intentions can differ from one’s own and from reality. Tasks employing contrasting mental state terms have provided the primary diagnostic for the existence of a TOM in four-year-olds: If children can contrast what person A thinks about the world from what person B knows to be true, then they are capable of holding a false belief, a representation that is different from reality (e.g., Wimmer and Perner 1983). It has also been noted that the developmental courses of TOM and early MSV acquisition appear to proceed in parallel. In broad brush, three-year-olds perform at chance on most TOM tasks (unexpected change of location, unexpected contents; Hogrefe, Wimmer, and Perner 1986), while four-year-olds perform above chance and five-year-olds are essentially perfect.3 This developmental course has been remarkably resistant to alteration; for example, attempts to explicitly instruct three-year-olds on how thoughts may be in conflict with reality have consistently met with failure (e.g., Sullivan and Winner 1991; Wimmer and Hartl 1991). And as mentioned earlier, children’s early understanding of the certainty and/or factivity distinctions between think, know, and guess appears to proceed along a similar course (Johnson and Maratsos 1977; Moore et al. 1989). Moreover, MSVs and TOM have been found to correlate with each other in development: When Moore and Furrow (1991) gave preschoolers a variety of TOM tasks as well as a task tapping the certainty distinction between think or guess and know, they found a significant correlation between the two types of tasks. That is, the children who passed the unexpected-contents and the unexpected-change-of-location TOM tasks tended to be the same children who performed above chance on the think/know and guess/know distinctions. A final parallel that has been noted between MSVs and TOM is representational. 
That is, MSVs must include representations at two independent levels: If I say I think it is raining outside, the truth of the embedded clause is independent of the truth of the sentence as a whole, which is based on think. Thus it may not in fact be raining, yet I can still think that it is. So a complete understanding of this sentence requires an understanding of the independence of the two clauses. (It’s not the case that all verbs that take embedded complements have this requirement: If I saw that it was raining outside, then both clauses of the sentence must be true—it is raining outside and my seeing this—in order for the sentence as a whole to be true.) The same independence-of-levels holds for TOM: For children to understand that someone else (erroneously) represents an object in location A while they (correctly) represent that same object in location B, two propositions with contradictory truth values must be represented. Some recent data have suggested that children’s
passage of TOM tasks is correlated with their mastery of the structure of MSV embedded complements (see deVilliers 1994, 1995; Tager-Flusberg 1993 for more discussion).

5. Summary

In this section I have discussed how mental verbs provide a challenge to children’s acquisition because their processes are invisible (i.e., mental), because each mental verb shares aspects of its meaning with other mental verbs yet is also distinct, and because mental verbs are themselves extremely polysemous. Moreover, the close empirical relation found between MSVs and TOM development suggests that children’s transition from realizing that think and know refer to mental objects to understanding how think and know differ is akin to their transition from realizing that thoughts exist abstractly to appreciating that thoughts may be in conflict with reality. All of these factors undoubtedly contribute to the mental verbs’ long period of acquisition; however, none provides an explanation for how the acquisition is ultimately accomplished. In the next section, I consider another factor that might play a more explanatory role in the acquisition of MSVs.

B. One Hypothesis for MSV Acquisition

So how ARE mental verbs acquired? Clearly, any theory of acquisition must require children to pay attention to both the forms and contexts of mental verb use, as these are what help distinguish the verbs in adult lexicons. And, indeed, Hall and Nagy (1987) suggest that adults’ explicit use of mental verbs in familiar contexts is what helps draw children’s attention to the mental processes underlying the verbs. However, as Scholnick (1987) points out, there is as yet no coherent theory of how children actually acquire the mental verbs. To be sure, I do not claim to have a well-fleshed-out theory of mental verb acquisition either. My goal is more modest; namely, to provide an explanation for an early transition children make in mental verb acquisition.
This transition typically occurs around the age of four, when children first distinguish think and guess from know on the certainty and/or factivity dimensions. Early explanations for this shift focused on children’s cognitive development as the instigating factor. It has been suggested, for example, that before the age of four children are unable to distinguish uncertain from certain situations (Miscione et al. 1978; Johnson and Wellman 1980) and only understand about people’s differing desires and not their differing thoughts or beliefs (Wellman 1990; Leslie 1991). More recently, however, it has been pointed out that children, especially when
very young, simply may not be hearing the verbs in the usages needed to make the appropriate distinctions.4 For example, Furrow, Moore, Davidge, and Chiasson (1992) coded maternal MSV use to two- and three-year-olds and found that almost 75% of their utterances containing think were conversational and served to direct the dyadic interaction (e.g., “don’t you think the block should go in here?”). Only 5% of think utterances instantiated a true mental state reference, and less than 1% of think or know utterances were relevant to the notion of uncertainty. In fact, many early parental usages of think may (unintentionally, I am sure) implicate exactly the wrong end of the certainty dimension. For example, if a parent says, “I think it’s time for your nap,” this is not usually intended to convey uncertainty about the temporal situation vis-a-vis the child’s nap. On the contrary, it actually means that it is time for the child’s nap, and she had better get to bed. Furrow et al. (1992) would probably code this usage of think as directing the interaction, but notice that from the child’s point of view “I think” in this context could also be interpreted as meaning I am certain. A different picture of input emerges when children are older. Frank and Hall (1991) studied adult (both parent and preschool teacher) utterances containing think and know in conversation with 4.5-year-olds, and found that think was primarily used in its evaluative sense, whereas know was primarily used in its perceptual and cognitive senses. Thus not only are think and know now distinguished on semantic (and probably pragmatic) grounds, but it is also likely that many of Frank and Hall’s evaluative uses of think highlighted its uncertain sense (see Scholnick 1987). In sum, adults typically use think in its conversational sense when speaking to very young children, but apparently shift this usage as the children mature, so that think typically manifests its certainty sense in speech to five-year-olds. 
So here’s the question: Might children’s change in mental verb understanding between the ages of three and five years be linked to this change in input they are experiencing? A first step in investigating this question would be to demonstrate that preschool-aged children were indeed sensitive to the ways (i.e., senses) that mental verbs are used. The only study performed thus far that has linked parental input and child mental verb understanding is that of Moore, Furrow, Chiasson, and Paquin (1994), who found a positive correlation between the sheer frequency of maternal belief-term use (i.e., think, know, and guess combined) when children were two years of age and those same children’s success at distinguishing the three verbs in a comprehension task when they were four. Unfortunately, Moore et al. did not investigate any relationship between the various uses of think, know, and guess in maternal speech and children’s subsequent
performance on mental verb comprehension tasks. How might I show that children are sensitive to the ways these verbs are used? Because these verbs are attested (as opposed to nonsense) words, I could not completely control the type of mental verb usage children heard, and there was no “natural” case I knew of where parents continued to use a restricted set of MSV senses in speech to their children. However, a variation on the Gleitmans’ deprivation paradigm suggested itself: Rather than deprive children of specific types of usage, my colleagues and I (Naigles, Singer, Singer, Jean-Louis, Sells, and Rosen 1995) sought to enhance them, via the use of television input. Our idea was to provide additional MSV tokens within the context of a television show, but restrict the senses in which these verbs were used, and see if this additional input affected children’s MSV understanding (at least in the short term).

Study 1: Does “Barney and Friends” influence mental verb understanding?

While no one would claim that television input, even in these days of rampant television-watching, provides sufficient linguistic input for children to learn everything about a language, there is some suggestive evidence that some forms of television input have the potential to influence young children’s vocabulary development. For example, Rice et al. (1990) found that the amount of “Sesame Street”-watching by children from age three to age five was a positive predictor of growth in Peabody Picture Vocabulary Test-Revised (PPVT) scores over the two-year period. Moreover, Rice and Woodsmall (1988) found significant gains in preschoolers’ understanding of low-frequency nouns and adjectives after they watched short animated film clips from a children’s cable channel whose voice-over narration included those words. In sum, recent research suggests that contemporary television that has been designed for children has a significant effect on their overall vocabulary development.
For this study of television input, we chose the show “Barney and Friends.” Earlier research by my colleagues had confirmed the popular perception that this show is extremely engaging to preschoolers; therefore, the episodes could be counted on to keep the children’s attention (Singer, Singer, Sells, and Rosen 1995). Moreover, Singer and Singer (1997) found that preschoolers who watched specific episodes of “Barney and Friends” showed significant gains in the number of nouns (all used in the episodes) they could define, while those who had not watched the episodes showed no change. Thus we can conclude that the children were attending to at least some of the linguistic content of the episodes. Furthermore, Singer and Singer (1997) had already performed detailed codings of the social and cognitive content of 48 “Barney and Friends” episodes, and their ten top-ranked episodes were found to include numerous uses of our three target verbs, think (63 tokens), know (106 tokens), and guess (22 tokens). Our goal in this study was to see if providing “extra” input for the MSVs think, know, and guess would influence children’s understanding of these verbs. We assessed children’s current stage of mastery of the certainty distinction between these verbs and then had half of the children watch these “Barney and Friends” episodes over the course of two weeks. After the two-week period, each child’s MSV understanding was assessed again. A pure frequency account, à la Moore et al. (1994), would yield the prediction that the children who were exposed to these ten episodes of “Barney and Friends” would perform better on mental verb comprehension tasks after exposure than before, and also better than the comparison group of children who received no special exposure. This is because simply hearing these verbs more frequently should promote children’s better understanding. However, an account based on the ways in which the verbs were used might yield a different prediction. We coded the utterances containing the three verbs into the following five categories:

(a) Certainty (e.g., “I know that I’m part of my neighborhood”)
(b) Uncertainty (e.g., “I think I’ve seen this napkin somewhere before”)
(c) Opinion (e.g., “I think your African clothes are pretty”)
(d) Process (e.g., “You could think of a number for a guessing game”)
(e) Accomplishment (e.g., “I know that song”)

We found that uncertain uses of “think” and “guess” were rare, comprising only 8–9% of all tokens. In contrast, certain uses were much more prevalent, comprising almost 43% of “think” tokens, 27% of “guess” tokens, and 32% of “know” tokens (most of the other utterances containing these verbs invoked their process or accomplishment senses).
Thus “think” and “guess” were used three to five times more frequently in certain contexts than in uncertain ones. Moreover, the percent of certain utterances was roughly equivalent for all three verbs: “know” was not distinguished from “think” or “guess” by appearing more often as pragmatically certain. If children are sensitive to the ways in which the verbs are used, rather than just their frequency, then the children watching these episodes of “Barney and Friends” might come
away with the notion that all three mental verbs refer to certain mental states, in which case the children’s subsequent performance on a mental-verb-distinction task would not be expected to improve.

Method

Participants. The final sample included 39 three-, four-, and five-year-old children drawn from three local preschools. All of the children were native speakers of American English; all but six were of European-American heritage. Twenty-two participated in the “Barney”-watching group (10 boys, 12 girls; MA = 47.73 months [SD = 6.18]) and seventeen in the nonwatching group (8 boys, 11 girls; MA = 49.35 months [SD = 7.57]). Because of their failure to reach criterion on the practice trials (see below), an additional 11 children were tested but then eliminated.

Materials, design, procedure. Moore et al.’s (1989) assessment of MSV understanding was used (see also Moore and Davidge 1989; Moore and Furrow 1991; Moore et al. 1994). The materials included two small boxes, one blue and one white; two novel hand puppets, named Jazz and George; and one small toy. Experimenters told the children the following: “We are going to play a hiding game. When you close your eyes, I will hide the toy in either the white or the blue box and you have to find it. Lucky for you, Jazz and George will watch me hide it so they can help you to find the toy. So if you want to find the toy, you need to listen carefully to what Jazz and George tell you.” During the practice trials, the puppets distinguished the boxes via the use of the negative; that is, Jazz says, “It’s in the blue box” and George says “It’s not in the white box.” When the children chose the correct box during these trials they were praised and given stickers, and if they chose the incorrect box they were corrected. To reach criterion, the children had to be correct on three practice trials in a row (out of six). As mentioned earlier, 11 children did not reach criterion during the pretest, posttest, or both.
Once the practice trials were successfully completed, the test trials commenced in much the same format. Here, the puppets distinguished the boxes on the basis of the verbs think, know, and guess. That is, if the toy was in the white box, Jazz might say “I think it’s in the blue box,” while George would say “I know it’s in the white box.” Care was taken not to unduly emphasize the mental verbs; the experimenters maintained an even prosody throughout each utterance. Then the experimenter would ask, “Where is the toy?” Notice that in these test trials (and unlike the practice trials), the two puppets’ clues were at odds with each other, so the children’s task was to determine which was the
correct box. The children were not told whether or not they were correct after each test trial; this was necessary to prevent the children from receiving direct feedback as to the correctness of their choices throughout the session. When the test trials were completed, each child was thanked for his or her participation and given some colorful stickers. Each child received twelve test trials in which two of the three verbs were contrasted; thus there were four presentations of each verb contrast (think/know, guess/know, think/guess). The particular puppet that made each statement, the order in which the puppets made their statements, and the box to which each referred were randomly varied throughout all trials. The trials were videotaped and then coded from the videos. The think/know and guess/know trials were coded for correctness. The correct response was to choose the box referred to by the puppet who said “I know.” The think/guess trials were not coded because Moore et al. (1989; see also Moore and Davidge 1989) had found that even eight-year-olds did not distinguish these verbs, and in fact, it is not obvious which should be considered more certain (cf. Furrow and Moore 1991; Schwanenflugel et al. 1994, 1996).

Results and discussion

Our first analysis compared the children’s percent of correct responses distinguishing think and guess from know for each age (three and four years), group (“Barney”-watchers and nonwatchers), and time (pretest and posttest). The results are shown in table 15.1. As the table shows, three-year-olds tended to perform more poorly than the four-year-olds during the pretest; across verb pairs, the three-year-olds chose correctly 59.7% of the time whereas the four-year-olds chose correctly 64.7% of the time. These scores are comparable to, albeit a bit lower than, those generated by the preschool-aged children of Moore et al. (1989). Did watching “Barney” (or not) affect the children’s responses?
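The age-group pretest means just cited can be recovered from the pretest "Both verbs" cells of table 15.1 by weighting each cell mean by its n; a quick arithmetic check:

```python
# Pretest "Both verbs" cell means from table 15.1, paired with cell ns:
# (n, mean %) for watchers and nonwatchers, respectively.
three_year_olds = [(10, 57.50), (8, 62.50)]
four_year_olds = [(12, 71.88), (9, 55.11)]

def weighted_mean(cells):
    """Combine cell means into one overall mean, weighting by cell n."""
    total_n = sum(n for n, _ in cells)
    return sum(n * m for n, m in cells) / total_n

print(round(weighted_mean(three_year_olds), 1))  # 59.7
print(round(weighted_mean(four_year_olds), 1))   # 64.7
```

The weighted averages reproduce the 59.7% and 64.7% figures reported in the text.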
A four-way repeated-measures ANOVA was performed, in which the between-subjects variables were age (three vs. four years) and group (watchers versus nonwatchers), and the within-subjects variables were time (pretest vs. posttest) and verb pair (think/know vs. guess/know). Because of our substantial subject loss (often resulting in fewer than ten children per cell) and the exploratory nature of this study, we chose to designate an alpha level of 0.10 as our boundary of significance. Only the three-way interaction of age, group, and time reached significance (F(1,35) = 5.12, p < 0.05). Planned contrasts were performed for each age and group from pretest to posttest, collapsing across verb pair; the results are highlighted in the two graphs in figure 15.1. The
Table 15.1
Mean percent correct (SD) on mental verb comprehension task

Age    Group        Time (n)        Think/Know     Guess/Know     Both verbs
Three  Watchers     Pretest (10)    60.00 (16.58)  55.00 (15.00)  57.50 (8.29)
                    Post-test (10)  70.00 (24.49)  52.50 (28.39)  61.25 (19.72)
       Nonwatchers  Pretest (8)     64.63 (17.06)  60.38 (23.53)  62.50 (16.54)
                    Post-test (8)   59.38 (30.46)  46.88 (29.15)  53.13 (21.42)
Four   Watchers     Pretest (12)    68.75 (27.24)  75.00 (17.68)  71.88 (19.18)
                    Post-test (12)  64.58 (21.55)  68.75 (29.09)  66.67 (21.85)
       Nonwatchers  Pretest (9)     55.56 (30.68)  54.67 (32.43)  55.11 (27.85)
                    Post-test (9)   72.22 (18.43)  67.56 (25.03)  69.89 (14.02)
top panel shows that the three-year-old children in either group changed little from pretest to posttest, but the bottom panel shows somewhat greater change within the four-year-olds. In essence, the watchers’ scores worsened while the nonwatchers’ scores improved. However, only the planned contrast involving the nonwatchers group was significant (t(8) = 1.96, p < 0.10). At the very least, these analyses suggest that watching these ten episodes of “Barney” provided no enhancement to our child participants, while not watching “Barney” facilitated those children’s improved mental verb understanding.5 However, the absence of an effect of watching “Barney” could be attributable to either of two factors: Either there really was no consistent effect, in that some children improved, some worsened, and some showed no change, or there really was a consistent effect, but it was fairly small and required a more highly powered sample to reveal itself statistically. To distinguish these possibilities, we performed a second analysis of the data in which the number of children whose scores improved, worsened, or stayed the same from pretest to posttest was tabulated. Because the previous analysis found no difference between the verb pairs, the children’s think/know and guess/know scores were averaged in this second analysis. The results are shown in figure 15.2.
Figure 15.1. Percent of correct responses distinguishing think and guess from know, at Time 1 and Time 2.
As with the percent correct analysis, our three-year-old participants showed little consistent change in either experimental group. In contrast, the bottom graph of figure 15.2 shows that more “Barney”-watchers’ scores worsened than improved or stayed the same, from pretest to posttest, while more nonwatchers’ scores improved than worsened or stayed the same. A chi-square test revealed that these two distributions were significantly different (χ² = 5.96, p < 0.06). More importantly, a sign test revealed that significantly more watcher four-year-olds’ scores worsened (7) than improved (2; p = 0.07 using the binomial distribution). In summary, it appears that watching ten episodes of the TV show “Barney and Friends” did not affect three-year-olds’ understanding of
Figure 15.2. Number of children whose mental-verb-distinction scores improve, worsen, or stay the same from Time 1 to Time 2.
the certainty distinction between the mental verbs think and guess, and know; however, such viewing did appear to affect the four-year-olds. Taken together, the percent correct and number who change analyses showed that the four-year-old children in the nonwatcher condition improved their scores, whereas the scores of many of those in the watcher condition declined. Thus, watching “Barney” seems to have led more four-year-olds to minimize the certainty distinction between think and guess, and know, whereas not watching “Barney” is associated with further progress on this distinction. These results suggest that, indeed, young children are sensitive to the ways mental verbs are used. It was not the case that simply presenting more instances of think, know, and guess yielded improved performance; in fact, more children who heard additional MSVs (the watchers) performed more poorly after exposure. What seemed to be happening to the watcher group was that the frequent certain uses of “think” and
“guess” highlighted one way in which these verbs were equivalent to “know,” and so reinforced their undifferentiated status with respect to that verb. In other words, the “Barney” input could be viewed as temporarily shifting the balance of differentiating and nondifferentiating input the children received, so as to create a (one hopes) momentary delay or decrement in the watchers’ progress on the think/know and guess/know distinctions.6 Why did the nonwatchers, who received no special input, improve their scores from pretest to posttest? This question is really part and parcel of the larger one with which I began: Why do most children improve in their understanding of the certainty distinction between think, know, and guess after age four? Earlier, I hypothesized that this improvement could be attributed to a change in children’s input, specifically, an increase in the proportion of uncertain think and guess uses by adults. The results of the “Barney” study give this hypothesis some plausibility, in that children this age were shown to be sensitive to the ways MSVs are used; however, the study did not explain how the change in input actually occurs. That is, what is it that instigates this change? Do adults tap into some cognitive development that children have made and adjust their usage accordingly? Or do the children need less directing of their interactions, thus “freeing” adults’ use of think to manifest its other senses? Both of these factors might contribute, but it is hard to conceive of an entire population of parents deliberately altering their speech to their children at just the same age in order to facilitate this development. As a previous generation of Gleitman students has shown, parents’ talk to their children is primarily for the purposes of socialization and care, not for language teaching (Newport et al. 1977). 
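The sign test used above to compare improvers and worseners is just a binomial tail probability: with ties excluded, how likely is a split at least as lopsided as 7 worsening versus 2 improving if change were a fair coin? A sketch of the one-tailed computation (the exact p reported in the chapter may reflect a different handling of ties or rounding):

```python
from math import comb

def sign_test_one_tailed(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5): the one-tailed sign test."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

# 7 of the 9 non-tied four-year-old watchers worsened from pretest to posttest.
p = sign_test_one_tailed(7, 9)
print(round(p, 3))  # 0.09
```

The computation makes explicit why the test needs no distributional assumptions beyond independence of the children's change scores: under the null hypothesis, each non-tied child is equally likely to improve or worsen.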
However, it is the case that many children—especially those who are likely to be participants in developmental psychology studies—begin to receive a new form of input just around three to four years of age. This new input comes not from parents, but from preschool teachers.
Study 2: Does preschool experience influence mental verb understanding?
A major social change occurs in many children’s lives at around three to four years of age, in that they begin to attend preschool (or child care programs that include a preschool component) for anywhere from fifteen to forty hours per week. Before this time, most children are cared for either at home or in small family child care settings (Hofferth 1996). The preschool experience may be very different from this earlier type of care, in that (a) there are more children with whom to interact, especially more children close in age; (b) there is more structure to the day; and (c) teacher-child interactions tend to be more purposely instructive than mother-child interactions (e.g., about colors, numbers, and letters).
264
Letitia R. Naigles
Some recent studies have shown that preschool interactions potentially relevant to MSV development are different in kind from interactions at home with parents. Overall, the linguistic input provided in preschool by teachers has been found to be both more formal and more complex than that heard at home (Dickinson and Smith 1995). Moreover, when Hall et al. (1987) coded adult usage of MSVs as a class (i.e., not broken down by individual verb), they found that the typical parental usage was different from the typical teacher usage (this also varied by social class). Brown, Donelan-McCall, and Dunn (1996) compared MSV usage in mothers, siblings, and friends in conversation with four-year-olds, and found that friends’ (and siblings’) MSV use (again, not broken down by verb, although think and know were the most common) included more modulations of assertion than did mothers’. Finally, some evidence has recently emerged that the experience of good-quality child care or preschool matters in the pace of linguistic and cognitive development. Huttenlocher (1995) found that five-year-olds experience more growth in language comprehension over the part of the year that includes preschool attendance than over the part that includes the summer vacation. And Shatz, Behrend, Gelman, and Ebeling (1996) have found that two-year-olds who attend child care show better color-name understanding than their peers who are cared for at home. My hypothesis, then, was that the preschool environment plays a significant role in the observed progression of MSV understanding from age three to age four. It is possible, for example, that teachers of preschoolers use think and guess in their uncertain senses more than mothers do. Moreover, children may hear more of such uses from their peers, as three-year-olds and four-year-olds are often in the same class in American preschools.
My conjecture was that such preschool experiences may provide a partial account for four-year-olds’ enhanced understanding and performance on MSV comprehension tasks relative to three-year-olds. The current literature on MSV development (and also TOM development, for that matter) cannot speak to this hypothesis, because all of the experimental studies that I know of have used preschool attendees as participants. What this means, though, is that the literature includes a potential confound: Is the developmental pattern that has been observed a function of age, or of time spent in preschool? It was time for a “true” deprivation study. How could preschool input be manipulated, to see the extent to which it accounts for this transition in mental verb understanding? Luckily, here I could take advantage of a “natural experiment” in the world, because although most American three- and four-year-olds (particularly the latter) do attend some kind of preschool, sizeable numbers exist whose parents have chosen to keep them at home. Comparisons of
the MSV understanding of children who have and who have not attended preschool might reveal differences in the onset of their understanding of MSV distinctions. My prediction was that children who attend preschool would show enhanced understanding of the degree of certainty distinction among the verbs think, guess, and know, relative to their agemates who have not yet attended preschool.
Method
Participants. Twenty-four child subjects participated, twelve of whom were drawn from local preschools (MA = 52.5 months, SD = 3.28). These children were enrolled in preschool full-time (i.e., 40 hours per week). The twelve home-reared children (MA = 53.7 months, SD = 4.66) were recruited from playgrounds, flyers in doctors’ offices, and museums. These children had minimal experience with child care; what experience they had was in family child care (M = 8.79 hours per week). All of the children were monolingual speakers of American English, and all belonged to middle-SES families. An additional three preschool children were eliminated because of their failure to reach criterion on the practice trials. The materials and procedure were the same as for Study 1. The preschool children were tested in their preschools and the home-reared children were tested at home.
Results and discussion
The responses were again tabulated for percent correct; the children’s performance on the think/know and guess/know distinctions was combined. The results are shown in figure 15.3. The performance of the preschoolers (M = 71.87% correct, SD = 19.18) was in line with that found by previous studies (e.g., Moore et al. 1989), and was significantly better than would be expected by chance (p < 0.05). Nine of the twelve children performed at 62.5% correct or better. The performance of the home-reared children was much lower (M = 55.21% correct, SD = 22.51), did not differ significantly from chance (p > 0.10), and was significantly worse than that of the preschoolers (t(22) = 1.95, p < 0.05).
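The reported group comparison can be checked directly from the summary statistics given above. The sketch below (plain Python) applies the standard pooled-variance independent-samples t formula to the two groups’ means and standard deviations; the formula is the textbook one, not anything specific to this chapter’s analysis.

```python
import math

# Summary statistics reported in Study 2 (percent correct; n = 12 per group).
m_pre, sd_pre, n_pre = 71.87, 19.18, 12     # preschool attendees
m_home, sd_home, n_home = 55.21, 22.51, 12  # home-reared children

# Independent-samples t-test with pooled variance.
df = n_pre + n_home - 2
pooled_var = ((n_pre - 1) * sd_pre**2 + (n_home - 1) * sd_home**2) / df
se = math.sqrt(pooled_var * (1 / n_pre + 1 / n_home))
t = (m_pre - m_home) / se

print(f"t({df}) = {t:.2f}")  # prints t(22) = 1.95, matching the reported value
```

With 22 degrees of freedom, t = 1.95 is significant at the .05 level on a one-tailed test, which is the appropriate test here given the directional prediction that preschool attendees would outperform home-reared children.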
Only six of the twelve home-reared children performed at 62.5% correct or better. These findings support my prediction that preschoolers would perform better on MSV comprehension tasks than children of the same age who had not attended preschool. These four-year-old preschoolers correctly distinguished think and guess from know, in that they chose the box designated by the puppet who said “I know,” rather than the puppet who said “I think” or “I guess,” significantly more often than would
Figure 15.3. Percent of correct responses distinguishing think and guess from know, for preschool-attending and home-reared four-year-olds.
be expected by chance. In contrast, the home-reared four-year-olds’ performance resembled that of the three-year-old subjects seen in other studies (e.g., Moore et al. 1989; Johnson and Maratsos 1977): They were equally likely to pick the boxes designated by puppets who used “know,” “think,” or “guess.” In other words, they did not distinguish these three verbs on the degree of certainty dimension.
C. Discussion and Conclusions
Thus far, these hypotheses concerning a role for input in children’s acquisition of MSV distinctions have received some preliminary support: Both television input and preschool experience affected children’s performance on a test requiring them to distinguish between mental verbs. That is, television input that minimized the certainty distinction between think, guess, and know evidently led more four-year-olds to treat the three verbs as equivalent on this dimension. Moreover, preschool input—broadly defined as full-time experience in preschool—evidently resulted in the relevant children treating the verbs more distinctively than their non-preschool-attending peers. The notion, then, that one instigating factor for children’s development of the certainty distinction between think, guess, and know at age four is their preschool-based input has gained some empirical as well as theoretical plausibility.
Clearly, though, more research is needed to address some critical methodological and theoretical issues. For example, one methodological question concerns how well the two samples in Study 2, of preschool attendees and home-reared children, were equated. That is, just because the children were closely matched in age did not necessarily mean they were as closely matched in other aspects of development, be they social, linguistic, or cognitive. Of course, I could not randomly assign half of the children to attend preschool and the other half to stay at home; I was constrained by the parents’ decisions regarding whether to send their children to preschool or not. Thus it is possible that the preschool attendees were already ahead of their home-reared peers in language development, and this was why they were attending preschool. In other words, the time course of the children’s development may have caused their preschool attendance rather than the other way around. My collaborators and I are beginning to address this issue by conducting a longitudinal study in which three-year-old preschool attendees and home-reared children, now matched on language and cognitive developmental milestones as well as age, are being repeatedly assessed for their mental verb understanding over the course of 1.5 years. If preschool experience is a key factor in beginning to understand the certainty distinction between think, guess, and know, then preschool attendees should perform above chance on these tests at an earlier age than home-reared children. Our preliminary findings point in this direction (Marsland, Hohenstein, and Naigles 1997). More theoretical questions concern how the preschool experience, if its effect is real, exerts its influence. What is it about preschool that may be facilitating the acquisition of this MSV distinction?
Any serious answer to this question must include detailed comparisons of teacher-preschooler and parent-child interactions, thereby highlighting how the language used by adults in preschool differs from that used at home. My collaborators and I have collected a corpus of such interactions and are in the process of performing such comparisons (see Hohenstein, Naigles, and Marsland 1998 for some preliminary findings). What we have uncovered so far are numerous interactions in the preschools, such as those below, which have the potential to be facilitative.
(1) Teacher: What color is your ant?
Child A: Black
Child B: Brown
Child A: No, black
Child B: I said brown
Teacher: Thank you. And I think there are brown ants, I’m almost positive!
(2) Teacher: Well, here’s a page missing, but this is what I think the page said.
(3) Teacher: Now let’s count up here, one, two, three, four
Child A: Four on one
Teacher: Are you reading behind my back? Let’s count here.
Child B: Five on one.
Teacher: Wait a minute, now you’re guessing. Don’t do that.
In the first two extracts, the teacher’s use of think seems explicitly marked as less-than-certain because she is only “almost positive” in (1) and because a page in the storybook is missing in (2). In extract (3) the teacher is reading Bears on Wheels (Berenstain and Berenstain 1969) but the child is talking about a page yet to be read. The teacher’s use of guess in this context may serve to highlight her sense that the child must be uncertain about what she is saying. We expect to see fewer such interactions in our home recordings, although we have not yet analyzed enough of them to come to any conclusions. In addition, in line with the linguistics tradition’s focus on MSV forms, we expect to find more syntactically distinctive uses of think, guess, and know—what Naigles and Hoff-Ginsberg (1995) have termed “syntactic diversity”—in teachers’ input than in mothers’ input. With these additional studies, we will have a clearer picture of when children learn what about mental state verbs, and how their input (as opposed to other aspects of their development) contributes to this learning. Notice again that I am proposing a very specific role for a very specific type of input here, namely, that preschool input, by virtue of its formality and didactic context, provides the appropriate contexts for the distinctive use of these mental verbs in a way that the usual maternal input, with its focus on socialization and care, does not. One would not necessarily expect the preschool experience to matter for other aspects of language acquisition, such as the acquisition of argument structure or of yes-no questions, because these aspects seem less susceptible to the overly polite register often used with young children in this culture. However, given the correlations observed between MSV acquisition and theory of mind development, it is possible that the preschool experience may also facilitate children’s development of a TOM.
Recent discussions of TOM development have begun to consider the child’s environment in more detail, and researchers have pointed to such possible instigating factors as siblings in general, intersibling conflict and trickery, pretense, and peer language use (Bartsch and Wellman 1995; Jenkins and Astington 1996; Perner et al. 1994; Brown et al. 1996; Lillard 1993; Lewis et al. 1997). Surprisingly, none has specifically mentioned the preschool experience, in which all of these factors appear in combination. And yet preschool may turn out to be an important catalyst for many of the cognitive achievements children have been shown to make between the ages of three and five. In closing, the deprivation paradigm pioneered by Lila and Henry Gleitman for research in language learning has shown its worth once again, by highlighting and suggesting how to weight the joint roles of input and endowment in children’s acquisition of language.
Acknowledgments
I am grateful to all of the teachers, parents, and children who participated in these studies. Much of this work was collaborative, performed with Dorothy Singer, Jerome Singer, Betina Jean-Louis, David Sells, and Craig Rosen; moreover, I thank Abigail Heitler and Nancy McGraw for their assistance in data collection. This research has also benefited greatly from conversations with many colleagues, most especially Jill Hohenstein, Kate Marsland, Alice Carter, Jill deVilliers, Larry Horn, Bonnie Leadbeater, and Susan Rakowitz. This research was supported by NIH FIRST Award HD26596 and a Yale University Social Science Research Fund Fellowship. Correspondence should be sent to Letitia Naigles, Department of Psychology, 406 Babbidge Road, U-20, University of Connecticut, Storrs, CT 06269-1020.
Notes
1. In neg-raising, the negative element in the main clause of a complex sentence really serves to negate the verb in the subordinate clause. The general idea is that the negative element can be “raised” from the subordinate clause to the main clause, but its interpretation remains in the lower clause (see Horn 1978 for more discussion).
2. None of these studies has actually investigated whether children distinguish the factivity and certainty dimensions from each other, although Moore and Davidge (1989) claim that the certainty dimension is primary in these initial mental state distinctions (see also Tager-Flusberg et al. 1997).
Moreover, researchers have not yet investigated the process/product dimension with children in this age group.
3. This is with first-order false beliefs, which are distinguished from second-order false beliefs in that they are not embedded (Wimmer and Perner 1983; Wellman 1990; Astington 1998). Thus She thinks that the chocolate is in the cabinet, even though it is really in the freezer is an example of a first-order false belief, whereas She thinks that he thinks that the chocolate is in the cabinet, even though it is really in the freezer is an example of a second-order false belief.
4. Analyzing MSVs as a class, Brown and Dunn (1991) noticed that mothers of two-year-olds tend to use them more in commentary talk than in didactic talk, and more in reference to others than to the target child. This may result in the verbs being less salient to the child and so contribute to their delay in acquisition relative to social/emotional and concrete verbs.
5. The fact that the nonwatchers’ performance at pretest was considerably lower than that
of the watchers’ raises the possibility that the former group’s improvement at posttest is attributable to regression to the mean. When we controlled for the children’s pretest scores with an ANCOVA, however, the interaction of age and group was still present, albeit at a lower level of significance. Furthermore, the estimated posttest scores for the nonwatchers were still higher than those for the watchers (72.8% vs. 62.9%). Thus it is unlikely that the nonwatchers’ improvement at posttest is solely a function of their depressed scores at pretest.
6. How can we be sure that it was the specific mental verb input of “Barney” that resulted in the decline in the watchers’ scores, and not just a general effect of watching “Barney” or any kind of television? One clue comes from the second language task these children participated in at pretest and posttest. They were asked to enact ungrammatical sentences in which transitive verbs were placed in intransitive frames and intransitive verbs were placed in transitive frames (cf. Naigles, Gleitman, and Gleitman 1993). Their enactments were coded as to whether they followed the demands of the syntactic frame (the usual preschool-aged child response) or the demands of the verb (the usual grade-schooler and adult response). On this task, the watcher group performed better from pretest to posttest (i.e., adhered more to the demands of the verb) while the nonwatchers showed no change (see Naigles et al. 1995 and Naigles and Mayeux, in press, for more detail). At the very least, then, it is not the case that watching these ten episodes of “Barney” depresses language abilities or performance overall.
References
Abbeduto, L. and Rosenberg, S. (1985) Children’s knowledge of the presuppositions of “know” and other cognitive verbs. Journal of Child Language 12:621–641.
Astington, J. (1998) Theory of mind, Humpty Dumpty, and the icebox. Human Development 41:30–39.
Bartsch, K. and Wellman, H. (1995) Children Talk About the Mind. Oxford: Oxford University Press.
Berenstain, S. and Berenstain, J. (1969) Bears on Wheels. New York: Random House.
Booth, J. and Hall, W. S. (1995) Development of the understanding of the polysemous meanings of the mental-state verb know. Cognitive Development 10:529–549.
Bretherton, I. and Beeghly, M. (1982) Talking about internal states: The acquisition of an explicit theory of mind. Developmental Psychology 18:906–921.
Brown, J. and Dunn, J. (1991) “You can cry, mum”: The social and developmental implications of talk about internal states. British Journal of Developmental Psychology 9:237–256.
Brown, J., Donelan-McCall, N., and Dunn, J. (1996) Why talk about mental states? The significance of children’s conversations with friends, siblings, and mothers. Child Development 67:836–849.
Clark, E. (1993) The Lexicon in Acquisition. Cambridge: Cambridge University Press.
deVilliers, J. (1994) Questioning minds and answering machines. In Proceedings of the 1994 Boston University Conference on Language Development. Somerville, MA: Cascadilla Press.
deVilliers, J. (1995) Steps in the mastery of sentence complements. Society for Research in Child Development, Indianapolis, IN.
Dickinson, D. and Smith, M. (1995) Effects of preschool lexical environment on low-income children’s language skill at the end of kindergarten. Paper presented at the Biennial Meeting of the Society for Research in Child Development, Indianapolis, IN.
Dixon, R. M. W. (1991) A New Approach to English Grammar, on Semantic Principles. Oxford: Clarendon Press.
Feldman, H., Goldin-Meadow, S., and Gleitman, L. R. (1978) Beyond Herodotus: The creation of language by linguistically deprived deaf children. In A. Locke (ed.), Action, Symbol, Gesture: The Emergence of Language. New York: Academic Press.
Fisher, C., Gleitman, H., and Gleitman, L. R. (1991) On the semantic content of subcategorization frames. Cognitive Psychology 23:331–392.
Fowler, A., Gelman, R., and Gleitman, L. R. (1994) The course of language learning in children with Down Syndrome: Longitudinal and language level comparisons with young normally developing children. In H. Tager-Flusberg (ed.), Constraints on Language Acquisition: Studies of Atypical Children. Hillsdale, NJ: Erlbaum.
Frank, R. and Hall, W. S. (1991) Polysemy and the acquisition of the cognitive internal state lexicon. Journal of Psycholinguistic Research 20:283–304.
Furrow, D., Moore, C., Davidge, J., and Chiasson, L. (1992) Mental terms in mothers’ and children’s speech: Similarities and relationships. Journal of Child Language 19:617–631.
Gleitman, L. (1990) The structural sources of verb meanings. Language Acquisition 1:3–56.
Gleitman, L. and Gleitman, H. (1992) A picture is worth a thousand words, but that’s the problem: The role of syntax in vocabulary acquisition. Current Directions in Psychological Science 1:31–35.
Gleitman, L. and Gleitman, H. (1997) What is a language made of? Lingua 100:29–67.
Hall, W. S. and Nagy, W. E. (1987) The semantic-pragmatic distinction in the investigation of mental state words: The role of the situation. Discourse Processes 10:169–180.
Hall, W. S., Nagy, W. E., and Linn, R. (1984) Spoken Words: Effects of Situation and Social Group on Oral Word Usage and Frequency. Hillsdale, NJ: Erlbaum.
Hall, W. S., Scholnick, E., and Hughes, A. (1987) Contextual constraints on usage of cognitive words. Journal of Psycholinguistic Research 16:289–310.
Hofferth, S. (1996) Child care in the United States today. The Future of Children: Financing Child Care 6(2):41–61.
Hogrefe, G., Wimmer, H., and Perner, J. (1986) Ignorance versus false belief: A developmental lag in attribution of epistemic states. Child Development 57:567–582.
Hohenstein, J., Naigles, L., and Marsland, K. (1998) Differences in mothers’ and preschool teachers’ use of mental verbs. Presented at the Meeting of the Linguistic Society of America, New York City, January 1998.
Hooper, J. (1975) On assertive predicates. In J. Kimball (ed.), Syntax and Semantics, vol. 4 (pp. 91–124). New York: Academic Press.
Hopmann, M. and Maratsos, M. (1978) A developmental study of factivity and negation in complex syntax. Journal of Child Language 5:295–309.
Horn, L. (1978) Remarks on neg-raising. In P. Cole (ed.), Syntax and Semantics, vol. 9 (pp. 129–220). New York: Academic Press.
Huttenlocher, J. (1995) Children’s language and relation to input. Paper presented at the Biennial Meeting of the Society for Research in Child Development, Indianapolis, IN.
Jenkins, J. and Astington, J. W. (1996) Cognitive factors and family structure associated with theory of mind development in young children. Developmental Psychology 32:70–78.
Johnson, C. and Maratsos, M. (1977) Early comprehension of mental verbs: Think and Know. Child Development 48:1743–1747.
Johnson, C. and Wellman, H. (1980) Children’s developing understanding of mental verbs: Remember, know, and guess. Child Development 51:1095–1102.
Kiparsky, P. and Kiparsky, C. (1970) Fact. In M. Bierwisch and K. Heidolph (eds.), Progress in Linguistics (pp. 143–173). The Hague: Mouton.
Kempson, R. (1975) Semantic Theory. Cambridge: Cambridge University Press.
Landau, B. and Gleitman, L. (1985) Language and Experience. Cambridge: Harvard University Press.
Landau, B., Gleitman, H., and Spelke, E. (1981) Spatial knowledge and geometric representation in a child blind from birth. Science 213:1275–1278.
Leslie, A. (1991) The theory of mind impairment in autism: Evidence for a modular mechanism of development? In A. Whiten (ed.), Natural Theories of Mind. Blackwell.
Lewis, C., Freeman, N., Kyriakidou, C., Maridaki-Kassotaki, K., and Berridge, D. (1996) Social influences on false belief access: Specific sibling influences or general apprenticeship? Child Development 67:2930–2947.
Lillard, A. (1993) Pretend play skills and the child’s theory of mind. Child Development 64:348–371.
Limber, J. (1973) The genesis of complex sentences. In T. E. Moore (ed.), Cognitive Development and the Acquisition of Language (pp. 169–185). New York: Academic Press.
Lysvag, P. (1975) Verbs of hedging. In J. Kimball (ed.), Syntax and Semantics, vol. 4 (pp. 125–154). New York: Academic Press.
Macnamara, J., Baker, E., and Olson, C. (1976) Four-year-olds’ understanding of pretend, forget, and know: Evidence for propositional operations. Child Development 47:62–70.
Marsland, K., Hohenstein, J., and Naigles, L. (1997) Learning that thinking is not knowing: The impact of preschool. Society for Research in Child Development, Washington, D.C., April, 1997.
Miscione, J., Marvin, R., O’Brien, R., and Greenberg, M. (1978) A developmental study of preschool children’s understanding of the words “know” and “guess.” Child Development 49:1107–1113.
Moore, C., Bryant, D., and Furrow, D. (1989) Mental terms and the development of certainty. Child Development 60:167–171.
Moore, C. and Davidge, J. (1989) The development of mental terms: Pragmatics or semantics? Journal of Child Language 16:633–642.
Moore, C. and Furrow, D. (1991) The development of the language of belief: The expression of relative certainty. In D. Frye and C. Moore (eds.), Children’s Theories of Mind: Mental States and Social Understanding (pp. 173–193). Hillsdale, NJ: Erlbaum.
Moore, C., Furrow, D., Chiasson, L., and Patriquin, M. (1994) Developmental relationships between production and comprehension of mental terms. First Language 14:1–17.
Naigles, L., Gleitman, H., and Gleitman, L. R. (1993) Children acquire word meaning components from syntactic evidence. In E. Dromi (ed.), Language and Development (pp. 104–140). Norwood, NJ: Ablex.
Naigles, L. and Hoff-Ginsberg, E. (1995) Input to verb learning: Evidence for the plausibility of syntactic bootstrapping. Developmental Psychology 31:827–837.
Naigles, L. and Mayeux, L. (in press) Television as incidental language teacher. To appear in D. G. Singer and J. Singer (eds.), Handbook of Children and the Media. Beverly Hills, CA: Sage.
Naigles, L., Singer, D., Singer, J., Jean-Louis, B., Sells, D., and Rosen, C. (1995) Barney says, “come, go, think, know”: Television influences specific aspects of language development. Presented at the American Psychological Society, New York, NY.
Newport, E., Gleitman, H., and Gleitman, L. (1977) Mother, I’d rather do it myself: Some effects and noneffects of maternal speech style. In C. Snow and C. Ferguson (eds.), Talking to Children (pp. 109–150). Cambridge: Cambridge University Press.
Perner, J., Ruffman, T., and Leekam, S. (1994) Theory of mind is contagious: You catch it from your sibs. Child Development 65:1228–1238.
Quine, W. v. O. (1960) Word and Object. Cambridge, MA: MIT Press.
Rice, M., Huston, A., Truglio, R., and Wright, J. (1990) Words from “Sesame Street”: Learning vocabulary while viewing. Developmental Psychology 20:421–428.
Rice, M. and Woodsmall, L. (1988) Lessons from television: Children’s word learning when viewing. Child Development 59:420–429.
Richards, M. (1982) Empiricism and learning to mean. In S. Kuczaj (ed.), Language Development, Vol. 1: Syntax and Semantics (pp. 365–396). Hillsdale, NJ: Erlbaum Associates.
Scholnick, E. (1987) The language of mind: Statements about mental states. Discourse Processes 10:181–192.
Schwanenflugel, P., Fabricus, W., and Noyes, C. (1996) Developing organization of mental verbs: Evidence for the development of a constructivist theory of mind in middle childhood. Cognitive Development 11:265–294.
Schwanenflugel, P., Fabricus, W., Noyes, C., Bigler, K., and Alexander, J. (1994) The organization of mental verbs and folk theories of knowing. Journal of Memory and Language 33:376–395.
Shatz, M., Behrend, D., Gelman, S., and Ebeling, K. (1996) Color term knowledge in two-year-olds: Evidence for early competence. Journal of Child Language 23:177–200.
Shatz, M., Wellman, H., and Silber, S. (1983) The acquisition of mental verbs: A systematic investigation of the first reference to mental state. Cognition 14:301–321.
Singer, J. and Singer, D. (1997) “Barney and Friends” as entertainment and education: Evaluating the quality and effectiveness of a television series for preschool children. In W. K. Asamen and G. Berry (eds.), Research Paradigms in the Study of Television and Social Behavior. Beverly Hills, CA: Sage.
Singer, J., Singer, D., Sells, D., and Rosen, C. (1995) “Barney and Friends” as education and entertainment: The comprehension study: Preschoolers’ cognitive responses immediately after viewing a Barney episode. New Haven, CT: Yale University Family Television Research and Consultation Center.
Snow, C. and Ferguson, C. A. (1977) Talking to Children. Cambridge: Cambridge University Press.
Sullivan, K. and Winner, E. (1991) When 3-year-olds understand ignorance, false belief, and representational change. British Journal of Developmental Psychology 9:159–171.
Tager-Flusberg, H. (1993) What language reveals about the understanding of minds in children with autism. In S. Baron-Cohen, H. Tager-Flusberg, and D. Cohen (eds.), Understanding Other Minds: Perspectives from Autism. Oxford: Oxford University Press.
Tager-Flusberg, H., Sullivan, K., Barker, J., Harris, A., and Boshart, J. (1997) Theory of mind and language acquisition: The development of cognitive verbs. Society for Research in Child Development, Washington, D.C.
Urmson, J. (1963) Parenthetical verbs. In C. Caton (ed.), Philosophy and Ordinary Language (pp. 220–246). Urbana: University of Illinois Press.
Wellman, H. (1990) The Child’s Theory of Mind. Cambridge, MA: MIT Press.
Wellman, H. and Estes, D. (1986) Early understanding of mental entities: A reexamination of childhood realism. Child Development 57:910–923.
Wellman, H. and Estes, D. (1987) Children’s early use of mental verbs and what they mean. Discourse Processes 10:141–156.
Wierzbicka, A. (1988) The Semantics of Grammar. Philadelphia: John Benjamins Publishing Company.
Wimmer, H. and Hartl, M. (1991) Against the Cartesian view on mind: Young children’s difficulty with own false beliefs. British Journal of Developmental Psychology 9:125–128.
Wimmer, H. and Perner, J. (1983) Beliefs about beliefs: Representation and constraining function of wrong beliefs in young children’s understanding of deception. Cognition 13:103–128.
Chapter 16
Partial Sentence Structure as an Early Constraint on Language Acquisition
Cynthia Fisher
For the jokes alone, the students of Lila and Henry Gleitman would be forever in their debt. But the true debt, of course, is even greater. Lila and Henry, as teachers and scientists, encourage in their students both a thorough respect for the great complexity and elegant systematicity of human language, and an equal regard for the complexity and systematicity of human learning. Together, these themes invite a series of questions that characterize much of the research on language acquisition that has emerged from the group including the Gleitmans and their students. That is, what can the learner—a child who does not yet know the grammar or the lexicon of English or Greek, or whatever language is to be learned—begin with in learning any particular language? How will the child identify and take in the relevant data provided in the environment? How will the child analyze and interpret the data he or she can encode? These are fundamental questions about the acquisition of language, but they are also questions about how very young children perceive, remember, and learn from language experience. The need to find a perceptible starting point, and to specify how the child proceeds from this point, is unmistakable to all who turn their thoughts to this matter, and is clearly stated in the following words from Chomsky. This quote is particularly appropriate in this context since it was recently pointed out to me by Lila as a plain statement of the problem: [O]ne has to try to find the set of primitives which have the empirical property that the child can more or less tell in the data itself whether they apply before it knows the grammar. . . . So now take grammatical relations, say the notion subject.
The question is: is it plausible to believe that in the flow of speech, in the noises that are presented, it is possible to pick out something of which one can say: here is the subject? That seems wildly implausible. Rather it seems that somehow you must be able to identify the subject on the basis of other things you’ve identified, maybe configurational notions which are somehow constructed out of accessible materials
276
Cynthia Fisher
or maybe out of semantic notions, which are primitive for the language faculty. (Chomsky 1982, 118–119)

These primitives, whatever they turn out to be, are a part of what we have come to call Universal Grammar (UG), broadly conceived as the set of capacities and limitations, mechanisms, and constraints that permit a child to soak up languages like a sponge, and guarantee that all languages, various and mutually incomprehensible as they are, share a set of core properties. It goes without saying that the charge embodied in this quote is an extremely tall order. What I will do in this chapter is merely review evidence and arguments for a few potential primitives. The story I hope to tell—with some but not all of the relevant data already in—can be summarized as follows: Viewed in the way I will describe, both configurational and semantic notions can be constructed out of materials ready to the child's hand, and arguments can be made that together they yield an appropriately constrained starting point for linguistic understanding and syntax acquisition. The ideas summarized here have grown out of years of collaboration with Lila and Henry, and follow directly from their previous and ongoing ground-breaking work on syntactic bootstrapping (e.g., Landau and Gleitman 1985; Gleitman 1990; Gleitman and Gleitman 1997). To the extent that this makes any sense now, it is owing to their teaching, inspiration, innovation, and continued collaboration.

The Contribution of Sentence Structure to Meaning

It is a truism that the syntactic structure of a sentence affects its interpretation. This is what syntax is for: Brutus killed Caesar and Caesar killed Brutus differ in both sense and truth value, and languages' various techniques for signaling the role of each noun phrase relative to the verb constitute the basic grammatical relations of the clause.
The contribution of sentence structure to meaning can be seen in some often-described phenomena: First, the same verbs occurring in different structures have different meanings (see, e.g., Goldberg 1996; Rappaport and Levin 1988; Ritter and Rosen 1993, among many others). For example, sentences (1) through (3) below all use the main verb have. But Jane owns something in (1), causes an event in (2), and experiences a misfortune in (3) (examples adapted from Ritter and Rosen 1993). Not much of these various senses belongs directly to have. Second, adults readily and lawfully interpret novel uses of verbs like the ones in (4), adapted from Goldberg (1996; see also Fisher 1994). Presumably, to understand or produce these, we need not already know that laugh or frown can convey transfer of possession or position. Instead, the three-argument structure,
in combination with the words in the sentence, gives it that meaning. Children produce (see, e.g., Bowerman 1982) and understand (Naigles, Fowler, and Helm 1992) these novel uses as well; some of Bowerman's examples are shown in (5). Ritter and Rosen (1993) argue that the surface structure and lexical content of a sentence must always be consulted to interpret the verb in that sentence. However this knowledge may best be modeled in adult grammars, the contribution of sentence structure to sentence meaning is clear. Some set of links between syntax and semantics permits adults to infer aspects of a sentence's meaning from its structure.

(1) Jane had a brown dog.
(2) Jane had her secretary get her a cup of coffee.
(3) Jane had her dog get run over by a car.
(4) The panel laughed the proposal off the table.
    Her father frowned away the compliment.
(5) Don't say me that or you'll make me cry.
    Why didn't you want to go your head under?

Syntactic Bootstrapping: The Basic Claim

The view known as syntactic bootstrapping (Gleitman 1990; Landau and Gleitman 1985) proposes that young children use precursors of the same links between sentence structure and meaning, in concert with observations of world events, to understand sentences and therefore to acquire the meanings of verbs. If part of the relational meaning of a verb in a sentence is predictable from the sentence structure itself, then a child who hears a sentence containing a novel verb could gain some information about the meaning of the sentence from its structure. This claim is supported by evidence that children from about two to five years of age take novel verbs in different sentence structures to mean different things (see, e.g., Fisher 1996; Fisher, Hall, Rakowitz, and Gleitman 1994; Naigles 1990; Naigles and Kako 1993). The semantic information gleaned from syntax will necessarily be very abstract.
After all, many verbs with widely varying meanings occur in each syntactic structure: Transitive verbs include break and like, intransitive verbs include dance and sleep. The interpretive information that could be inferred from a sentence structure could be described as relevant to a sentence’s semantic structure—for example, how many participants are involved in the sentence?—rather than event-dependent
semantic content (see, e.g., Grimshaw 1993). Dance and sleep are similar, not in the specifics of the activities or states they describe, but in their formal structure: Both require only one participant. Moreover, as mentioned above, most verbs occur in more than one sentence frame. This information could further constrain interpretations of each verb, much as subcategorization frame sets have played a powerful role in linguistic characterizations of semantics in the verb lexicon (see, e.g., Levin and Rappaport-Hovav 1995). That is, while explain in (6) shares an abstract semantic structure with other three-place predicates, explain also occurs with sentence complements (as in 7), and shares semantic structural properties with other sentence-complement-taking verbs. This combination of frames more sharply limits the possible interpretations that are consistent with both sentence frames (Fisher, Gleitman, and Gleitman 1991; Gleitman and Gleitman 1997). Recent evidence suggests that young children differently interpret a novel verb that appears in the two related frames shown in (8), as opposed to the two frames shown in (9) (Naigles 1996).

(6) Mary explained the program to John.
(7) Mary explained that her computer had eaten her paper.
(8) The duck_i is pilking the bunny.
    The duck_i is pilking.
(9) The duck is pilking the bunny_i.
    The bunny_i is pilking.

We have argued that such abstract hints from the syntax could help to solve some serious problems for verb learning (see, e.g., Fisher 1994; Gleitman 1990). For example, a verb in a sentence does not simply label an event, but instead describes a speaker's perspective on that event. Thus sentences (10) and (11) could accompany the same events. The difference between them lies not in whether the event (in the world) has a cause, but in whether the speaker chooses to mention it.
This is why even adults who already know the vocabulary of English cannot guess which verb a speaker utters when shown a set of events in which the verb was used, though they can reasonably accurately guess what noun was uttered given the same kind of information (Gillette, Gleitman, Gleitman, and Lederer, 1999). Observations of events alone do not provide the right kind of information to interpret a sentence. Sentence structure cues, on the other hand, bearing principled relations to the sentence’s semantic structure, could provide information directly relevant to the speaker’s intent.
(10) The block goes in here.
(11) I'm putting the block in here.

How Does the Child Obtain Syntactic Evidence?

But, as Lila and Henry might say, not so fast (Gleitman and Gleitman 1997). How could syntactic bootstrapping begin? A sentence structure is a complex object, constructed of elements that are quite implausible as primitives to the language acquisition system—notions like argument as opposed to adjunct noun phrase, and subject as opposed to object or oblique argument. In considering the possible role of sentence structure in the earliest comprehension of sentences, we must also keep in mind the need to seek plausible presyntactic primitives, and mechanisms by which these might influence comprehension before a true syntactic description of a sentence can be attained (Fisher et al. 1994).

Recent evidence for presyntactic structural cues to verb meaning

A recent series of experiments was designed to isolate features of a sentence's structure, testing what aspects of sentence structure influenced young children's interpretations of a novel verb. These studies provide evidence that a plausibly early description of the structure of a sentence—its number of noun phrases—is meaningful to young preschoolers. In several studies, three- and five-year-olds (Fisher 1996) and two-and-a-half- and three-year-olds (Fisher, in press) were taught novel transitive or intransitive verbs for unfamiliar agent-patient events. On each of four trials, children watched an event in which one participant moved another participant in some novel way. These events were described by a novel verb presented in a sentence context: One group of children heard intransitive sentences, while the other group heard transitive sentences. The key feature of this study was that the identity of the subject and object of these sentences was hidden by using ambiguous pronouns, yielding sentences that differed only in their number of noun phrases. An example is shown in (12).
The critical sentence frame was repeated several times (in appropriate tenses) before, during, and after three repetitions of the same event.

(12) Event: One person rolls another on a wheeled dolly by pulling with a crowbar.
     Transitive: She's pilking her over there.
     Intransitive: She's pilking over there.
Following this introduction, on each trial the children’s interpretations of the novel verb in its sentence context were assessed by asking them to point to the participant, in a still display of the midpoint of the event, whose role the verb described (e.g., “Which one was pilking the other one over there?” vs. “Which one was pilking over there?”). Both adults and children 2.5 and 3 years old were more likely to choose causal agents as the subjects of transitive than intransitive verbs, though neither sentence identified one participant in the event as the subject. A subsequent study replicated this finding for the 2.5-year-old group alone (28–32 months), finding that even this youngest group chose agents as the participant whose actions the verbs described significantly more often for transitive than intransitive verbs (Fisher, in press). In previous studies of the role of syntax in verb learning, the linguistic contexts of novel verbs have always specified the identity of the verbs’ arguments (as in “The duck is blicking the bunny,” describing a scene in which these characters participated; Fisher et al. 1994; Naigles 1990; Naigles and Kako 1993). Given this information, children might achieve structure-sensitive interpretations of verbs by relying on assumptions about the class of semantic roles associated with each grammatical position: Children could infer that the verb referred to the activities of the participant mentioned in subject position, on the grounds that grammatical subjects tend to be semantic agents. Such a procedure is plausible, and has sometimes been assumed in discussions of syntactic bootstrapping, in part for lack of any explicit alternative. Innate links between thematic roles (abstract relational concepts like agent and theme) and grammatical functions (like subject and direct object) have been proposed to explain cross-linguistic regularities in the assignments of semantic roles to sentence positions. 
Though various treatments of thematic roles differ significantly in their inventory of roles and in how they map onto syntax, some system of thematic roles constitutes a primary device in linguistic theory for expressing relations between verb syntax and semantics (see, e.g., Baker 1997; Dowty 1991; Grimshaw 1990; Jackendoff 1990; Rappaport and Levin 1988). In the studies described above, however, the entire structure of the sentence, the configuration of arguments itself, was shown to be meaningful to quite young children. Even 2.5-year-olds interpret the subject referent to “mean” different things—play different roles—in the same event, depending on the overall structure of the sentence. Subjects are not preferentially causal agents unless a verb has two noun phrase arguments. This finding gives strong support to the notion that sentence structures per se are meaningful, to adults and to children as young as 2.5 years, in a way not reducible to links between event roles like agent and patient or theme, and grammatical functions like subject or object.
Sentence Interpretation Based on Partial Sentence Representations

How could sentence structures provide information about the meanings of verbs in sentences, without the aid of links between thematic roles and particular grammatical positions? The approach taken by Fisher et al. (1994), and further supported by the findings described above (Fisher 1996, in press), capitalizes on the intrinsically relational or structural nature of sentences, conceptual representations, and verb meanings (see, e.g., Bloom 1970; Braine 1992; Fisher 1996; Fisher et al. 1994; Gentner 1982; Gleitman 1990; Grimshaw 1993; Jackendoff 1990). Given the following set of assumptions, the gross similarities among these structures could permit sentence structure to influence interpretation.

Conceptual structures

First, in common with most recent work in verb semantics, based on Jackendoff's (1990) research, we assume that semantic structures of verbs are essentially of the same kind as the nonlinguistic conceptual structures by which humans represent events. Both verb semantic structures and conceptual representations of events demand a division between predicates and arguments, and thus between relations and the entities they relate (see Bierwisch and Schreuder 1992; Bloom 1970; Braine 1992; Fodor 1979). Even otherwise divergent views of language acquisition strongly assume that structured conceptual representations of events, fundamentally like linguistic semantic structures, are a driving force in language acquisition (see, e.g., Bloom 1970; Pinker 1989; and many others). The current view and any form of syntactic bootstrapping share this assumption (see, e.g., Fisher 1996; Gleitman 1990).

Sentence structures

Second, we assume that children learning their first verbs can (a) identify some familiar nouns in fluent speech, and (b) represent these as grouped within a larger utterance structure.
Whenever a child manages to do this, she will have what we have called a partial sentence representation (PSR; Fisher et al. 1994). The early appearance of nouns in children’s productive vocabularies has long been noted (see, e.g., Gentner 1982). More to the point for present purposes is that evidence for the comprehension of object names (see, e.g., Waxman and Markow 1995) precedes comprehension of relational terms by a considerable margin (e.g., Hirsh-Pasek and Golinkoff 1996), and there is strong evidence that at least some concrete noun meanings can be acquired from observation of word/world contingencies alone (Gillette et al. 1999). The grouping of words into utterances has also typically been assumed
as a prerequisite to syntax acquisition. Recent explorations of utterance prosody have begun to cash out this assumption, suggesting that children could hear utterances as cohesive based on the familiar prosodic melodies of their language (see, e.g., Fisher and Tokura 1996; Jusczyk 1997; Morgan 1986).

The influence of sentence structure on selection of a conceptual structure

These two sets of assumptions have consequences for early sentence comprehension. When children interpret a sentence they link one structure with another. To the extent that these distinct representations—sentence and conceptual—have similar structures, a sentence could provide a rough structural analogy for its interpretation in conceptual terms (see, e.g., Gentner 1983). Assuming that conceptual and semantic structures are of like kind, the result of their alignment will be, again roughly, a semantic structure for the sentence. To illustrate, even prior to the identification of subject and object, sentences still contain some number of noun phrases. This simple structural fact could be informative. Once children can identify some nouns, they could assign different meanings to transitive and intransitive verbs by linking a sentence containing two noun phrases with a conceptual relation between the two named entities in the current scene, and a sentence containing one noun phrase with a conceptual predicate characterizing the single named entity in the current scene. The result would be a rough semantic structure for the sentence, with semantic content derived from the specifics of the observed situation. Structural alignment would allow children to map entire sentence structures onto possible semantic structures derived from observation of events, without requiring prior identification of the subject referent as a grammatical subject, and thus could account for the findings from the pronoun-disambiguation task described above (Fisher 1996, in press).
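The alignment procedure just described can be made concrete in a toy simulation. This is only an illustration, not a model from the work reviewed here: the noun vocabulary, the inventory of scene predicates, and the arity-matching rule are all hypothetical stand-ins for what a child brings to the task.

```python
# Toy sketch of presyntactic structural alignment: a partial sentence
# representation (PSR) is just the familiar nouns grouped within one
# utterance, and interpretation selects a conceptual predicate whose
# number of participants matches the number of noun phrases detected.
# (All names and the scene inventory are hypothetical illustrations.)

KNOWN_NOUNS = {"duck", "bunny"}  # assumed early noun vocabulary

def partial_sentence_representation(utterance):
    """Return the familiar nouns recognized in one utterance."""
    words = utterance.lower().strip(".!?").split()
    return [w for w in words if w in KNOWN_NOUNS]

def align(utterance, candidate_predicates):
    """Keep the scene predicates whose arity matches the noun count.

    candidate_predicates: (name, arity) pairs made available by
    observing the current scene.
    """
    n_nouns = len(partial_sentence_representation(utterance))
    return [name for name, arity in candidate_predicates if arity == n_nouns]

# A causal scene affords both a two-participant and a one-participant construal.
scene = [("cause-to-move", 2), ("move", 1)]
print(align("The duck is pilking the bunny.", scene))  # ['cause-to-move']
print(align("The duck is pilking.", scene))            # ['move']
```

Note that nothing in the sketch identifies a subject or an object: the two-noun-phrase sentence is simply matched to the two-participant construal of the scene, which is the sense in which structural alignment can proceed without links between thematic roles and grammatical positions.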
Via structural alignment, merely identifying the set of nouns within a representation of a sentence could give the hearer a clue as to the speaker’s perspective on an event. This inference need not depend on true syntactic representations; thus if this description of the phenomenon is correct, it represents a potential presyntactic route whereby simple aspects of the structure of a sentence could influence interpretation. Structure-sensitivity of this simple kind could presumably be implemented in a working model in many ways. For example, Siskind’s (1996) model of the role of cross-situational observation in vocabulary learning relies on the constraints that (a) the input is processed one utterance (rather than one word) at a time, and (b) any previously acquired elements of the meanings of words in an utterance must be
included in the interpretation selected from candidates available from world observation. As Brent (1996) points out, this pair of assumptions makes sentence interpretation a presyntactic mapping of sentence to world rather than word to world, much as suggested by work in syntactic bootstrapping (Gleitman 1990).

A presyntactic division of the linguistic data

A presyntactic structure-to-meaning mapping constitutes only a very rough take on argument linking, and leaves the child considerable room to maneuver in interpreting sentences. However, the structural alignment of sentence and scene representation as described above would permit a useful distinction between transitive and intransitive sentences, giving the child a significantly better chance of interpreting sentences as their speaker intended. If we assume that working out links between something like thematic roles and grammatical positions plays a key role in syntax acquisition (see, e.g., Bloom 1970; Grimshaw 1981; Pinker 1989), at least a rough presyntactic distinction between transitive and intransitive sentences may be essential. Discussions of linking regularities assume, either explicitly or implicitly, that a predicate's number of arguments is known from the start, often by limiting discussion to either two-place or three-place predicates (see, e.g., Baker 1997; Dowty 1991). Without assuming a fixed number of arguments, links between thematic and grammatical roles are much less regular: As our 2.5-year-old subjects showed that they knew, causal agents are the most likely subjects only of predicates with at least two arguments. A presyntactic division of the linguistic data into (roughly) one-argument and two-argument sentences could allow the child to begin with the domains within which semantic/syntactic mappings will be most regular.

Number of Nouns as a Presyntactic Primitive

But again, not so fast.
How could the child know—before learning the grammar of English—that these sentences contain one- versus two-argument predicates? Nouns in the sentence and arguments of a verb in the sentence are not the same thing. In (13) and (14), dance has one argument position but two nouns. Via conjunction in subject position in (13), and the addition of an adjunct prepositional phrase in (14), these sentences display more nouns than arguments. If children align a two-noun sentence with the most salient conceptual representation that relates the referents of those two nouns, then they should systematically err in interpreting such sentences. That is, before a child has learned what "with" and "and" mean, or that English transitive sentences cannot appear in NNV order, (13) and (14) should both yield the same interpretation as a transitive sentence.
(13) Fred and Ginger danced.
(14) Ginger danced with Fred.

Previous research has explored these sentence types extensively, and the overall pattern of results provides some preliminary evidence for the predicted errors in children just at or under 2 years. At 25 months, children can interpret sentences like (13) correctly: Naigles (1990) introduced 25-month-olds to causal and noncausal versions of the same event (e.g., two characters moving in some manner under their own power versus one causing another to move in the same manner). She found that the children looked longer at the causal version when they heard a novel transitive verb, as in (15), and looked longer at the noncausal version when they heard a novel intransitive verb, as in (16). The intransitive sentence (16) is of the problematic type alluded to above, an intransitive verb appearing with two nouns conjoined in subject position. Successful interpretation of both sentences tells us that, by 25 months, the children had learned enough about the word order and functional morphology of English to interpret this as an intransitive sentence despite its two nouns. Hirsh-Pasek and Golinkoff (1996), however, found that children at 19, 24, and 28 months did not interpret similar sentences correctly when not given redundant morphological cues to help identify the structure. An example is shown in (17): The subject noun phrase contains "and," which should signal the conjoined subject structure to wise listeners, but does not also have the plural copula "are." Apparently, without multiple clues that the unfamiliar verb is intransitive, even 28-month-olds can be fooled by a mismatch between number of argument positions and number of nouns. This suggests that number of nouns is a strong early cue for structure-guided interpretation, and also provides a tantalizing glimpse of young children's growing use of language-specific morphological evidence to differentiate sentence structures.
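The predicted error pattern for sentences like (13) and (14) can be sketched in a small simulation. This is an illustration of the logic only, under assumptions of my own: a learner that classifies utterances by counting familiar nouns, and that treats a function word it has learned ("and," "with") as canceling the extra apparent argument.

```python
# Minimal sketch of the prediction: a purely noun-counting learner should
# misparse two-noun intransitives like "Fred and Ginger danced" as
# transitive until it learns what "and" and "with" signal.
# (The vocabulary and the cancellation rule are hypothetical illustrations.)

KNOWN_NOUNS = {"fred", "ginger"}

def classify(utterance, known_function_words=()):
    words = utterance.lower().strip(".!?").split()
    nouns = [w for w in words if w in KNOWN_NOUNS]
    # Once a conjunction or preposition is understood, the second noun
    # is no longer counted as a second verb argument.
    if any(f in words for f in known_function_words):
        return "intransitive"
    return "transitive" if len(nouns) >= 2 else "intransitive"

# Before function morphology is learned: the predicted systematic errors.
print(classify("Fred and Ginger danced"))    # transitive (error)
print(classify("Ginger danced with Fred"))   # transitive (error)
# After "and" and "with" are learned, the errors disappear.
print(classify("Fred and Ginger danced", ("and", "with")))   # intransitive
```

On this toy account, two-noun intransitives fool the learner exactly when the relevant function words have not yet been acquired, consistent with the developmental evidence just reviewed.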
Similarly, at 24 months, boys (but not girls) systematically misinterpreted sentences like (18) as naming causal acts (Hirsh-Pasek and Golinkoff 1996). These two-year-olds, presumably unaware of the meaning of "with," assumed that a two-noun sentence is transitive.

(15) The duck is gorping the bunny.
(16) The duck and the bunny are gorping.
(17) Find Big Bird and Cookie Monster gorping!
(18) Find Big Bird gorping with Cookie Monster!

In summary, the presyntactic mechanism for syntactic bootstrapping proposed above makes a unique prediction. Before children acquire
much of the syntax and functional morphology of a particular language, they should systematically misinterpret sentences that have more nouns than verb argument positions. Further research is needed to explore these errors more fully. However, as described above, prior research gives some preliminary evidence for this prediction.

An Appropriately Constrained Starting Point

Thus far, I have suggested that a basic, presyntactic distinction between transitive and intransitive sentences could be achieved simply by identifying the nouns in a sentence and representing them as parts of a larger utterance structure. This constitutes a partial sentence representation, which shares gross structural properties of the conceptual structures the sentence could convey. It is important to note that within this view, a great deal of work in the selection of an interpretation remains to be done by the child's preferences in constructing conceptual representations. In principle, the set of referents named in a sentence could be involved in indefinitely many different conceptual representations. Thus, like virtually all other views of language acquisition, syntactic and presyntactic bootstrapping depend on the young language learner to share significant biases in the conceptualization of events with older humans. The addition proposed by syntactic and presyntactic bootstrapping is simply that sentence structures, however the child can represent them, can play an interesting role in language acquisition as well. In the remaining space I will briefly address one kind of objection to the proposed view, and argue that, contrary to the objection, this account provides a useful initial constraint on the alignment of sentence and conceptual representations.

What about subjects?
At first glance the separate treatment of transitive and intransitive sentences based on a presyntactic representation of their structures may seem to stand in the way of an important syntactic generalization—the notion of syntactic subject, encompassing both transitive and intransitive subjects. Subjects, after all, are picked out by a large constellation of linguistic generalizations, including subject-verb agreement, case markings, deletion in imperatives, special control properties, the affinity of discourse topics for subject position, and so on (see, e.g., Keenan 1976). This would suggest that even though the proposed presyntactic inference provides only a gross constraint on interpretation, it is nonetheless too specific to permit an important syntactic generalization. However, as already mentioned, a category of grammatical subject general enough to encompass both transitive and intransitive subjects is
not very useful for linking grammatical and semantic/conceptual structures. The purport of the experimental evidence described above is that quite young children (just like linguists) link grammatical and semantic roles within the constraints of the number of arguments provided in a sentence. On the presyntactic structural alignment view described above, the child need not initially assume that either argument of a transitive verb plays the same semantic role as the single argument of an intransitive verb. Moreover, it is not so clear that a category “subject” broad enough to span all sentence structures should be considered a unitary primitive category. It has long been noted that the constellation of syntactic subject properties alluded to above coheres imperfectly within and across languages (see, e.g., Keenan 1976). A particularly troublesome type of cross-linguistic variation concerns the phenomenon of so-called ergative languages (see, e.g., Dixon 1994). A majority of languages, including English, have nominative-accusative syntax: The subject of an intransitive sentence is treated, morphologically and syntactically, like the agent argument of a prototypical transitive sentence. This grouping of arguments defines the familiar category subject, with the set of special within- and across-sentence subject properties listed above: As in (19), the underlined elements are in nominative case, agree in number with the verb, control null subjects in conjunctions as shown in (20), and so on. But the ergative pattern is quite different. The agent argument of a prototypical transitive receives its own case (ergative), whereas the intransitive subject and the patient argument of a prototypical transitive receive the same case. 
A few strongly syntactically ergative languages even reverse the pattern shown in (20): Coreference across conjoined verb phrases mirrors the morphologically ergative pattern, producing the pattern glossed in (21), unimaginable in English (Baker 1997; Dixon 1994).

(19) They see him. They flee.
(20) They_i see him and Ø_i flee.
     *They see him_j and Ø_j flees.
(21) *They_i see him and Ø_i flee.
     They see him_j and Ø_j flees.

Some (e.g., Marantz 1984) have suggested that the subjects of intransitives and the object arguments of transitive sentences together constitute the syntactic subject for those languages. This solution maintains
one syntactic definition of subjecthood—having the same case as the subject of an intransitive—while dropping the cross-linguistically widespread link between subjects and agents. This approach raises grave problems for the project of defining regular links between syntactic and semantic relations as a starting point for language acquisition. However, by other accounts the claim that ergative languages have “patient subjects” does not describe the linguistic phenomena very well. Not all of the syntactic properties associated cross-linguistically with the subject category exhibit the reversal predicted by the patient-subject hypothesis, even in the most strongly ergative languages (see, e.g., Dixon 1994). Such data cast doubt not on the linking of agents with subjects in two-argument predicates, but on the existence of a single, primitive category “subject” that applies to both transitive and intransitive sentences across languages. Recent accounts that encompass these facts propose two senses in which a constituent can be the subject of a clause (Baker 1997; Dixon 1994), only one of which maintains the traditional link between subject and agent. What is the significance of these phenomena for the current discussion? I have argued above that, given the polysemy of the subject category, an early presyntactic distinction between transitive and intransitive sentences is essential, giving the child the division of the data within which linking regularities will work out. Now it seems that the same presyntactic division of the linguistic data could be essential for syntax acquisition more generally. Languages differ in how they distribute important syntactic phenomena over the possible combinations of argument positions in transitive and intransitive sentences. 
If children can make a roughly accurate presyntactic distinction between transitive and intransitive sentences based on their number of nouns, then they could begin learning about the syntactic choices of their language without assuming that either argument of a transitive sentence should be treated syntactically like the single argument of an intransitive sentence. To establish the basic morphological and syntactic typology of a language, learners may have to begin with at least three basic structural positions in sentences (the two transitive argument positions and the intransitive subject), rather than two (subject and object) (see Dixon 1994). The developmental facts are at least roughly consistent with this more flexible view of the starting point for syntax acquisition: Children seem to have no special difficulty acquiring languages with the ergative pattern, or with a combination of ergative and nominative morphology and syntax (see, e.g., Rispoli 1991; Schieffelin 1985).
288
Cynthia Fisher
Concluding Remarks
This proposal for presyntactic structural guidance in sentence interpretation is intended as a first example of what I believe will be a fruitful line to pursue in discovering the earliest integration of sentence-structural and event information in verb learning. Lila’s original proposal for syntactic bootstrapping, developed with Barbara Landau, presented the strikingly innovative idea that “verb learning, while partly a function of the ostensive evidence provided, feeds upon the conceptual representation of predicate-argument logic in the syntactic format of the sentence” (Landau and Gleitman 1985, p. 121). In later work (Fisher, Hall, Rakowitz, and Gleitman 1994), we proposed that one could think of sentences as having structure even before the learner knows enough about a particular grammar to build a true syntactic structure. This partial or presyntactic structure shares some nearly inescapable similarity with the range of conceptual structures that that sentence could convey. In the work reviewed here, I have argued that if we endow the learner with some very simple alignment biases, then this primitive structure will influence interpretation as soon as the child can identify some nouns and represent them as grouped within a larger utterance. The alignment of sentence and conceptual structure would provide a (rough) presyntactic distinction between transitive and intransitive sentences. This distinction is demonstrably helpful to young children in sentence interpretation, and I have suggested that it might be needed for syntax acquisition as well. To acquire a grammar the child must have some way to represent linguistic data presyntactically. The intuition explored here is that even these initial representations could help to constrain acquisition.
By exploring the potential uses of partial information in each linguistic domain, we can move toward a more complete view of the information sources, constraints, and biases required to get the child started in the acquisition of language.

Acknowledgment
The research described in this paper was partially supported by NSF grant DBC 9113580, and by the University of Illinois.

References
Baker, M. C. (1997) Thematic Roles and Syntactic Structure. In L. Haegeman (ed.), Elements of Grammar (pp. 73–137). Boston: Kluwer.
Bierwisch, M. and Schreuder, R. (1992) From concepts to lexical items. Cognition 42:23–60.
Bloom, L. (1970) Language Development: Form and Function in Emerging Grammars. Cambridge, MA: MIT Press.
Bowerman, M. (1982) Reorganizational processes in lexical and syntactic development. In E. Wanner and L. R. Gleitman (eds.), Language Acquisition: The State of the Art (pp. 319–346). New York: Cambridge University Press.
Partial Sentence Structure as an Early Constraint
Braine, M. D. S. (1992) What sort of innate structure is needed to “bootstrap” into syntax? Cognition 45:77–100.
Brent, M. R. (1996) Advances in the computational study of language acquisition. Cognition 61:1–38.
Chomsky, N. (1982) Noam Chomsky on the Generative Enterprise: A Discussion with R. Huybregts and H. van Riemsdijk. Dordrecht, Holland: Foris Publications.
Dixon, R. M. W. (1994) Ergativity. Cambridge: Cambridge University Press.
Dowty, D. (1991) Thematic proto-roles and argument selection. Language 67(3):547–619.
Fisher, C. (1994) Structure and meaning in the verb lexicon: Input for a syntax-aided verb learning procedure. Language and Cognitive Processes 9:473–518.
Fisher, C. (1996) Structural limits on verb mapping: The role of analogy in children’s interpretation of sentences. Cognitive Psychology 31:41–81.
Fisher, C. (in press) Simple structural guides for verb learning: On starting with next to nothing. In E. V. Clark (ed.), Proceedings of the 30th Stanford Child Language Research Forum. Stanford, CA: CSLI Publications.
Fisher, C., Gleitman, H., and Gleitman, L. R. (1991) On the semantic content of subcategorization frames. Cognitive Psychology 23:331–392.
Fisher, C., Hall, D. G., Rakowitz, S., and Gleitman, L. R. (1994) When it is better to receive than to give: Syntactic and conceptual constraints on vocabulary growth. Lingua 92:333–375.
Fisher, C. and Tokura, H. (1996) Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Development 67:3192–3218.
Fodor, J. A. (1979) The Language of Thought. Cambridge, MA: Harvard University Press.
Gentner, D. (1982) Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In K. Bean (ed.), Language, Thought, and Culture (pp. 301–334). Hillsdale, NJ: Erlbaum.
Gentner, D. (1983) Structure-mapping: A theoretical framework for analogy. Cognitive Science 7:155–170.
Gillette, J., Gleitman, H., Gleitman, L. R., and Lederer, A. (1999) Human simulations of lexical acquisition. Cognition 73:135–176.
Gleitman, L. R. (1990) The structural sources of verb meanings. Language Acquisition 1(1):3–55.
Gleitman, L. and Gleitman, H. (1997) What is a language made out of? Lingua 100:29–55.
Goldberg, A. (1996) Constructions: A Construction Grammar Approach to Argument Structure. Chicago: The University of Chicago Press.
Grimshaw, J. (1981) Form, function, and the language acquisition device. In C. L. Baker and J. J. McCarthy (eds.), The Logical Problem of Language Acquisition (pp. 165–182). Cambridge, MA: The MIT Press.
Grimshaw, J. (1990) Argument Structure. Cambridge, MA: MIT Press.
Grimshaw, J. (1993) Semantic structure and semantic content: A preliminary note. Paper presented at the conference on Early Cognition and the Transition to Language, University of Texas at Austin.
Hirsh-Pasek, K. and Golinkoff, R. (1996) The Origins of Grammar. Cambridge, MA: MIT Press.
Jackendoff, R. (1990) Semantic Structures. Cambridge, MA: MIT Press.
Jusczyk, P. W. (1997) The Discovery of Spoken Language. Cambridge, MA: MIT Press.
Keenan, E. L. (1976) Toward a universal definition of “subject.” In C. N. Li (ed.), Subject and Topic (pp. 303–334). New York: Academic Press.
Landau, B. and Gleitman, L. R. (1985) Language and Experience: Evidence from the Blind Child. Cambridge, MA: Harvard University Press.
Levin, B. and Rappaport-Hovav, M. (1995) Unaccusativity: At the Syntax-Lexical Semantics Interface. Cambridge, MA: MIT Press.
Marantz, A. (1984) On the Nature of Grammatical Relations. Cambridge, MA: MIT Press.
Morgan, J. L. (1986) From Simple Input to Complex Grammar. Cambridge, MA: MIT Press.
Naigles, L. (1990) Children use syntax to learn verb meanings. Journal of Child Language 17:357–374.
Naigles, L. (1996) The use of multiple frames in verb learning via syntactic bootstrapping. Cognition 58:221–251.
Naigles, L. G. and Kako, E. T. (1993) First contact in verb acquisition: Defining a role for syntax. Child Development 64(6):1665–1687.
Naigles, L., Fowler, A., and Helm, A. (1992) Developmental shifts in the construction of verb meanings. Cognitive Development 7:403–427.
Pinker, S. (1989) Learnability and Cognition. Cambridge, MA: MIT Press.
Rappaport, M. and Levin, B. (1988) What to do with theta-roles. In W. Wilkins (ed.), Syntax and Semantics, Volume 21: Thematic Relations. New York: Academic Press.
Rispoli, M. (1991) The mosaic acquisition of grammatical relations. Journal of Child Language 18:517–551.
Ritter, E. and Rosen, S. T. (1993) Deriving causation. Natural Language and Linguistic Theory 11:519–555.
Schieffelin, B. (1985) The acquisition of Kaluli. In D. Slobin (ed.), The Cross-Linguistic Study of Language Acquisition: The Data. Hillsdale, NJ: Erlbaum.
Siskind, J. (1996) A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition 61:39–91.
Waxman, S. R. and Markow, D. B. (1995) Words as invitations to form categories: Evidence from 12- to 13-month-old infants. Cognitive Psychology 29:257–302.
Chapter 17
Perception of Persistence: Stability and Change
Thomas F. Shipley

Psychology has come to be seen by many as a fragmented discipline with apparently few core concepts that span the field. However, one concept that seems to show up at many levels is identity. Throughout their careers both Henry and Lila Gleitman have grappled with the problem of defining when an organism will treat two things as the same—psychologically identical. In research ranging from rats running around in mazes to children learning a language, these two psychologists (with a little help from their friends) have sought to provide accounts of psychological identities. To illustrate the pervasive nature of identity problems in psychology, consider some research from Henry and Lila’s past. In studies of how rats find their way around an environment, Henry showed that rats passively carried through a maze can later successfully run through a maze of the same shape (Gleitman 1955). Successful performance in the second maze requires that the rat treat the two mazes as identical, despite the differing motor behavior. In the same vein, categorization is basically a problem in establishing identities; Henry and Lila have worried about how one decides whether or not two objects belong in the same category (Armstrong, Gleitman, and Gleitman 1983), and which of two categories (number or letter) will be used for one object (an “0”) (Jonides and Gleitman 1972). Henry and Lila have also addressed one of the central problems of language learning: How does a child identify words and phrases that mean the same thing? To learn the meaning of a word, a child must solve the matching problem—what words go with what events in the world.
Their work on verb frames, which shows that children can infer causal properties of a novel verb when the verb is presented in a familiar sentence frame, offers an important clue to how children solve this identity problem (Gleitman and Gleitman 1992; Naigles, Gleitman, and Gleitman 1993). Finally, my own dissertation research with Henry on perceptual unit formation also addressed an identity problem. I was interested in how two objects could appear to have the same shape when one was fully
visible and the other partially occluded. In this chapter I review some recent work on the perception of identity over time that grew out of this question.

Identity and Perception
In perception the two most familiar examples of identity problems are recognition—how we decide we are looking at something we have seen before—and the perceptual constancies (e.g., size, distance, and lightness constancy). When viewing a scene over time both processes are evident; the size, shape, and color of most objects appear unchanging over time, and an object will be recognized as the same one that occupied that location several seconds ago. These impressions of stability hold even as we move through the environment. When we drive and look out upon the road we are approaching (or in Henry’s case, with his propensity for talking to whoever is in the back seat, the road where he has just been), the size, shape, and spacing of objects remain the same, despite a changing viewpoint. How could we possibly see stable qualities given the massive changes that occur in the retinal image whenever we move? Stability is achieved by taking advantage of the fact that the changes are not random in nature, and using the regularities in the pattern of change to identify that which remains unchanged (Gibson 1979). Some of the earliest work on perceiving stable qualities in changing arrays focused on how dynamic changes provide information for three-dimensional spatial relations. Hans Wallach, one of Henry’s colleagues at Swarthmore, described the aspects of dynamic two-dimensional displays that are necessary to perceive the three-dimensional shape of objects (Wallach and O’Connell 1953). Each static image from a motion sequence may look quite different since the two-dimensional distances between an object’s parts vary considerably in projected images, but when animated, a moving object with a stable shape is seen.
For example, in biomechanical motion displays (e.g., point-light walkers), like those used by Johansson (1973), the appearance of human forms does not occur until the elements move. The pattern of element motions allows the global form (a human) to be seen. The visual processes responsible for perceiving structure from motion may be present whenever we move through the environment and thus play a central role in the apparent stability of the world. The dynamic information does not need to be continuously available for a stable three-dimensional form to be seen (Michotte, Thines, and Crabbe 1964). Brief periods of occlusion do not affect the apparent stability of an object. Henry’s driving illustrates this quite clearly, and his calm, while facing rearward, reveals the compelling and potentially erroneous nature of this impression of stability. In this situation, Henry’s lack of concern about not being able to see where he is going does not reflect an absence of imagination, but rather the impression (or, one might say, the conviction) that the world does not change simply because one has changed one’s view. Objects don’t cease to exist simply because they are not visible. Two general classes of explanations have been offered to account for this stability over time. The first, and more widely accepted, is based on representations in memory; the other, on patterns of change that indicate stability.

Internal representations
The phenomenal persistence of objects, even when they are momentarily out of view, has led many researchers to propose, explicitly or implicitly, a memory that contains representations of all objects in a scene, for example, object files (Treisman and Gelade 1980) and visual buffers (McConkie and Rayner 1976). Stability is achieved by matching the present image of the world, with its various visible pieces, to the objects in memory. This type of approach has found broad support, perhaps because it is consistent with our phenomenal experience of the visual world extending all around us, even in regions where we have few or no receptors. If objects appear stable and continuously present despite their sensory absence, something inside the organism (i.e., the representation in memory) must be stable and continuously present.

Change as information for stability
Although it might be tempting to believe the visual system maintains representations of all aspects of the environment, this is not necessary, since the environment changes in lawful ways. The visual system is constructed to operate in a world where objects don’t change as a function of the viewer’s direction of gaze, or with the presence of intervening objects. The visual system does not need to store a copy of an object if the object will be there to reexamine when necessary.
A representation of an object is not needed to perceive the object as stable over time if there is information that that object persists even when not in sight. Theoretical alternatives to a representation-based approach have been offered by both Gibson and Michotte. Michotte et al. (1964) argued that stability was a perceptual phenomenon—the experience of stability was a consequence not of a memory for the object, but of some aspect of the stimulus. For the display illustrated in figure 17.1, observers almost uniformly experience a circle changing visibility, although in principle one could see a form changing shape. An additional aspect of this display that may be relevant for understanding stability is that a
Figure 17.1. An illustration of Michotte’s kinetic screen effect (figure similar to figure 1 in Shipley and Kellman 1994).
second boundary, an edge that hides the circle, is seen. This edge has a phenomenal quality similar to the edges seen in illusory figure displays. Gibson, Kaplan, Reynolds, and Wheeler (1969) identified the characteristic pattern of change that occurs whenever an object disappears from view as the aspect of the stimulus responsible for the appearance of continued existence. The pattern associated with occlusion differs from the pattern observed with changes in existence (such as drying up, exploding, or corroding). Distinguishing between a circle changing visibility and one changing shape requires only that the visual system be able to distinguish between the patterns of change that occur in the two cases.

Evidence against Internal Representations
Aside from the enormous burden a memory-based scheme seems to place on the visual system, this approach has difficulty explaining some of the recent work on the perception of persistence. There are a number of observations that suggest humans are much less sensitive to change than one might think. In each case some aspects of a scene are remembered; however, the finding of particular interest is that these representations appear to be quite impoverished. Substantial changes can be made in a scene in such a way that the scene appears stable—the phenomenal stability in each case is illusory.
Changes that occur during saccades
A wide variety of changes can occur in text while the eyes are in motion (e.g., changes in case, such as replacing “eStUaRiEs” with “EsTuArIeS”) with little effect on reading, and the reader is generally unaware that any change has occurred (see, e.g., McConkie and Zola 1979). Using a simple procedure, moving a picture to produce eye movements, Blackmore, Brelstaff, Nelson, and Troscianko (1995) have shown that people are similarly unable to detect changes in natural scenes that occur during saccades. When a picture is shown, removed, and then an altered version of the picture displayed next to the original position, subjects fail to detect the alteration. For example, a chair might be removed from a scene with three chairs. The change in spatial position of the picture—and the resulting saccade—were necessary for this effect. Alterations were readily detected when the picture did not shift location.

Luminance masking
Subjects’ ability to report changes in large arrays of familiar elements (e.g., letters) is also quite limited (Pashler 1988). Subjects’ ability to report which letter changes in an array of ten letters is close to the level expected on the basis of full-report studies when the altered array appears more than 150 msec after the original array disappears. Accuracy levels for detecting a single change in a ten-item array were consistent with subjects remembering about four items, and using those four to compare the old and new arrays. A similar inability to detect changes was found at shorter intervals when a luminance mask was inserted between the target and altered array. Recently Rensink, O’Regan, and Clark (1996) reported a similar finding for natural scenes. They used pairs of pictures in which some aspect of the picture was altered (e.g., the engine of an airplane was present in one picture and not in the other).
Subjects were very slow to detect the differences between the pictures when a luminance mask (a grey field) was presented between the first picture’s offset and the second picture’s onset. Subjects appeared to be serially searching the picture for the change, since the time to detect the change was directly related to the order in which the changed item appeared in a verbal description of the image.

Continuity errors
When movies are filmed, scenes that will immediately follow each other in the final movie are often filmed at different times. Such a practice can result in a “continuity error,” when some detail in a scene changes across a cut. A classic continuity error is the disappearance of
Noah around the forty-second minute of “The Grapes of Wrath.” One minute he is part of the party traveling to California, and the next he is gone, never to return. This was not noticed by most viewers, and in general, continuity errors are not noticed by audiences (Levin and Simons 1997). Simons has brought this phenomenon into the laboratory. Subjects shown a brief film in which objects in the scene change across cuts (e.g., a two-liter soda bottle was replaced by a box) consistently fail to notice anything wrong (Simons 1996). It is even possible to change the central character in a story: if the change occurs between cuts, subjects will fail to note the change in identity of the actor in their descriptions of the story (Levin and Simons 1997). Recently Simons and Levin (1997) extended this work to real-world interactions. They found that changes in the identity of a person are detected less than half the time when the change occurs during occlusion (e.g., by an object passing between two people engaged in a conversation).

“The world as visual memory”
The phenomenal experience of an extended visual field, in which the boundaries of objects appear clearly defined and surface characteristics are clear, has led theorists to assume that perception depends on representations that capture all of the apparent richness of a scene. Illusory stability presents a problem for such accounts: Why can’t the visual system use its representations to detect changes by comparing the present visual image with the past image? A number of researchers and philosophers have used these findings to argue that models requiring detailed representations of the visual world must be abandoned (e.g., Dennett 1991; O’Regan 1992). Massive representational edifices are not needed, since the world is always there, available to be consulted as needed. If the world serves as the visual store, then only minimal representations need be maintained by the perceiver.
If, as Gibson and Michotte claim, change and stability can be discriminated on the basis of stimulus properties, then observers may rely on the fact that they can detect changes as they occur (e.g., changes may stimulate motion detectors), and representations of the previous state of the world are not required. On such an account, stability is not the result of a psychological process, but a consequence of the way the system is constructed. In the absence of perceptual evidence for change, stability is the default; we do not actively perceive that the world is unchanging. On this view, perception is an active process in which attention guides the pick-up of whatever information is needed for the task at hand. Any role for representations in perception is then limited to guiding attention and the ongoing task. However, attention cannot be controlled solely by the observer. As noted by Neisser (1976), any model
of perception that relies substantially on internal guidance would be susceptible to problems inherent in too much assimilation: If perception is guided by the organism alone, how can it detect and process unexpected events? Furthermore, Yantis and colleagues have found that abrupt appearances of new objects attract attention (Yantis and Jonides 1984; Yantis 1993). Attention must be guided by an interaction between the organism and the environment; the pick-up of information will be determined by the observer’s intentions and expectations, as well as by some events in the world (e.g., abrupt changes in luminance and the sudden appearance of objects). Illusions of stability are also problematic for theories of perception based on patterns of change. These theories must provide some account of why the change evident in all of the examples of illusory stability cited above is not picked up. Note that the important question here is not why the world appears stable in each case, but how these cases differ from everyday experience, where we reliably distinguish change from stability. One thing common to all the illusory-stability cases is that massive motion signals are present when the undetected change occurs. Motion signals occur when the eye moves, and when luminance levels change abruptly (as would occur whenever one image is replaced by a different image). Perhaps these motion signals interfere with detecting the pattern of changes that would normally be experienced as changes in the world. In support of such a hypothesis, consider one more example of a failure to detect change.

Occlusion and Object Constancy
Even very young children appear to treat an object that has disappeared from view as continuing to exist (Baillargeon 1987). They are surprised if an object is hidden and does not reappear when the occluding surface is removed. However, if the object is not the focus of attention, object constancy may not be seen.
Douglas Cunningham and I created a video tape in which five objects moved back and forth five times, and halfway through the tape, one of the objects did not return after a brief period of occlusion. Figure 17.2 shows four frames from this video. When sixty subjects were shown the tape, introduced as an example of motion parallax, none of the subjects spontaneously reported that one of the objects disappeared. When asked if they noticed anything odd about the video, only one subject noted the change. Unlike the other examples of illusory stability, this example does not contain motion signals spread over the entire visual field. Here, the motion signals that do occur appear in a pattern that is consistent with
Figure 17.2. Four frames from a video sequence of five objects moving back and forth. The most distant object (a small cardboard box) disappears and reappears (images 1, 2, and 3) initially, but then disappears and does not reappear (image 4).
occlusion. As a result, they are not treated as information for change. The local changes in this display are all consistent with a stable world, so the disappearance of an object is not detected. To investigate further the role of change in the perception of occlusion and stability, we employed displays in which an occluding form is dynamically specified. A moving form with well-defined boundaries is seen in displays in which the elements of a sparse texture field change in a systematic manner (Shipley and Kellman 1994). For example, an opaque surface will be seen if elements disappear along the leading edge of a moving form that is the same color as the background, and then reappear at its trailing edge (see figure 17.3). No bounded form is seen in static frames of such displays. An important aspect of these displays is that phenomenally, only the forward, occluding surface appears to move; the small background elements appear stable. From the perspective of understanding perceptual stability, this is notable since the spatial and temporal relationships between the appearance and disappearance of the background elements would, if elements were presented in isolation, result in apparent
Figure 17.3. Three frame sequence illustrating dynamic occlusion. The dotted square represents an invisible form moving over the array of elements. Elements are only visible (black) when they are outside the form; they are invisible (gray) inside the form (figure similar to figure 2 in Shipley and Kellman 1994).
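The rule that generates such a display is simple: on each frame, an element is drawn only if it lies outside the (invisible) moving form. A minimal computational sketch follows; it is my own illustration with arbitrary field and occluder dimensions, not the authors' stimulus code:

```python
# Minimal sketch of a dynamic occlusion display like that of figure 17.3:
# a sparse field of texture elements with an unseen square "occluder"
# sweeping across it.  An element is visible on a frame only when it lies
# outside the occluder.  All sizes and speeds here are arbitrary choices.
import random

random.seed(1)  # fixed element positions for reproducibility
ELEMENTS = [(random.uniform(0, 100), random.uniform(0, 100)) for _ in range(40)]
SQUARE_SIZE = 30  # side length of the invisible occluding form

def visible_elements(frame, speed=5):
    """Elements visible on a given frame as the square moves left to right."""
    left = frame * speed           # occluder's left edge on this frame
    right = left + SQUARE_SIZE
    top, bottom = 35, 35 + SQUARE_SIZE
    return [(x, y) for (x, y) in ELEMENTS
            if not (left <= x <= right and top <= y <= bottom)]

# Across frames, elements vanish at the leading edge and reappear at the
# trailing edge; no single static frame reveals the square's boundary.
for f in range(4):
    print(f, len(visible_elements(f)))
```

The sketch makes the key property of the display concrete: each individual frame is just a scatter of dots, and the square's shape exists only in the pattern of appearances and disappearances over time.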
motion—when one element disappears and another appears, motion between the two locations is normally experienced. If the apparent stability in dynamic occlusion displays is a result of the same perceptual processes that result in illusory stability, then we may understand perceptual stability by understanding the perceptual processes responsible for dynamic unit formation. To test for illusory stability in dynamic occlusion displays, we employed a free report procedure (Shipley, Cunningham, and Kellman 1994). We asked subjects to describe what they saw in displays in which the background elements either changed position, or returned to their original position, following occlusion. In one type of display, simulating an opaque form, elements were invisible while inside a moving circular region. We also included displays that simulated a wire circle (elements were invisible for only a brief period of time—66 ms) and displays that simulated transparency (elements changed to red inside the circle). In half of the displays, elements reappeared where they disappeared (in the case of transparency, they did not change location when they changed color), as they would if an occluder had actually passed over them. In the other half of the displays, elements reappeared in a new location following occlusion (in the case of transparency, elements changed location when they changed color). None of the ten subjects reported any difference between the displays where elements reappeared in their original location and displays where elements reappeared in new locations (even in the wire and transparency displays, where the temporal gap between old and new locations was minimal). The elements in both sets of displays appeared stable. To test that occlusion was critical for this illusory stability, we asked a new set of subjects to describe six control displays where an occluder was not seen and elements either changed location or stayed in the same location.
Each control was created using two intermediate frames from each of the previous displays. For the two occlusion controls, elements within a circular region disappeared for 167 ms (the average time elements were invisible in the dynamic occlusion display), and then appeared, either in the same location or in a new location. For the transparency controls there was no temporal gap between changes—elements changed to red, appearing either in the same or in a new location for 167 ms, and then returned to their original color. For the wire figure controls, elements in a circular ring disappeared for 66 ms before reappearing in either the old or a new location. Subjects could detect changes in element location in these displays, where occluders were not seen. They had no difficulty discriminating displays in which elements stayed in the same location from displays in which elements changed location. Eight out of ten subjects reported that the elements appeared
to move in at least one of the displays in which elements changed location.

A Motion-Based Model of Stability and Change
Philip Kellman and I recently developed a model of boundary formation in dynamic displays that may help account for the apparent stability of dynamic occlusion displays (Shipley and Kellman 1997). The model is based, in part, on principles developed in our model of static unit formation (discussed by Kellman in his chapter for this volume). The dynamic unit formation model uses the pattern of motion signals that occur over time to define the boundaries of moving objects. As a consequence, it can offer a description of the pattern of motion that identifies changes in visibility. Below I review some of our recent work that indicates the visual system uses motion signals defined by sequential occlusion events to perceive a moving surface.

Motion signals as information for boundaries
The background elements in dynamic occlusion displays lose their phenomenal stability as frame duration increases. At short frame durations the elements appear stable, while at longer durations the elements appear to move. The clarity and phenomenal presence of a moving surface also decrease as the duration of each frame increases (Shipley and Kellman 1994). We used accuracy in a ten-alternative shape identification task to assess figural clarity; the effect of varying frame duration on boundary formation is shown in figure 17.4. In earlier work on apparent motion, Sigman and Rock (1974) and Petersik and McDill (1981) noted a similar relationship between the appearance of an occluding edge and the apparent stability of elements: In all cases, when a moving form is seen, the background appears stable, and when no form is seen, the elements appear to move. This suggests that the visual processes responsible for seeing the edges may incorporate local motion signals that occur at the occluding edge.
As a consequence, motion signals are not consciously experienced as motion in the world when they define a boundary, but when no boundary is formed we see the individual motion signals. To test the hypothesis that motion signals are used to perceive a moving boundary, we asked a fairly simple question: What happens to the perception of boundaries when additional motion signals that do not fit the pattern produced by the moving form are added (Shipley and Kellman 1997)? Displays consisted of a form translating over an array of stationary elements while eighteen elements rotated around the center of the screen. The motion signals generated by the rotating elements proved to be very effective at disrupting shape perception; subjects' accuracies in identifying the translating form were much lower when the additional motion signals were present than when they were absent (figure 17.5). Furthermore, the effect of the additional motion signals did not depend on their global organization: Coherent motions in which all elements rotated in the same direction were as effective as random local motions. This suggests that the local motion signals themselves were the cause of the disruption. Motion signals are invariably present whenever one object occludes another because abrupt changes in the visibility of elements along the edges of a moving opaque object will always result in local motion signals. Local motion signals alone, however, are not sufficient to identify occlusion since motion signals also occur when objects change shape or location in the world. These two cases may be discriminated on the basis of the pattern of local motion signals.

302
Thomas F. Shipley

Figure 17.4. Shape identification accuracy plotted as a function of frame duration for three background element densities. As density increases the number of changes per frame increases, and accuracy increases. As frame duration increases the number of changes that occur within a given temporal window decreases, and accuracy decreases. There was no interaction between spatial and temporal density, suggesting a fixed temporal integration window (figure similar to figure 7 in Shipley and Kellman 1994).

Figure 17.5. Shape identification accuracy plotted as a function of background element density for four conditions: No Motion, elements rotating in the Same direction as the target form, the Opposite direction, or in Random directions (figure similar to figure 4 in Shipley and Kellman 1997).

How do motion signals define a boundary?

The pattern of motion that results from dynamic occlusion can be characterized by the pattern produced by the local occlusion of only three elements (illustrated in figure 17.6a). Each pair of disappearances results in a motion signal. The magnitude and direction of that signal will be a function of the spatial and temporal separation of changes. Thus local motion signals combine spatial and temporal information about element changes. If the two vectors representing the motion signals have a common origin, their tips define the orientation of the occluding boundary (Shipley and Kellman 1997). Thus the pattern of local motion signals provides information both about the local orientation of an edge and that the elements that disappeared were occluded.

Figure 17.6. An illustration of sequential occlusion by a local edge segment. a) As an edge moves from left to right, it sequentially covers three elements. The local motion signals, v12 and v23, are defined by the sequential disappearance of elements 1 and then 2, and 2 and then 3, respectively. b) The orientation of the occluding edge is defined by the length and orientation of the local motion signals (figure similar to figure 5 in Shipley and Kellman 1997).

To find out if observers are sensitive to the sequential pattern of motion signals, we developed displays that were consistent with dynamic occlusion but contained degenerate motion patterns (Shipley and Kellman 1997). In these displays, elements were arrayed so that the local motion signals were sequentially similar in direction and magnitude (figure 17.7a illustrates a local edge segment approaching elements that when covered would produce similar motion signals). Such a pattern is degenerate because the orientation solution outlined in figure 17.6 would be very sensitive to small errors or noise when the vectors have a similar direction. Therefore the edge should be unstable and form recognition should be compromised. Indeed, subjects' ability to identify the shape of the form defined by sequentially similar motion signals was severely impaired relative to displays with the usual randomly oriented motion signals (figure 17.7b illustrates a local edge segment approaching a set of elements identical in spatial arrangement relative to the edge, but in which the sequence of occlusion is random). The phenomenal appearance of these displays was also consistent with our hypothesis that local motion signals will be experienced when not incorporated into a boundary. Occlusion was not seen in the sequentially similar motion displays; instead, motion of the elements was seen.

In addition to providing information about the continued existence of an occluded surface and the shape of the occluding surface, motion signal patterns may also provide information about the opacity of moving surfaces. The pattern of motion signals produced by the movement of a partially transparent surface will resemble the one produced by an opaque surface. It will differ only in the magnitude of the temporal contrast modulation, relative to the average contrast. As an initial test to see if opacity could be dynamically specified, subjects were asked to describe what they saw in dynamic occlusion displays from which all static information for surfaces was removed (Cunningham, Shipley, and Kellman 1998). To remove static information while retaining the pattern of change over time, we added a large number of unchanging elements to a display in which elements disappeared as a form moved around the screen. One might conceive of such a display as two fields of elements with a form moving between the two. From the point of view of the observer only some of the elements disappear and reappear (the ones in the more distant field), so elements are seen both inside and outside the moving surface. This effectively masks static information for a surface hiding background elements.
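The geometric construction behind figure 17.6 can be made concrete with a short sketch. This is not the authors' implementation; it is a minimal illustration with hypothetical function names. Each local motion signal is the displacement between two sequentially occluded elements divided by the time between their disappearances; drawn from a common origin, the tips of the two signal vectors lie on a line parallel to the occluding edge.

```python
import math

def motion_signal(p_a, t_a, p_b, t_b):
    """Local motion signal from the sequential disappearance of two
    elements: spatial displacement divided by temporal separation."""
    dt = t_b - t_a
    return ((p_b[0] - p_a[0]) / dt, (p_b[1] - p_a[1]) / dt)

def edge_orientation(p1, t1, p2, t2, p3, t3):
    """Orientation (radians) of the occluding edge implied by three
    sequential disappearances. With both motion signals drawn from a
    common origin, the line through their tips is parallel to the edge."""
    v12 = motion_signal(p1, t1, p2, t2)
    v23 = motion_signal(p2, t2, p3, t3)
    return math.atan2(v23[1] - v12[1], v23[0] - v12[0])

# A vertical edge sweeping rightward at unit speed covers each element
# when it reaches the element's x coordinate, so t_i = x_i here.
theta = edge_orientation((0, 0), 0.0, (1, 2), 1.0, (2, -1), 2.0)
print(round(math.degrees(theta) % 180, 1))  # 90.0: a vertical edge recovered
```

The recovered orientation is unstable exactly when the two signal vectors are nearly parallel, which is the degenerate case exploited in the sequentially similar displays described above.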
When the form moved, subjects reported seeing a dark surface with well-defined boundaries, and a specific location in depth (the form appeared to move between two layers of elements). The pattern of changes over time must have been responsible for the perception of shape, opacity, and depth. We are currently investigating whether subjects are sensitive to dynamic specification of degree of opacity. In sum, the pattern of motion signals that result from elements appearing and disappearing provides information about the occluding edge and about the continued existence of the elements. The standard accretion and deletion displays appear to have stable backgrounds because the local motion signals are integrated into the motion of the occluding figure. When this does not occur (e.g., when there is a long pause between frames) or when edges are unstable, then the local motions are experienced and no edge is seen. In occlusion-based displays of illusory stability, the background appears stable because the motion
pattern is consistent with a moving boundary—the motion signals are integrated into the moving edge and are not interpreted as motion of the elements. Only when a boundary is not seen are changes in element locations noticed.

Figure 17.7. An illustration of a set of elements that when occluded will produce (a) similar and (b) random motion signals. a) Each element is shifted off the line defined by the previous two elements by 6 degrees. When a moving edge (indicated by a grey dashed line) occludes these elements the sequential motion signals will be similar in magnitude and orientation. b) These elements have the same locations, relative to the occluding edge, as in figure 17.7a, but the order of occlusion has been randomized so the sequential motion signals will differ in orientation.

The Role of Attention

Finally, although subjects in our experiments do not seem to be sensitive to changes in location following occlusion, there are a number of perceptual phenomena for which observers do appear to maintain some representation over time that includes spatial position. Michotte's demonstration of tunneling is one example (Michotte et al. 1964). In Michotte's displays a moving dot disappeared behind an occluder and then reappeared on the other side. If the dot reappeared at a time and location consistent with a smooth continuous path behind the occluder, it appeared to continue to exist while out of sight. In contrast, if the dot appeared at some other location or after a very long (or short) interval, subjects reported seeing two dots—one that disappeared and one that appeared. How does tunneling, in which the percept is sensitive to spatial changes during occlusion, differ from the displays presented here? One possibility is that the difference lies in the number of elements. Alternatively, as suggested earlier, attention may play an important role in
detecting change when the pattern of motion signals cannot be used. Object permanence in occlusion displays may require attention to a particular object (or set of objects). Indeed, it is possible to see the changes in illusory-stability displays if the object that changes is the focus of attention. However, as noted by Levin and Simons (1997), attention to the object that changes is not sufficient—not all aspects of an object may be represented. The appearance of illusory stability in the examples discussed previously does appear to change with experience. For example, once the changes in Simons's and Rensink et al.'s displays have been seen, they are almost immediately noticed when shown a second time.

Conclusion

Recent interest in illusory stability seems to reflect a hope that it will help us with a long-standing problem: How does perception relate to our conscious experience of the world? These particular illusions may have captured attention because the mismatch between reality and conscious experience is large and (for many accounts of perception) should be noticed. I have argued here that the psychological identity of objects over time is based on local motion information. However, the relationship between local motion signals and the conscious experience of something in the world going out of sight or changing shape is not direct. Information for stability and change is, to use Köhler's term, an "Ehrenfels quality" (Köhler 1947). It is not motion per se that distinguishes persistence from change, but rather the pattern of motion signals: One pattern tells us about how things are changing in the world, and another pattern tells us that things are stable. So, the perception of the here and now depends both on the way things appear at the moment and on how things are changing over time.

Acknowledgments

The research and preparation of this manuscript were supported by NSF Research Grant BNS 93–96309.
I would like to thank John Jonides and Daniel Reisberg for their extensive feedback on an earlier version of this chapter.

References

Armstrong, S. L., Gleitman, L. R., and Gleitman, H. (1983) What some concepts might not be. Cognition 13:263–308.
Baillargeon, R. (1987) Object permanence in 3 1/2- and 4 1/2-month-old infants. Developmental Psychology 23:655–664.
Blackmore, S. J., Brelstaff, G., Nelson, K., and Troscianko, T. (1995) Is the richness of our visual world an illusion? Transsaccadic memory for complex scenes. Perception 24:1075–1081.
Cunningham, D. W., Shipley, T. F., and Kellman, P. J. (1998) The dynamic specification of surfaces and boundaries. Perception 27:403–415.
Dennett, D. C. (1991) Consciousness Explained. Boston: Little, Brown.
Gibson, J. J., Kaplan, G., Reynolds, H., and Wheeler, K. (1969) The change from visible to invisible: A study of optical transitions. Perception and Psychophysics 5(2):113–116.
Gibson, J. J. (1979) The Ecological Approach to Visual Perception. Hillsdale, NJ: LEA.
Gleitman, H. (1955) Place learning without prior reinforcement. Journal of Comparative and Physiological Psychology 48:77–89.
Gleitman, L. R. and Gleitman, H. (1992) A picture is worth a thousand words, but that is the problem: The role of syntax in vocabulary acquisition. Current Directions in Psychological Science 1(1):31–35.
Johansson, G. (1973) Visual perception of biological motion and a model for its analysis. Perception and Psychophysics 14:201–211.
Jonides, J. and Gleitman, H. (1972) A conceptual category effect in visual search: O as a letter or a digit. Perception and Psychophysics 12:457–460.
Köhler, W. (1947) Gestalt Psychology. New York: Liveright Publishing.
Levin, D. T. and Simons, D. J. (1997) Failure to detect changes to attended objects in motion pictures. Psychonomic Bulletin and Review 4(4):501–506.
McConkie, G. W. and Rayner, K. (1976) Identifying the span of the effective stimulus in reading: Literature review and theories of reading. In Theoretical Models and Processing in Reading, ed. H. Singer and R. B. Ruddell. Newark, Del.: International Reading Association, 137–162.
McConkie, G. W. and Zola, D. (1979) Is visual information integrated across successive fixations in reading? Perception and Psychophysics 25(3):221–224.
Michotte, A., Thines, G., and Crabbe, G. (1964) Les compléments amodaux des structures perceptives. Studia Psychologica. Louvain: Publications Universitaires de Louvain. (English translation in: Michotte, A. [1991] Michotte's Experimental Phenomenology of Perception, ed. and trans. G. Thines, A. Costall, and G. Butterworth, pp. 140–169. Mahwah, NJ: Erlbaum.)
Naigles, L., Gleitman, H., and Gleitman, L. R. (1993) Children acquire word meaning components from syntactic evidence. In Language and Cognition: A Developmental Perspective, ed. E. Dromi. Norwood, NJ: Ablex, 104–140.
Neisser, U. (1976) Cognition and Reality. New York: Freeman.
O'Regan, J. K. (1992) Solving the "real" mysteries of visual perception: The world as an outside memory. Canadian Journal of Psychology 46(3):461–488.
Pashler, H. (1988) Familiarity and visual change detection. Perception and Psychophysics 44:369–378.
Petersik, J. T. and McDill, M. (1981) A new bistable motion illusion based upon "kinetic optical occlusion." Perception 10:563–572.
Rensink, R. A., O'Regan, J. K., and Clark, J. J. (1996) To see or not to see: The need for attention to perceive change in scenes. Investigative Ophthalmology and Visual Science Supplement 37(3):S978.
Shipley, T. F. and Kellman, P. J. (1994) Spatiotemporal boundary formation: Boundary, form, and motion perception from transformations of surface elements. Journal of Experimental Psychology: General 123(1):3–20.
Shipley, T. F. and Kellman, P. J. (1997) Spatiotemporal boundary formation: The role of local motion signals in boundary perception. Vision Research 37(10):1281–1293.
Shipley, T. F., Cunningham, D. W., and Kellman, P. J. (1994) Perception of stability in dynamic scenes. Paper presented at the 35th Annual Meeting of The Psychonomic Society, St. Louis, November 1994.
Sigman, E. and Rock, I. (1974) Stroboscopic movement based on perceptual intelligence. Perception 3:9–28.
Simons, D. J. (1996) In sight, out of mind: When object representations fail. Psychological Science 7(5):301–305.
Simons, D. J. and Levin, D. T. (1997) Failure to detect changes to attended objects. Investigative Ophthalmology and Visual Science Supplement 38(4):S707.
Treisman, A. M. and Gelade, G. (1980) A feature-integration theory of attention. Cognitive Psychology 12:97–136.
Wallach, H. and O'Connell, D. (1953) The kinetic depth effect. Journal of Experimental Psychology 45(4):205–217.
Yantis, S. and Jonides, J. (1984) Abrupt visual onsets and selective attention: Voluntary versus automatic allocation. Journal of Experimental Psychology: Human Perception and Performance 10:601–621.
Yantis, S. (1993) Stimulus-driven attentional capture. Current Directions in Psychological Science 2(5):156–161.
Chapter 18

Putting some Oberon into Cognitive Science

Michael Kelly

In directing A Midsummer Night's Dream a few years ago, Henry Gleitman cast the same actor in the roles of Theseus and Oberon. The former represents rationality distilled to its essence, the scientist Apollo who grasps as much as "cool reason ever comprehends," but no further. The latter is the artist Dionysus, imaginative beyond reason, but consequently self-indulgent, undisciplined, and lazy in the natural luxury of his forest realm. These two figures, reason and imagination, are failures, dead ends that cast a shadow of pessimism over the celebration at the end of the play. I never understood this impression fully until I experienced Henry's version. Throughout the play, Lysander, Demetrius, Hermia, and Helena fret and scheme and argue and moan about whom they love and who should love them. In the end, though, all the pieces seem in place and the lovers twitter happily. However, Henry's insightful casting made clear that one couple is still divorced: Theseus and Oberon. In the world of the play, Theseus and Oberon will never be united. After all, they don't even seriously acknowledge each other's existence.

In contrast, Henry Gleitman has consistently rejected a fundamental opposition between science and art. In Henry's educational philosophy, one student might enroll in college as a premed Theseus and another as an actor Oberon. Given the proper environment, each student should nonetheless graduate as a "Theseron," and be a better doctor or actor as a consequence. Henry has practiced this philosophy throughout his teaching career. In his text Psychology, he frequently uses art to illustrate psychological principles. However, as might be expected from someone who has worked on the concept of symmetry (Gleitman, Gleitman, Miller, and Ostrin 1996), Henry is aiming for reciprocal effects here by encouraging students to think about artwork in a novel way.
In his seminar on the psychology of drama, Henry brings together psychology majors and students of theater and literature. As the students struggle to communicate and understand their diverse perspectives on Hamlet and Othello, they develop an appreciation of human achievement that is both broader and deeper than could have been attained in a class that separated the arts from the sciences. When you see Henry carving out a complex ANOVA as though it's some kind of classical sculpture, you realize that the distinction between art and science is as meaningless to him in research as in teaching. Though she differs from Henry on many other issues, like the worthiness of various activities to be deemed sports, Lila Gleitman has the same attitude. Indeed, they both live that view to the hilt. It's hard to think of two people who merge so much passion for their objet d'art—language—with analytical talents that are relentless in determining how it's learned, and then put to use in both work and play.

In keeping with this theme of science and art . . . well, if not united, at least aligned in "fearful symmetry," I will in this chapter present some examples of how cognitive principles can illuminate certain aspects of creative language use. The examples are far from exhaustive; they are more like a sampling of cheeses at the Gleitman research seminar. However, the topics do correspond roughly with aspects of language that Henry and Lila have examined over the years, such as lexical and phrasal stress (Gleitman and Gleitman 1970; Gleitman, Gleitman, Landau, and Wanner 1988), orthography (Gleitman and Rozin 1977), phrasal conjuncts (Gleitman 1965), and associative learning (Meier and Gleitman 1967). I hope to show through these case studies that basic research in cognitive science can be applied productively to language innovation, and might even be worthy of discussion in future versions of Henry's psychology of drama course.

The Rhythmic Structure of Verse

Like Shakespeare's other poetry, the verse portions of A Midsummer Night's Dream are generally written in iambic pentameter.
This meter has had a distinguished history in English literature because it forms the rhythmic basis for much of our poetry, including the greatest works of Chaucer, Shakespeare, and Milton. Given the prominence and prevalence of iambic pentameter in English verse, poeticists have placed high priority on understanding its structure. A canonical line in iambic pentameter consists of five disyllabic feet, with each foot beginning with a weak beat and ending with a strong beat. However, few lines actually fit this pattern perfectly. For example, in (1) the adjective "wise" appears in a weak position even though, as an open-class word, it should be prosodically salient.

(1) And, after that wise prince, Henry V (3HVI.III.iii.)
However, given its context, this positioning is understandable. In particular, phrases like "wise prince" and "black bird" generally have an iambic rhythm in speech, and this rhythm is respected in poetry by aligning such phrases in weak-strong position. In contrast, compound words like "blackbird" are pronounced with a trochaic rhythm, and consequently are set in strong-weak position in verse (Kiparsky 1975, 1977). This analysis assumes that the rhythmic structure of verse generally respects the prosodic principles of speech. This link could provide a powerful heuristic for proposing and testing hypotheses about poetic meter. For example, spoken stress is associated with information value (see Levelt 1989, for summary). If this relationship is preserved in poetry, then relatively informative words should appear in strong position more often than less informative words. For instance, marked adjectives like "short" are more informative than unmarked adjectives like "tall" in that they pick out a particular region of a dimension such as height whereas the unmarked adjective often refers to the dimension as a whole. Thus a question like "How tall is Theseus?" does not presuppose that Theseus is especially tall. However, the use of "short" would imply that Theseus is low on the height dimension (relative to some category, such as predemocracy Athenians). Given this information difference between marked and unmarked adjectives, one would predict that the former should be more likely to appear in stressed position in poetry. In Kelly (1989), I tested this hypothesis by examining where 17 dimensional adjective pairs like short-tall, cold-hot, and smooth-rough appeared in the Shakespeare selections printed in Bartlett's Quotations. Overall 70% of the uses of marked adjectives appeared in stressed position compared with 49% of unmarked adjectives. Furthermore, in 14 of the 17 pairs, the marked member was more likely to appear in stressed position.
As another example of how informativeness might influence the alignment of words with poetic meter, consider (2). Theseus's opening lines in A Midsummer Night's Dream contain two instances of "moon," with the first appearing in a stressed position and the second appearing in an unstressed position.

(2) Now, fair Hippolyta, our nuptial hour
    Draws on apace; four happy days bring in
    Another moon: but, O, methinks, how slow
    This old moon wanes! she lingers my desires. (MSD, I.i.1–4)

This difference might reflect prosodic effects of the given-new distinction. In particular, Fowler and Housum (1987) found that the first occurrence of a word in speech, corresponding with new information,
receives more stress than the second occurrence, corresponding with given information. If this relationship between givenness and stress operates in poetry as well as spoken prose, then one might expect patterns like that shown in (2). A detailed test of this hypothesis remains to be performed, but it further illustrates the manner in which our knowledge of prosody can be applied to verse.

Spelling and Stress

Proper names like "Claire" are often padded with extra letters that do not affect pronunciation but, like word-initial capitalization, provide a distinguishing mark for names (Carney 1994). This phenomenon is illustrated most clearly in homophones that involve proper and common nouns such as /web/ and /faks/. In contrast with the common nouns "web" and "fox," the surnames "Webb" and "Foxx" double the final letter. This distinction exploits creatively the oft-derided variability in English orthography. In particular, when properly manipulated, spellings like "Penn" can make orthographic distinctions between homophones and mark certain words as particularly salient while at the same time preserving the correct phonemic structure. There are many distinctions that could be represented in the orthography by systematically selecting different spellings of a phoneme or phonemic sequence. Although the choice between single and double letters might be the most obvious method, others are available. For instance, word-final /k/ could be represented by "k" as in "kiosk" or "que" as in "burlesque." Word-final /m/ can be spelled "m" as in "velum" or "mb" as in "succumb." My students and I have recently argued that the longer versions of such alternatives are used to represent lexical stress. Analyses of the English vocabulary have revealed that syllables ending in spellings like "que," "mb," and various letter doublings are more likely to be stressed than syllables ending in "k," "m," and various letter singletons (Verrekia 1996; Verrekia and Kelly 1996).
Subsequent experiments documented that literate English speakers have learned these relationships and might use them in reading. For example, subjects are more likely to pronounce disyllabic pseudowords with iambic stress if they are spelled “fofvesque,” “zertumb,” or “filrass” rather than “fofvesk,” “zertum,” or “filras” (Verrekia and Kelly 1996). Furthermore, disyllabic real words whose spelling patterns are consistent with their stress patterns show advantages in naming over words that have inconsistent relations between these domains. Thus trochaic words like “pellet” and iambic words like “dinette” are named more quickly and accurately than trochaic words like “palette” and iambic words like “duet” (Kelly, Morris, and Verrekia 1998).
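A dictionary analysis of the kind just described amounts to tallying, for each word-final spelling pattern, how often the final syllable carries stress. The sketch below is a toy reconstruction of that procedure, not the actual analysis from Verrekia (1996): the pattern list is illustrative and the stress annotations are supplied by hand, standing in for a real pronouncing dictionary.

```python
from collections import defaultdict

# Word-final spelling patterns of interest, longest first so that
# "esque" is matched before "e" or "s" would be.
PATTERNS = ["esque", "ette", "mb", "k", "m", "t"]

def final_pattern(word):
    """Return the first (longest) matching word-final spelling pattern."""
    for p in PATTERNS:
        if word.endswith(p):
            return p
    return None

def tally(lexicon):
    """Count, per pattern, [final-stress, non-final-stress] words.
    `lexicon` pairs each word with a hand-annotated final-stress flag."""
    counts = defaultdict(lambda: [0, 0])
    for word, final_stress in lexicon:
        p = final_pattern(word)
        if p is not None:
            counts[p][0 if final_stress else 1] += 1
    return dict(counts)

toy = [("burlesque", True), ("grotesque", True), ("kiosk", False),
       ("dinette", True), ("pellet", False), ("succumb", True),
       ("velum", False)]
print(tally(toy))  # longer spellings pattern with final stress in this toy set
```

A real analysis would of course run over a full pronouncing dictionary rather than seven hand-picked words; the point is only the shape of the computation.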
Although we have claimed from such results that English spelling can directly encode lexical stress, Lila Gleitman has often countered that many of the spelling patterns that we have studied correspond with morphemes. Since the morphemic structure of a word has clear and well-documented effects on stress (see Gleitman and Rozin 1977 for review), one might say that English spelling only affects stress indirectly through its representation of morphemes. For example, "ette" represents a morpheme meaning small or diminutive. Furthermore, this morpheme is usually stressed. Hence when readers encounter a pseudoword like "rinvette," the actual morpheme nested within it is recognized and its typical stress level assigned. No direct link between orthography and stress needs to be proposed. Although the morphemic account works well for spelling patterns like word-final "ette" and "ee," it has difficulty with other cases. For example, word-final /o/ is typically stressed when it is spelled as "eau" rather than "o," but "eau" is not a morpheme. Consider also the morphemes /əbl/, meaning capable of a specified action, and /əns/, meaning a state or condition. The former can be spelled using "able" or "ible" whereas the latter can be spelled with "ance" or "ence." There is no known difference in meaning associated with the spelling alternatives, and yet Verrekia (1996) has shown that they do have consequences for stress. For example, she found in a dictionary analysis that 65% of trisyllabic words ending in "ance" had stress on the second syllable whereas 67% of trisyllabic words ending in "ence" had stress on the first syllable. I could cite other evidence for a direct link between spelling and stress in English, but in many ways the clearest and most interesting example can be found in early editions of Milton's Paradise Lost.
English spelling in the seventeenth century was still far from standardized (Brengelman 1980), and hence texts from this and earlier periods often contain multiple spellings of a particular word. Milton’s works are no exception, and so early editions of Paradise Lost have alternations like “he-hee,” “me-mee” and “star-starr.” However, the variability in spelling choice is not random. Rather, the longer version of each pair is more likely to appear in stressed positions in Milton’s verse (Darbishire 1952). For example, I surveyed all instances of “he” and “hee” in an electronic version of the first edition of Paradise Lost.1 Since the poem was written in iambic pentameter, the pronoun was considered stressed if it appeared in even syllable positions and unstressed if it appeared in odd syllable positions. Whereas “hee” appeared in stressed positions 61% of the time, “he” occurred in such positions only 27% of the time. Similar patterns can be found in other alternations. Thus “mee” and
“starr” occurred in stressed positions 77% and 95% of the time respectively. In contrast, their shorter versions “me” and “star” occurred in stressed positions 41% and 67% of the time. These spelling differences clearly do not reflect morphemic differences but creatively link spelling to metrically strong positions in verse. This systematic relation between stress and spelling could be used to examine more fine-grained aspects of Milton’s meter. In general, however, literature scholars have not performed detailed analyses of Milton’s spelling variations because it is possible that their source is not the poet himself, but his printers. Although Darbishire emphasizes the meticulous care with which Milton handled the publication of his works, Adams (1954) responds sarcastically, “This hypothesis [that Milton was involved intimately in selecting between spelling options] puts blind Milton, his amanuenses, and his manuscript in the middle of a busy printshop, adding and subtracting e’s, changing small letters to caps and vice versa, altering spellings, correcting type fonts, and breaking in upon the sweaty printers as the sheets were being run off, to loosen the forms and drag out or insert tiny bits of inky lead” (p. 87). More generally, authors in Milton’s time simply did not follow the typesetting of their manuscripts with much diligence or even concern. Furthermore, given Milton’s blindness at the time, he is more likely to have proofheard rather than proofread Paradise Lost. In considering the spellings of words in Paradise Lost, we should not become excessively distracted by who precisely added an “e” or doubled an “r.” Suppose, for the sake of argument, that the printers and not the author were responsible for the spelling variants in Paradise Lost. One could still argue that Milton was their ultimate source. In particular, after reading thousands of lines of Milton’s verse, the printers may have abstracted schematic knowledge of his meter. 
This knowledge might then have subtly influenced spelling choices. If so, then we could still use spelling variability to infer characteristics of iambic pentameter in general and Milton's use of it in particular. For example, when "hee" does appear in unstressed positions, its distribution is not random. Instead, it occurs most often in the first syllable of a line. This position makes sense given that the opening beat in iambic meter is more likely to be stressed than other odd locations (Newton 1975). As another example, consider (3):

(3) Thine shall submit, hee over thee shall rule. (PL IX.196)

Even though "hee" occurs in a position that is typically unstressed in iambic pentameter, the longer spelling may have been chosen because of the contrast with "thee," and such contrasting situations are associated with prosodic prominence (Selkirk 1984).
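The stress-position tally described above (a syllable in an even position of an iambic pentameter line counts as metrically strong, an odd position as weak) can be sketched as follows. This is an illustrative reconstruction, not the analysis actually used; the hand-supplied syllable counts stand in for a real syllabifier, and the function name is hypothetical.

```python
def stress_positions(line_words, syllable_counts, target):
    """For each occurrence of `target` (a monosyllable) in a line of
    iambic pentameter, report whether it falls on an even syllable
    position (True = metrically strong beat, False = weak beat).
    `syllable_counts` gives each word's syllable count by hand."""
    positions = []
    syllable = 0
    for word, n in zip(line_words, syllable_counts):
        syllable += n  # cumulative position of the word's last syllable
        if word.lower() == target:
            positions.append(syllable % 2 == 0)  # even = strong
    return positions

# PL IX.196: "Thine shall submit, hee over thee shall rule."
words = ["thine", "shall", "submit", "hee", "over", "thee", "shall", "rule"]
sylls = [1, 1, 2, 1, 2, 1, 1, 1]
print(stress_positions(words, sylls, "hee"))  # [False]: 'hee' on a weak beat
```

Running this over every line of an electronic edition, and tallying the True proportion separately for "he" and "hee," is exactly the kind of count reported above (61% versus 27%).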
Putting some Oberon into Cognitive Science
In sum, spelling variability should not necessarily be dismissed as a sign of sloppiness in the orthography or its users. English orthography can and does encode more than phonemic information. Indeed, its flexibility allows one to represent morphology, stress, salience, gender,2 and perhaps other factors without sacrificing its ability to represent segmental phonology. Consequently, systematic variability (i.e., creativity) in spelling, both synchronically and diachronically, could be a rich source of evidence for testing diverse hypotheses about language structure and use.

A Verb by Any Other Name?

Toward the end of an especially festive affair in Lower Merion, I overheard a guest say, “They sure out-gleitmaned themselves this time,” meaning that the hosts had surpassed their own benchmark standards for throwing parties that illustrate every chapter in a psych 1 text: Sensation, learning, social cognition, maybe even psychopathology and its treatment with food, wine, and engaging company. These events have also spawned a large catch of linguistic novelties, such as the use of “Gleitman” as a verb. Extending the usage of a word into another grammatical class is a common form of lexical innovation in English, as Clark and Clark (1979) documented in their classic study of denominal verbs. For example, nine of the top twenty animal nouns in Battig and Montague’s (1969) category dominance norms have verb uses listed in The American Heritage Electronic Dictionary. However, mere frequency of category extension is not necessarily a sign of unprincipled promiscuity. As Clark and Clark first showed, many factors can influence the likelihood with which a word will join another grammatical class. For instance, nouns seem to be blocked from developing verb uses if their new meaning would be synonymous with an existing verb. Thus many vehicle terms are used as verbs to mean “to travel by X,” where X is the vehicle.
However, despite its high noun frequency, “car” has not acquired a verb usage. Clark and Clark argued that “car” has been kept out of the verb category because its most straightforward verb meaning would be synonymous with “drive,” and speakers have a bias against the existence of synonyms. Most investigations of grammatical category extensions have focused on semantic and pragmatic factors that constrain their use (e.g., Clark and Clark 1979; Kelly 1998). This orientation is consistent with more general work on nouns and verbs that emphasizes their semantic differences (e.g., Langacker 1987; Pinker 1989). However, analyses of the English lexicon have shown that these classes can also contrast phonologically. Thus English nouns and verbs differ in stress patterns, vowel
Michael Kelly
distributions, and the number of syllables they contain (see Kelly 1992, for review). These distinctions are so informative that formal classification models can learn to assign words to the noun and verb categories with high accuracy using only phonological information (Kelly, in preparation). I will focus here on a stress difference between English nouns and verbs and examine its implications for denominal verb and deverbal noun formation. Whereas the vast majority of disyllabic English nouns have first-syllable, or trochaic, stress, most verbs have second-syllable, or iambic, stress. This contrast is best illustrated by the stress patterns of noun-verb homographs like “record,” “contest,” and “permit.” In all cases where noun and verb homographs differ in stress, the noun version has a trochaic pattern and the verb version has an iambic pattern (Sherman 1975). Many studies have shown that native speakers (and for that matter, nonnatives; Davis and Kelly 1997) have implicitly learned the noun-verb stress difference. Most relevant here is a study in which subjects listened to a series of disyllabic pseudowords that varied in stress (Kelly 1988). After hearing each word, the subjects were asked to use it in a sentence. The stress patterns of the pseudowords affected the grammatical roles to which they were assigned in the sentences. In particular, iambic words were more likely than trochaic words to be used as verbs rather than as nouns. Thus the phonological structure of a word draws it toward a particular grammatical class. When applied to grammatical category extensions, this conclusion leads to the prediction that a word should be more likely to develop a use in a new grammatical class if it has phonological properties typical of that class. In terms of the noun-verb stress difference, one would predict that iambic nouns should be more likely than trochaic nouns to develop verb uses.
In contrast, trochaic verbs should be more likely than iambic verbs to develop noun uses. Both predictions were confirmed in an historical analysis of English denominal verb and deverbal noun formation (Kelly 1988). Furthermore, the diachronic survey was translated into an experiment with current English speakers. Subjects were presented with pairs of disyllabic nouns that lacked verb uses in English and disyllabic verbs that lacked noun uses. One member of each pair had trochaic stress and one had iambic stress, with other factors controlled. For example, the noun pairs were drawn from the same category (e.g., universities) and did not differ in prototypicality or word frequency. Subjects were asked to select one member of each noun pair and use it as a verb in a sentence, and one member of each verb pair and use it as a noun in a sentence. Knowledge of the noun-verb stress difference affected their choices, as iambic nouns and trochaic verbs were selected for grammatical transfers more often than trochaic nouns and iambic verbs. For instance, subjects were more likely to say “I cornelled for my degree” rather than “I dartmouthed for my degree,” and “I did a grovel for a grade” rather than “I did a beseech for a grade.” Based on such findings, I would predict that “gleitman” should not sound particularly melodious as a verb, however apt in meaning.

Word Blends

In 1911, a cartoonist for the Minneapolis Tribune created a new word “donkephant” by combining parts of “donkey” and “elephant.” However amusing the thought might be, this wordsmith was not referring to the offspring of a probably uncomfortable liaison. No, the reference was to that dreaded and all too real chimera: A politician whose views don’t seem to distinguish between the Democratic and Republican Parties (Pound 1914; aka “republicrat”). English contains hundreds of blend words like “donkephant,” such as “smog” (“smoke” + “fog”), “Jacobethan” (“Jacobean” + “Elizabethan”) and, newly minted for this occasion, “Gleitschrift” (“Gleitman” + “festschrift”).3 However, linguists have had little to say about factors that might influence blend structure. For example, one could just as well say “eledonk” instead of “donkephant” or “foke” instead of “smog.” Idiosyncratic aspects of blends could certainly be relevant to their structure. Thus Lewis Carroll may have chosen “mimsy” rather than “flimserable” because this blend of “miserable” and “flimsy” created a more euphonic rhythm for the line “All mimsy were the borogoves.” However, one could still ask whether any general principles could explain why existing forms won out over other alternatives. Bauer (1983), for example, recognized that some blends are probably blocked because they would be homophonous with existing words.
Thus “damn” and “hang” combined to form “dang” rather than “hamn” because the latter could be confused with “ham.” Other than this general bias against creating confusion with existing words, however, Bauer (p. 235) states that blend formations are “random” and “fairly arbitrary.” In this section, I will present evidence that certain patterns in blends can be predicted if we think of them as contractions of conjunctive phrases. Thus “fratority” and “jazzercise” are contracted forms of “fraternity and sorority” and “jazz and exercise.” On first inspection, the structure of conjunctive phrases seems as arbitrary as that of blends. In particular, from the standpoint of grammar, word order in conjuncts can vary freely. Thus both “Henry and Lila” and “Lila and Henry” are equally grammatical (Gleitman 1965). However, analyses of large corpora of conjuncts have revealed that certain word order patterns are
more common than others (Cooper and Ross 1975; Kelly 1986). In particular, words with certain phonological and semantic characteristics tend to appear first in conjuncts. For example, the first elements of conjuncts tend to contain fewer syllables and denote more prototypical objects than the second elements of conjuncts. Thus phrases like “salt and pepper” and “apple and lemon” are more common than phrases like “pepper and salt” and “lemon and apple.” Bock (1982) has induced the following generalization from these patterns: The first elements in conjuncts tend to be more accessible in memory than the second elements. This difference reflects a speech production strategy to produce words in the order in which they are retrieved from memory, within the constraints imposed by grammar. Since grammar imposes few constraints on word order in conjuncts, it is fairly easy to see the effects of memory accessibility here. However, accessibility can also affect more complex structures, like the choice of active over passive voice and prepositional over double object datives (Bock and Warren 1985). This analysis could be extended to the order of elements in blends. Thus “smog” may have had an advantage over “foke” because “smoke” is a more frequent word than “fog,” and frequency is directly related to accessibility. Similarly, “donkephant” may have won out over “eledonk” because “donkey” contains fewer syllables than “elephant.” In order to examine the relation between these accessibility variables and blend structure, I supplemented Pound’s (1914) collection of blends with a set obtained by searching the electronic version of the Oxford English Dictionary. This search was conducted by retrieving all words that contained “blend” or “portmanteau” in their definitions. Note that the resulting list was not exhaustive because many blends did not have these search words in their entries, but there was no other systematic way to sift these remaining blends out from other words. 
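Once assembled, such a corpus of blends can be scored on the accessibility variables of interest. Here is a minimal sketch of that scoring, comparing each blend’s first and second source words on syllable count and log frequency; the miniature lexicon and all of its frequency values are hypothetical illustrations, not data from the actual 320-blend corpus:

```python
# Sketch: score each blend's source words for syllable count and log
# word frequency, then compare first vs. second elements.
# All counts and per-million frequencies below are hypothetical.
from math import log

# word -> (syllable count, frequency per million); toy values
lexicon = {
    "smoke": (1, 58.0), "fog": (1, 18.0),        # smog
    "donkey": (2, 12.0), "elephant": (3, 11.0),  # donkephant
    "jazz": (1, 60.0), "exercise": (3, 30.0),    # jazzercise
}
blends = [("smoke", "fog"), ("donkey", "elephant"), ("jazz", "exercise")]

# Negative syllable difference = first element is shorter;
# positive log-frequency difference = first element is more frequent.
syll_diff = [lexicon[a][0] - lexicon[b][0] for a, b in blends]
freq_diff = [log(lexicon[a][1]) - log(lexicon[b][1]) for a, b in blends]

mean_syll = sum(syll_diff) / len(syll_diff)
mean_freq = sum(freq_diff) / len(freq_diff)
print(mean_syll, round(mean_freq, 2))
```

On this toy sample the first elements come out shorter and more frequent on average; the chapter’s t-tests are paired comparisons over exactly such per-blend differences, computed across the full corpus.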
Blends were excluded from the corpus if they involved more than two words (e.g., “compushity” is composed of “compulsion,” “push,” and “necessity”) or if they could not be sensibly expanded into conjunctive phrases. For instance, “Westralia” is based on the adjective-noun phrase “West Australia,” and the early appearance of the adjective in the blend was more likely driven by grammatical constraints than frequency or syllable number. The words that composed each of the remaining 320 blends were scored for their syllable numbers, word frequencies (Francis and Kucera 1982), and whether they appeared first or second in their respective blends. Based on the analogy with word order in phrases, I predicted that shorter and more frequent words should be cannibalized for the first part of the blends. Both predictions were supported, as the words represented early in the blends averaged 2.2 syllables and 40.1 occurrences per million words whereas the words represented later averaged 2.7 syllables and 14.8 occurrences per million (syllable number: t(319) = –8.33; word frequency: t(319) = 3.99, with raw frequencies converted to natural log values; both ps < 0.0001, two-tailed). One problem with this initial analysis is that syllable number and word frequency are not independent, in that shorter words tend to have higher frequencies (Zipf 1935). In order to examine word frequency separately from syllable number, only blends whose constituent words contained the same number of syllables were included. The first elements of blends were still more frequent than the second elements (t(116) = 2.34, p < .03, two-tailed). Syllable number could not be examined by using blends whose constituents were equal in frequency because there were very few blends of this type. So, syllable number was separated from frequency by analyzing only those blends in which the frequency of the second element was greater than or equal to the frequency of the first element. Even with word frequency controlled in this way, blends typically placed the shorter word before the longer word (t(148) = –4.48, p < .001, two-tailed). In sum, this analysis demonstrates that general aspects of blend structure can indeed be predicted by psycholinguistic principles that are broad enough to affect other aspects of language, such as word order. However, it will be difficult to test more detailed hypotheses using naturally occurring blends because of likely confounds between variables of interest. One could imagine, however, taking blend formation into the laboratory by asking subjects to construct blends from properly controlled words or pseudowords, such as “Theseus” and “Oberon” or “Claire” and “Ellen.”

Rhyme Patterns in Child Verse

Throughout the world, children chant little poems while they jump rope or choose who’s “it” in games like kick-the-can and tag (see Abrahams and Rankin 1980; Opie and Opie 1959, for review).
A well-known example of the latter class of “counting-out” verse is (4):

(4) One potato, two potato, three potato, four;
Five potato, six potato, seven potato, more.

One of the most interesting aspects of these poems is that they are part of an oral tradition, and hence must be recited from memory. One would therefore expect such poems to be structured in ways that would ease recall. For example, there are many historical and geographical variants of “eeny, meeny, miney, mo,” which is the first line of the most common counting-out poem among English-speaking children around
the world. However, all of these variants preserve the line’s regular rhythmic pattern, assonance, and alliteration. Thus versions include “eena, deena, dina, doe” but not “eeny deena miney moe” (Rubin 1995). Owing to its greater use of poetic devices, the former line has a more predictable structure, which should aid recall. Indeed, the most common form of the entire poem makes the greatest use of poetic devices (Kelly and Rubin 1988). In this section, I will exploit our knowledge of human memory to propose hypotheses about the rhyme patterns in jump rope and counting-out poems. In particular, I will assume that rhyming words in oral poetry share some properties with paired associates, in that the successful retrieval of the first word in a rhyme pair cues recall for the second word. Under this description, the first word can be considered a stimulus for retrieval of the response word. If so, then factors that increase the effectiveness of recall cues should cluster primarily on the first word in a rhyme pair. Ideally, such factors should also increase the intrinsic memorability of the first word since, after all, a cue is useless if it is not available. To illustrate this idea in a relatively pure form of paired associate learning, consider an experiment by Paivio, Smythe, and Yuille (1968). Subjects first studied a set of word pairs and then, in the recall phase, had to provide the “response” member of a pair when prompted with the “stimulus” member. The stimulus and response words could either be high or low in rated imagery. Recall was best for the condition in which both stimulus and response words were highly imageable and worst for the condition in which both words were poorly imageable. This finding replicates many experiments that show memory advantages for words rated high in imagery. Of most relevance here, however, are the mixed conditions in which one word of the paired associate was high imagery and the other low imagery.
Recall scores were significantly better when the stimulus word was high imagery and the response word was low imagery than vice versa. High imagery words are therefore better recall cues than low imagery words. When applied to counting-out and jump rope poems, these findings lead to the prediction that the first member of rhyme pairs should be higher in imagery than the second member, as in (5):

(5) As I went up the brandy hill
I met my father with good will.

More generally, first rhymes should have characteristics that increase memory accessibility. This hypothesis was tested by examining two such variables: Imagery and syllable number. As discussed in the section on word blends, syllable number is inversely related to accessibility. Hence, the first word in a rhyme pair should tend to contain fewer syllables than the second word in the pair, as in (6):

(6) A bottle of pop, big banana
We’re from southern Louisiana.

All rhymes consisting of noun pairs like “rat-cat” or “boat-petticoat” were recorded from corpora of jump rope (Abrahams 1969) and counting-out poems (Abrahams and Rankin 1980). The analysis was restricted to noun pairs because of the definition of imagery given below. Also, since variables like syllable number (Cassidy and Kelly 1991) are associated with grammatical class, the use of mixed grammar pairs like “meadow-grow” could introduce undesirable confounds into the results for the syllable variable. The overall survey consisted of 231 jump rope and 221 counting-out rhyme pairs. The analyses combined results from both corpora to increase statistical power. However, the same patterns of results appeared in both the jump rope and counting-out samples. Since only a small proportion of the words were listed in imagery norms (e.g., Paivio, Yuille, and Madigan 1968), a very general, binary definition of imagery was used to classify each word into either a high or low imagery category. In particular, if physical object predicates like “is red” could be applied to a particular word sensibly, then that word was classified as high imagery. If such predicates could not be applied, then the word was considered low imagery. Note that “sensibly” does not mean “truthfully.” Thus the statement “Milk is red” is literally false for unadulterated milk, but the attribution is sensible since milk does have a color. Examples of words that fit the criterion for high imagery are “milk,” “fork,” “door,” and “belly.” Examples of low imagery words are “truth,” “duty,” “prayers,” and “noise.” The words in most rhyme pairs had the same imagery value, namely high.
However, when the words differed in imagery, the first word was high imagery and the second low imagery 62% of the time (53 out of 86 cases), which was significantly greater than chance (z = 2.12, p < 0.05). The results with the syllable number variable also supported the memory accessibility hypothesis. As in the case with imagery, the words in the rhyme pairs generally contained the same number of syllables, as in “bed-head” and “tomato-potato.” However, when rhyme pairs contained words that differed in syllable length, the shorter word tended to be first, as in “melon-persimmon” and “wine-turpentine.” This pattern of short word before long word occurred 58 times whereas the reverse occurred only 21 times (z = 3.98, p < 0.01). In sum, oral traditions of poetry and storytelling offer a rich domain for studying memory in a naturalistic setting and for examining how
memory requirements could affect the structure of such forms of creative cognition (see Rubin 1995, for more details). Indeed, analyses of such traditions have been well represented in volumes that examine memory at work outside the laboratory (e.g., Neisser 1982). However, these analyses have focused almost exclusively on adult traditions, such as oral poetry in the Balkans (Lord 1960) or oral history in Liberia (D’Azevedo 1962). Child verse, such as counting-out poetry, has been relatively ignored even though these poetic forms are apparently universal, part of oral traditions, and, most importantly for research purposes, well documented by anthropologists. Large corpora of these poems are available for analysis, and as this section and other research (Rubin 1995; Kelly and Rubin 1988) show, specific hypotheses about their structure can be motivated by psychological principles and tested.

Conclusion

My concluding remark is simply to thank Henry and Lila Gleitman for the wealth of helpful contributions they have made to my research and, more importantly, to that of my students over the years. They exemplify the honored goals of life in the Academy: to learn and to teach with devoted reason and passion. The Greeks have a word for their temperament: arete.

Notes

1. I conducted my own counts because Darbishire did not provide detailed results of her investigation.
2. For example, word final /i/ is sometimes spelled “y” in male names but “ie” in female names. Thus English has contrasts like “Billy” and “Billie.”
3. It is not entirely clear whether “Gleitschrift” involves blending of whole words or morpheme compounding at the sublexical level. In particular, the frequent use of words like “Gleitpeople” and “Gleitfest” in certain circles may have led to the extraction of a new morpheme “gleit,” just as “scape” was extracted from the original Dutch borrowing “landscape” to form “cityscape” and “seascape” (Algeo 1977).
So, is the word “Gleitscape,” meaning the intellectual world from the Gleitman perspective, a blend of “Gleitman” and “landscape” or a concatenation of the morphemes “Gleit” and “scape”? Unfortunately, the issue cannot be decided in a short gleitnote.
References

Abrahams, R. D. (1969) Jump-Rope Rhymes. Austin: University of Texas Press.
Abrahams, R. D. and Rankin, L. (1980) Counting-Out Rhymes: A Dictionary. Austin: University of Texas Press.
Adams, R. M. (1954) The text of Paradise Lost: Emphatic and unemphatic spellings. Modern Philology 52:84–91.
Algeo, J. (1977) Blends, a structural and systemic view. American Speech 52:47–64.
Battig, W. F. and Montague, W. E. (1969) Category norms for verbal items in 56 categories: A replication of the Connecticut category norms. Journal of Experimental Psychology 80(3):1–46.
Bauer, L. (1983) English Word-Formation. Cambridge: Cambridge University Press.
Bock, J. K. (1982) Toward a cognitive psychology of syntax: Information processing contributions to sentence formulation. Psychological Review 89:1–47.
Bock, J. K. and Warren, R. K. (1985) Conceptual accessibility and syntactic structure in sentence formulation. Cognition 21:47–67.
Brengelman, F. H. (1980) Orthoepists, printers, and the rationalization of English spelling. Journal of English and Germanic Philology 79:332–354.
Carney, E. (1994) A Survey of English Spelling. London: Routledge.
Cassidy, K. W. and Kelly, M. H. (1991) Phonological information for grammatical category assignments. Journal of Memory and Language 30:348–369.
Clark, E. V. and Clark, H. H. (1979) When nouns surface as verbs. Language 55:767–811.
Cooper, W. E. and Ross, J. R. (1975) World order. In Papers from the Parasession on Functionalism, ed. R. E. Grossman, L. J. San, and T. J. Vance. Chicago: Chicago Linguistic Society, 63–111.
Darbishire, H. (1952) Milton’s Poetical Works. Oxford: Oxford University Press.
Davis, S. M. and Kelly, M. H. (1997) Knowledge of the English noun-verb stress difference by native and nonnative speakers. Journal of Memory and Language 36:445–460.
D’Azevedo, W. L. (1962) Uses of the past in Gola discourse. Journal of African History 3:11–34.
Fowler, C. A. and Housum, J. (1987) Talkers’ signaling of “new” and “old” words in speech and listeners’ perception and use of the distinction. Journal of Memory and Language 26:489–504.
Francis, W. N. and Kucera, H. (1982) Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton-Mifflin.
Gleitman, L. R. (1965) Coordinating conjunctions in English. Language 41:260–293.
Gleitman, L. R. and Gleitman, H. (1970) Phrase and Paraphrase. New York: Norton.
Gleitman, L. R., Gleitman, H., Landau, B., and Wanner, E. (1988) Where learning begins: Initial representations for language learning. In Linguistics: The Cambridge Survey. Vol. 3: Language: Psychological and Biological Aspects, ed. F. Newmeyer. Cambridge: Cambridge University Press.
Gleitman, L. R., Gleitman, H., Miller, C., and Ostrin, R. (1996) Similar, and similar concepts. Cognition 58:321–376.
Gleitman, L. R. and Rozin, P. (1977) The structure and acquisition of reading I: Relations between orthographies and the structure of language. In Toward a Psychology of Reading: The Proceedings of the CUNY Conferences, ed. A. S. Reber and D. L. Scarborough. Hillsdale, NJ: Erlbaum.
Kelly, M. H. (1986) On the selection of linguistic options. Unpublished doctoral dissertation, Cornell University.
Kelly, M. H. (1988) Phonological biases in grammatical category shifts. Journal of Memory and Language 27:343–358.
Kelly, M. H. (1989) Review of Phonetics and Phonology: Volume 1: Rhythm and Meter. Language and Speech 32:171–178.
Kelly, M. H. (1992) Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review 99:349–364.
Kelly, M. H. (1998) Rule- and idiosyncratically derived denominal verbs: Effects on language production and comprehension. Memory and Cognition 26:369–381.
Kelly, M. H., Morris, J., and Verrekia, L. (1998) Orthographic cues to lexical stress: Effects on naming and lexical decision. Memory and Cognition 26:822–832.
Kelly, M. H. and Rubin, D. C. (1988) Natural rhythmic patterns in English verse: Evidence from child counting-out rhymes. Journal of Memory and Language 27:718–740.
Kiparsky, P. (1975) Stress, syntax, and meter. Language 51:576–616.
Kiparsky, P. (1977) The rhythmic structure of English verse. Linguistic Inquiry 8:189–247.
Langacker, R. W. (1987) Nouns and verbs. Language 63:53–94.
Levelt, W. J. M. (1989) Speaking: From Intention to Articulation. Cambridge, MA: MIT Press.
Lord, A. B. (1960) The Singer of Tales. Cambridge, MA: Harvard University Press.
Meier, S. F. and Gleitman, H. (1967) Proactive interference in rats. Psychonomic Science 7:25–26.
Neisser, U. (1982) Memory Observed: Remembering in Natural Contexts. San Francisco: W. H. Freeman.
Newton, R. P. (1975) Trochaic and iambic. Language and Style 8:127–156.
Opie, I. and Opie, P. (1959) The Lore and Language of Schoolchildren. London: Oxford University Press.
Paivio, A., Smythe, P. C., and Yuille, J. C. (1968) Imagery versus meaningfulness of nouns in paired-associate learning. Canadian Journal of Psychology 22:427–441.
Paivio, A., Yuille, J. C., and Madigan, S. A. (1968) Concreteness, imagery, and meaningfulness values for 925 nouns. Journal of Experimental Psychology Monograph Supplement 76, part 2, 1–25.
Pinker, S. (1989) Learnability and Cognition. Cambridge, MA: MIT Press.
Pound, L. (1914) Blends: Their Relation to English Word Formation. Heidelberg: Carl Winter’s Universitätsbuchhandlung.
Rubin, D. C. (1995) Memory in Oral Traditions: The Cognitive Psychology of Epic, Ballads, and Counting-Out Rhymes. New York: Oxford University Press.
Selkirk, E. O. (1984) Phonology and Syntax. Cambridge, MA: MIT Press.
Sherman, D. (1975) Noun-verb stress alternation: An example of lexical diffusion of sound change. Linguistics 159:43–81.
Verrekia, L. (1996) Orthography and English stress. Unpublished doctoral dissertation, University of Pennsylvania.
Verrekia, L. and Kelly, M. H. (1996) Orthographic information for lexical stress in English. Unpublished manuscript.
Zipf, G. K. (1935) The Psycho-Biology of Language: An Introduction to Dynamic Philology. Boston: Houghton-Mifflin.
Chapter 19

The Organization and Use of the Lexicon for Language Comprehension

John C. Trueswell

Our intuitions tell us that language comprehension is an incremental and integrative process. As we read or listen to a sentence, we have the strong sense that we are constantly updating our estimation of the intended meaning of the utterance, perhaps on a word-by-word basis. In addition, we make these rapid decisions by integrating a wide range of knowledge, including grammatical knowledge of the language, “referential” knowledge about what the expressions refer to in the world, and even pragmatic and semantic knowledge about what is plausible or likely given the situation. One of the best illustrations of the incremental nature of language comprehension comes from the so-called garden-path effect, which can sometimes occur when a reader or listener is faced with a temporarily ambiguous phrase. For instance, temporary syntactic ambiguities can be found in the following sentence fragments, illustrated here with possible continuations:

(1) Henry forgot Lila . . .1
(a) . . . at her office. (direct object interpretation)
(b) . . . was almost always right. (sentence complement interpretation)

(2) The man awarded the prize . . .
(a) . . . to his friend and colleague of many years. (main clause interpretation)
(b) . . . was deeply moved by the honor. (reduced relative clause interpretation)

In the first example, the noun phrase “Lila” could be the direct object of the verb, as in (1a), or the subject of an embedded sentence, as in (1b). In the second example, the entire fragment could make up a main clause, as in (2a), in which case the man is doing the awarding. Or, the phrase “awarded the prize” could be modifying “The man” as a reduced relative clause, in which case the man is being awarded the prize (2b). When faced with syntactic ambiguities like these, readers and listeners show clear
signs of incremental interpretation in that they tend to pick a single interpretation at the point of ambiguity. Evidence for this comes from the fact that readers and listeners show systematic preferences, which need to be revised when incorrect (see, e.g., Bever 1970; Frazier and Fodor 1978). This revision (or garden-path) effect is revealed by increases in processing difficulty, such as long fixation times and regressive eye movements in reading (Frazier and Rayner 1982). For instance, readers prefer the direct object interpretation in examples like (1), resulting in difficulty with (1b). And, readers prefer the main clause interpretation in examples like (2), resulting in difficulty with (2b). Although garden-path effects illustrate the incremental nature of interpretation, there has been considerable debate over whether readers’ and listeners’ initial decisions about ambiguous phrases are the result of integrative processes. For instance, one could argue that these decisions need to happen so quickly that only a subset of the most highly relevant information is initially consulted. Knowledge about the details of how particular words combine together (e.g., verb argument structure), as well as semantic and pragmatic knowledge, may either be too slow to access or too difficult to deal with during the rapid flow of incoming speech or text. Advocates of this approach have proposed that only basic syntactic knowledge (e.g., major category information and phrase structure rules) is used to structure the input, and that a decision metric of some type is used to select among ambiguous structures, for example, pick the simplest structure (see, e.g., Frazier 1989), or pick the most common structure (see, e.g., Mitchell, Cuetos, Corley, and Brysbaert 1995). 
Support for an encapsulated syntactic processor of this type has come from studies suggesting the existence of garden-path structures (e.g., a more complex or a less common syntactic alternative), which, when presented, always cause a garden path, regardless of the presence of biasing lexical or contextual information (see, e.g., Ferreira and Clifton 1986; Rayner, Carlson, and Frazier 1983). These studies have been appealing to those who support modular approaches to language and cognition, especially given the existence of neurological data indicating a dissociation between syntactic and semantic processing (see, e.g., Levy 1996; Schwartz, Marin, and Saffran 1979; Hodges, Patterson, and Tyler 1994; but cf. Bates, Harris, Marchman, Wulfeck, and Kritchevsky 1995).

Alternatives to Encapsulated Parsing

A number of recent experimental findings have, however, called into question the basic assumptions behind an encapsulated structural stage of processing (e.g., Juliano and Tanenhaus 1994; Pearlmutter and
The Organization and Use of the Lexicon
329
MacDonald 1995; Taraban and McClelland 1988; Trueswell, Tanenhaus, and Garnsey 1994; Trueswell, Tanenhaus, and Kello 1993). Much of this work has focused on the use of lexical information, demonstrating that detailed syntactic and semantic information about individual words can have a rapid impact on parsing decisions. While space precludes a full description of these findings, it is important for this chapter to consider briefly two prior studies that I have conducted on this issue—one on lexically specific syntactic information, and the other on lexically specific semantic information. First, Trueswell, Tanenhaus and Kello (1993) looked at lexically specific syntactic constraints by examining how people dealt with the direct object / sentence complement ambiguity, as in example (1) above. We had people read ambiguous sentences that resolved toward the sentence complement alternative (e.g., “Henry forgot Lila was almost always right”). In this research, we compared two groups of verbs: DO-bias and SC-bias verbs, which differ in their tendency to be used with a direct object or sentence complement. DO-bias verbs permit a sentence complement, but have a strong tendency to be used with a direct object (e.g., “forgot”). SC-bias verbs tend to be used with a sentence complement and rarely use a direct object (e.g., “realized”). These tendencies were determined by syntactically analyzing how a separate group of participants used these verbs in a sentence production study. In the reading experiments, sentences with DO-bias verbs (e.g., “. . . forgot Lila was almost always right”) showed the typical garden-path effect (i.e., long fixations and regressive eye movements in the “disambiguating” region, “was almost always . . .”), suggesting that readers had incorrectly taken the noun as the direct object and were revising their commitment. Sentences with SC-bias verbs (e.g., “. . . 
realized Lila was almost always right”) showed no signs of difficulty in this region, suggesting that the noun was initially taken as the subject of a sentence complement. Thus specific syntactic knowledge about verbs was used quite rapidly to inform the decision about an ambiguous phrase. Likewise, Trueswell, Tanenhaus, and Garnsey (1994) found rapid use of lexically specific semantic information. This research examined the reading of ambiguous reduced relative clauses, like the second example above. It was found that the usual garden path associated with reduced relative clauses (e.g., “The defendant examined by the lawyer was unreliable”) could be eliminated when the initial noun was a poor subject and good object of the verb (e.g., “The evidence examined by the lawyer was unreliable”). What little difficulty that was observed with these items correlated with ratings of how plausible the noun was as the object (theme role) of the verb. Thus semantic information about what
330
John C. Trueswell
makes a good subject or object of a verb can also be used to inform the early stages of syntactic ambiguity resolution. These and other findings have helped to develop a “lexicalist” theory of sentence processing that emphasizes the integrative nature of interpretation (the constraint-based lexicalist theory; MacDonald, Pearlmutter, and Seidenberg 1994; Trueswell and Tanenhaus 1994). The framework assumes a constraint-based approach to ambiguity resolution (Marslen-Wilson and Tyler 1987; McClelland 1987), in which multiple sources of information can be used to converge on a single interpretation. The central claim of this approach is that word recognition includes the activation of rich lexical structures, including the parallel activation of lexically specific syntactic and semantic information (e.g., verb argument structure). Syntactic ambiguities hinge upon one or more of these lexical ambiguities, which define the initial set of possible interpretations. Frequency of usage determines the initial availability of information. Thus the grammatical information computed during word recognition determines the initial set of possible alternatives that contextual cues can support. To make this more concrete, consider the account for the DO/S ambiguity. When readers or listeners encounter a verb like “forgot,” the direct object (NP complement) and sentence complement structures would become active based on frequency. Just like an ambiguous word with multiple meanings can have dominant and subordinate senses, an ambiguous word can also have dominant and subordinate syntactic argument structures. If we estimate structural frequencies from the sentence production data of Trueswell et al. (1993), we can assume that the dominant structure for “forgot” is the NP complement, and the dominant structure for “realized” is the sentence complement. 
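The frequency-based availability of argument structures described here can be illustrated with a small sketch. The verbs are those discussed in the text, but the usage counts below are hypothetical stand-ins for the sentence production norms of Trueswell et al. (1993); the sketch shows only how relative frequency yields dominant and subordinate structures:

```python
# Toy model of frequency-based availability of verb argument structures.
# The counts are invented for illustration; the actual estimates came
# from sentence production norms (Trueswell et al. 1993).
USAGE_COUNTS = {
    "forgot":   {"DO": 80, "SC": 20},   # DO-bias: mostly direct objects
    "realized": {"DO": 10, "SC": 90},   # SC-bias: mostly sentence complements
}

def structure_availability(verb):
    """Each structure's initial activation, as its relative frequency of use."""
    counts = USAGE_COUNTS[verb]
    total = sum(counts.values())
    return {structure: n / total for structure, n in counts.items()}

def dominant_structure(verb):
    """The structure predicted to be pursued first at the point of ambiguity."""
    availability = structure_availability(verb)
    return max(availability, key=availability.get)
```

On these counts, dominant_structure("forgot") is the direct object and dominant_structure("realized") is the sentence complement, mirroring the garden-path asymmetry observed in the reading data.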
This asymmetry in availability of argument structure is the proposed source of the processing preferences observed in the reading study, in which readers prefer the DO interpretation for “forgot” and the SC interpretation for “realized.” The process of recognizing a verb also includes the activation of semantic information about the event denoted by the verb, including its thematic/conceptual roles. What is meant by this is that the semantic representation of an event includes knowledge about the possible participants of the event, as well as a mapping to the syntactic constituents of the verb (see, e.g., Carlson and Tanenhaus 1988). This type of structure permits an explanation of various semantic effects on parsing, like those found for the reduced relative clause (“The defendant/evidence examined . . .”). A verb like “examined” has two roles associated with it, the agent, who is doing the examining, and the theme, which is being examined. In active argument structures (like the main clause), the
agent maps onto the NP preceding the verb, and the theme maps onto the NP following the verb. In passive structures (like the relative clause) the opposite pattern holds. If this information is available when recognizing a verb, it could serve as a mechanism for explaining the initial preference for the reduced relative over the main clause when the first noun is a good theme and poor agent (“The evidence examined . . .”). Thus the thematic information of a verb can play a central role in integrating conceptual and syntactic constraints on interpretation. Although the lexicalist theory is consistent with the findings described above, many of its central predictions have so far gone untested. For instance, there is little work that has demonstrated in a direct manner that the initial stages of recognizing a word include the activation of argument structure. Until quite recently, most studies examining the presence of verb argument structure during word recognition have relied upon secondary measures of processing load (e.g., Shapiro, Zurif, and Grimshaw 1987, 1989), and have found conflicting results (Schmauder 1991; Schmauder, Kennison, and Clifton 1991; Shapiro et al. 1987, 1989). In addition, these results have been inconclusive about whether the activation of argument structure, if it occurs during word recognition, is frequency based, showing signs of subordinate and dominant structures. Finally, others have suggested that rapid lexical effects on syntactic ambiguity, like those described above, may in fact be consistent with a structurally based system that permits extremely rapid revision of an initial, lexically blind stage of processing (Frazier 1995; Mitchell et al. 1995). In the remainder of this chapter, I will present experimental evidence that addresses these issues. Two different groups of results will be presented, both of which explore the relationship between lexical and syntactic ambiguity. 
In the first section, I will describe experiments that reveal how effects of lexically specific argument preferences pervade syntactic ambiguity resolution and interact with semantic constraints. In the second section, I will turn my attention to effects of word recognition on syntactic ambiguity resolution. I will present results that use a new lexical priming technique to examine whether the argument preferences of briefly displayed prime words (displayed for less than 40 msec) can have an impact on a reader’s syntactic decisions about temporarily ambiguous sentences.

Lexical Frequency and Semantic Constraints

According to the lexicalist theory, the initial availability of a word’s syntactic alternatives depends upon how often the reader or listener has encountered the word in each syntactic context. In addition, semantic/
contextual information can come into play quite rapidly to help resolve possible ambiguities. The theory also predicts that these two sets of constraints interact in particular ways. For instance, processing difficulty should arise when these constraints are in conflict, as when semantic information supports a subordinate (less common) structure. Such an effect has already been observed for words with multiple senses (the “subordinate bias” effect; Rayner and Frazier 1989; Rayner, Pacht, and Duffy 1994; Sereno, Pacht, and Rayner 1992). In these studies, the left context of an ambiguous word supported the intended meaning of the word (as determined by the upcoming right context). Local increases in reading time occurred only when the context supported a subordinate meaning of a word. No increases were found when the context supported the dominant meaning of a word, or when the context supported one meaning of a “balanced” word that has two equally frequent meanings (Rayner and Frazier 1989; Rayner et al. 1994; Sereno, Pacht, and Rayner 1992). Similar effects of context interacting with lexical preference are expected for syntactic ambiguities. Consider again the semantic effects for the ambiguous reduced relative clause (“The defendant/evidence examined by the lawyer. . . ,” Trueswell et al. 1994), in which processing difficulty was eliminated when the noun was a poor agent (“evidence”). One might conclude from this finding alone that the presence of strongly biasing semantic information is sufficient for establishing an initial preference for the relative clause. However, the lexicalist account would expect that the effectiveness of a semantic constraint depends upon the availability of the appropriate structural alternative. It is well known that the reduced relative hinges upon an ambiguity involving the tense of the verb (“examined”). 
The “-ed” marker for most English verbs can indicate a past-tense verb in an active structure, such as the main clause, or a passive participle verb in a passive structure, such as the relative clause. (Compare with unambiguous verbs like “showed/shown.”) Reading an ambiguous verb would provide partial activation for both the past-tense and participle forms of the verb. These alternatives would also activate corresponding argument structures (in this case, the main clause and relative clause) that are consistent with the syntactic context of a noun phrase followed by a verb. Thus there are two different types of frequency information predicted to play a role in this ambiguity. One is the overall frequency of the relative clause and main clause structures. This would result in an overwhelming preference for the main clause because a noun phrase followed by a verb + “-ed” is almost always a main clause structure (Bever 1970 captured this in the NVN strategy). However, if structural information hinges upon the lexical properties of verbs, this overwhelming structural frequency asymmetry should be moderated for verbs with high participle frequency. As participle frequency increases, there is likely to be an increase in the availability of the otherwise subordinate relative clause alternative. For example, the Francis and Kucera (1982) frequency counts reveal that “searched” is hardly ever used in a participle form, whereas “accused” is frequently used in a participle form. So one might expect to find that semantic support for the relative clause would be more effective at eliminating difficulty when the relative clause contains a verb like “accused” than when it contains a verb like “searched.” To test these predictions, I reexamined the reduced relative eye-tracking data reported in Trueswell, Tanenhaus, and Garnsey (1994; see Trueswell 1996) for effects of participle frequency. Indeed, on average, verbs used in the study had relatively high participle frequencies, perhaps explaining why semantic support for the relative clause (e.g., “The evidence examined . . .”) was in general so effective at eliminating processing difficulty (see also MacDonald et al. 1994). In addition, I found evidence that some of the variation in processing difficulty between items in this condition was predicted by variation in participle frequency. Regression analyses revealed that the initial processing difficulty for reduced relatives (as measured by first-pass reading times) negatively correlated with each verb’s participle frequency (r² = 0.41, p < 0.05). In other words, contexts supporting the relative clause were much more effective at eliminating processing difficulty when the ambiguous verb was high in participle frequency. I have recently confirmed these findings in a series of reading studies that directly compared verbs with high and low participle frequency (Trueswell 1996). These studies held semantic support for the relative clause constant, while manipulating participle frequency.
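The item-level regression reported above can be sketched as follows. The data points are invented for illustration (the actual analysis used the items and first-pass reading times of Trueswell, Tanenhaus, and Garnsey 1994); the sketch shows only the form of the test, a negative correlation between participle frequency and reduced-relative difficulty:

```python
# Sketch of the regression logic: does participle frequency predict
# reduced-relative difficulty across items? All values are invented.

def pearson_r(xs, ys):
    """Pearson correlation, computed from first principles."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical items: (participle frequency per million, first-pass
# difficulty in msec for the ambiguous region of the reduced relative).
items = [(2, 95), (5, 80), (10, 62), (20, 41), (35, 30), (50, 12)]
frequencies, difficulties = zip(*items)

r = pearson_r(frequencies, difficulties)
# A negative r (and hence the squared correlation r**2) indicates that
# higher participle frequency goes with smaller garden-path difficulty.
```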
As expected, reduced relative clauses were more difficult to read when the verb was low in participle frequency than when the verb was high in participle frequency (a “subordinate bias” effect; see figure 19.1). Although the relative clause data are consistent with the lexicalist predictions for ambiguity resolution, one could argue that the findings only provide indirect evidence in support of this view. Specifically, one would expect that the frequency of a verb’s argument structures, not necessarily tense, determines the availability of syntactic forms. (Tense only indirectly estimates argument structure frequencies—see Trueswell 1996, for further discussion.) To address this issue, I examined how argument frequency affects the resolution of an ambiguity that does not depend upon tense (Trueswell, Kim, and Shapiro 1997). These experiments took advantage of Penn’s syntactically analyzed corpora of English Text (the Penn Treebank, Marcus, Santorini, and Marcinkiewicz 1993) to estimate a verb’s probability of appearing with particular
arguments. These probabilities were then used to predict processing preferences in readers and listeners.

Figure 19.1. Ambiguity effect for the reduced relative (Trueswell 1996; copyright by Academic Press).

The experiments examined a structural ambiguity that arises when an alternating dative verb is placed in a passive frame (e.g., “The woman was sent . . .”). The verb “sent” can allow a second noun-phrase argument, as in “The woman was sent some flowers,” in which case the woman is the recipient of the event. “Sent” can also allow a prepositional argument, as in “The woman was sent to the bank,” in which case the woman is the theme of the event. The ambiguity arises because “sent” is among a class of verbs called alternating datives, which have two competing syntactic structures for denoting the theme and recipient roles. The verbs can be used in the double object construction (as in the active sentence “Bill sent Susan the money,” or the passive sentence “Susan was sent the money”), in which there are two noun phrases as syntactic arguments of the verb. The verbs can also be used in prepositional dative constructions (e.g., “Bill sent the money to Susan,” “The money was sent to Susan”). Given this observation, one might expect that knowing how often “sent” takes a second noun-phrase argument or a prepositional argument could be very useful in determining the preferred interpretation of “The woman” when the verb is initially encountered in sentences like “The woman was sent . . . ”. In one experiment (Trueswell, Kim, and Shapiro 1997), a cross-modal integration technique was used to examine parsing commitments for the alternating dative. Participants heard auditory fragments that contained a noun that was a good recipient and poor theme (“The boy was mailed . . .”) or a good theme and poor recipient (“The letter was mailed . . .”). Good recipients semantically support the double object construction, whereas good themes support the prepositional dative. Immediately after hearing the fragment, the participants were visually presented with the word “the” or “to” to name aloud. The target word “the” is highly consistent with the double object construction, whereas the word “to” is highly consistent with a prepositional phrase argument. Prior research using this technique has demonstrated that naming latencies are longer to target words that are ungrammatical, or grammatically unexpected, continuations of the context (Cowart 1987; Tyler and Marslen-Wilson 1977; Trueswell et al. 1993; West and Stanovich 1986).

Table 19.1. Mean Naming Latency to Target Word in Milliseconds

Context              Auditory Fragment               “the”    “to”
Recipient-biasing    “The boy was mailed . . .”       586      604
Theme-biasing        “The card was mailed . . .”      625      556

Naming latencies (shown in table 19.1) were consistent with the rapid use of semantic information, mediated by the initial availability of the argument structures. A reliable interaction was found between type of thematic fit (recipient, theme) and type of target (“to,” “the”). When the noun was a good recipient of the verb, a double object construction should be expected, and indeed, naming latencies in this condition were longer for “to” as compared to “the.” When the noun was a good theme of the verb, a double object construction should not be expected, and naming latencies in this condition were longer for “the” as compared with “to.” Crucially, we expected these effects to depend upon the frequency of the verb argument structures. Again, keeping track of how often a verb appears in the double object construction could be quite useful in determining the appropriate thematic assignment of the initial noun phrase.
A corpus analysis was therefore conducted to determine the frequency with which each verb appeared in the double object construction. The analysis revealed that double object frequency is in fact relatively low for verbs used in this study. Indeed, as seen in table 19.1, semantic support for the recipient role (recipient-biasing nouns) is not completely effective at reversing preferences for “to” over “the.” This is because the semantic constraint in this condition supports the subordinate syntactic
alternative (a subordinate bias effect). It was expected that the effectiveness of the semantic support for the double object (the recipient-biased context) would vary continuously across verbs, with the most effective items being associated with verbs that have relatively high double object frequency. This was confirmed in a regression analysis, which paired naming latencies in this condition with each verb’s double object frequency. As expected, a reliable negative correlation was found between frequency and naming latencies (r² = 0.22, p < 0.05). A second experiment (also in Trueswell, Kim, and Shapiro 1997) found that similar patterns hold for ambiguity resolution during reading. Eye movements were monitored as subjects read sentences like “The woman was mailed the letter . . . ”. The first noun was always a good recipient and poor theme. In this study, two classes of verbs were directly compared: verbs that are high in double object frequency and verbs that are low in double object frequency. As expected, processing difficulty was found immediately after encountering a verb with low double object frequency, despite the presence of semantic information in support of this alternative. These results complement recent findings examining the comprehension of long-distance dependencies (e.g., “Which man/baseball did Bill toss . . .”), which find similar syntactic and semantic preference effects for alternating dative verbs (Boland 1997; Boland, Tanenhaus, Garnsey, and Carlson 1995). Taken together, the results suggest that both thematic and syntactic information associated with a verb is accessed and used quite rapidly during interpretation. Indeed, it seems likely that retrieval of this information during word recognition is needed to account for data indicating the early commitment to long-distance dependencies when the verb is first encountered (Boland et al. 1995; Boland 1997).
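The corpus analyses used throughout this work can be sketched schematically. The hand-labeled “corpus” below is a hypothetical miniature (the actual probabilities were estimated from the parsed files of the Penn Treebank); the sketch shows only how relative-frequency estimates of a verb’s argument frames are computed:

```python
# Toy version of the corpus analysis: estimate P(frame | verb) by
# relative frequency over labeled observations. The observations are
# invented; the real analysis used the parsed Penn Treebank files.
from collections import Counter, defaultdict

corpus = [
    ("sent", "double_object"), ("sent", "prep_dative"), ("sent", "prep_dative"),
    ("mailed", "double_object"), ("mailed", "prep_dative"), ("mailed", "prep_dative"),
    ("tossed", "prep_dative"), ("tossed", "prep_dative"),
]

def frame_probabilities(observations):
    """Map each verb to the relative frequency of its argument frames."""
    counts = defaultdict(Counter)
    for verb, frame in observations:
        counts[verb][frame] += 1
    return {
        verb: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for verb, frames in counts.items()
    }

probs = frame_probabilities(corpus)
# e.g., on this toy corpus, "sent" appears in the double object
# construction one time in three.
```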
Thus it appears that for at least three ambiguities, the DO/S ambiguity, the relative clause ambiguity, and the alternating dative ambiguity, clear signs of verb argument preference emerge. The availability of the syntactic properties of lexical items predicts processing difficulty and the initial effectiveness of semantic constraints. As with other lexical ambiguities, semantic support for an alternative is less effective when this information supports a subordinate alternative.

Fast Lexical Priming of Argument Structure

This section turns to research that provides perhaps the most compelling evidence to date that word recognition itself includes the parallel activation of possible argument structures, and that it is this information that determines initial availability of syntactic alternatives during syntactic ambiguity resolution. These studies take advantage of
a new lexical priming technique, fast lexical priming, first introduced by Sereno, Rayner, and colleagues (Rayner et al. 1995; Sereno and Rayner 1992; Sereno 1995). The technique permits the examination of lexical priming during uninterrupted silent reading. In the eye-tracking version of this technique, fixation patterns are recorded as participants silently read text. When the eye lands on a critical target word, a prime word (of equal number of characters) appears in place of the target. The prime is displayed for a brief amount of time (the first 30–40 msec of the initial fixation), and is immediately replaced by the target word. This sequence appears as a “flicker” to the subject, with subjects rarely being able to identify a prime word. Analyses of fixation times have revealed reliable effects of the prime word’s orthographic, phonological, and semantic properties (Rayner et al. 1995; Sereno and Rayner 1992; Sereno 1995). For instance, fixations on a target word are faster when the target is preceded by a semantically related prime, as compared to a semantically unrelated prime (Sereno 1995). Similar patterns have been observed for orthographically and phonologically related prime words (see, e.g., Rayner et al. 1995). Taken together, these data are highly consistent with theories of word recognition that allow for the parallel activation of the orthographic, phonological, and semantic information associated with a letter string. A central prediction of lexicalist approaches to parsing is that word recognition also includes the parallel activation of rich grammatical information, in the form of possible syntactic complements for a word. If this is the case, the syntactic preferences associated with a briefly presented prime word ought to have a direct impact on a reader’s parsing preferences of a syntactically ambiguous phrase. 
To test these predictions, we have examined fast lexical priming effects for the direct object complement / sentence complement (DO/SC) ambiguity, as illustrated in the following example (Trueswell and Kim 1998).

(3) The man accepted (that) the fire could not be put out.
            obtained  (DO-prime)
            realized  (SC-prime)

Target sentences contained a main verb (e.g., “accepted”) followed by a sentence complement. Unambiguous sentence complements always began with the optional complementizer “that.” Ambiguous sentence complements did not contain the complementizer “that,” making the noun phrase “the fire” a potential direct object of the verb. The main verb (e.g., “accepted”) was always a verb that permits a sentence complement, but strongly prefers to appear with a direct object as its argument (i.e., DO-biased verbs, as confirmed by sentence production norms). The noun phrase (e.g., “the fire”) was always a poor object of
the verb. Several reading studies (not involving fast-priming) have examined the reading of materials like these (e.g., Holmes, Stowe, and Cupples 1989; Garnsey, Pearlmutter, Myers, and Lotocky 1997; Trueswell et al. 1993). All of these studies have found large garden-path effects for DO-biased verbs when reading the ambiguous forms of these materials—consistent with the notion that readers initially pursued a direct object analysis of the noun phrase. For instance, Garnsey et al. (1997) found that when the optional complementizer “that” was absent, readers were surprised by the poor object “fire,” resulting in long reading times. Long reading times were also observed in the verb-phrase region (“could not be . . .”), suggesting that readers had difficulty retrieving the subordinate sentence complement argument structure. In the present study, a self-paced reading version of fast-priming was used. Prior to reading each sentence, the participant saw groups of equal signs (“=”) covering each character in the sentence. Each press of a button uncovered a word and replaced the previous word with equal signs. When participants reached the target verb, a prime word was displayed in the verb position, for exactly three screen cycles (39 msec). The prime word was then replaced by the target word, which remained on the screen until the next button press. This event was typically perceived as a flicker on the screen, with participants rarely identifying the prime word. Two different types of prime words were compared. DO-primes (e.g., “obtained”) were verbs that strongly prefer a direct object and do not permit a sentence complement. SC-primes were verbs that strongly prefer a sentence complement and rarely use a direct object. (Primes were matched for string length, overall frequency, and letter overlap with the target verb.) 
If the initial stages of word recognition include the activation of verb argument structures, one might expect that the subcategorization preferences of the “flicker” (the prime verb) would have a direct impact on the size of the garden path observed for these sentences. In particular, prime verbs that prefer direct objects (DO-primes) should induce a large garden-path effect, whereas prime verbs that prefer sentence complements (SC-primes) should reduce or eliminate the garden-path effect. Data from twenty-eight subjects were collected, and the magnitude of the garden-path effect is shown in figure 19.2.² The differences between the ambiguous and unambiguous sentences are plotted, with positive numbers indicating increased reading times for ambiguous items. As can be seen in the figure, lexicalist parsing predictions were confirmed. The magnitude of the garden-path effect was much greater for DO-primes than SC-primes, resulting in a reliable interaction between ambiguity and prime type at the noun “fire,” and a marginal interaction at
Figure 19.2. Ambiguity effect for the sentence complement (Trueswell and Kim 1998; copyright by Academic Press).
the disambiguating verb “could.” (Because differences are graphed, it is important to note that the effect of prime type is carried by ambiguous rather than unambiguous items, with a reliable effect of prime type occurring only for ambiguous items.) Thus there were robust effects of lexical priming on syntactic ambiguity resolution. DO-primes showed much larger garden-path effects than SC-primes. What makes this finding even more striking is that the experiment compares reading times for perceptually identical sentences across conditions. The only difference is whether a DO-prime or SC-prime was flashed on the screen. Thus it is the subcategorization preferences of the “flicker” that are determining readers’ parsing preferences. To analyze in more detail the contribution of prime and target subcategorization preferences, corpus analyses were also conducted on all prime and target verbs, from the parsed text files of the Penn Treebank. We estimated the probability that each verb uses either a direct object, sentence complement, or some other argument structure. As can be seen in table 19.2, the probabilities confirm the various classifications of verbs. Space precludes a full discussion of the corpus data. However, note that evidence was found that the subcategorization preferences of both the prime and target verbs combined to predict the variation in garden-path effects between items. For instance, a simple averaging of a prime’s DO probability and a target’s DO probability reliably predicted garden-path effects at the disambiguating word “could,” with garden-pathing being largest for targets and primes that more strongly prefer direct objects (r = 0.44, p < 0.05).
Table 19.2. Probability of the Direct Object (DO) and Sentence Complement (S) Structures

Type of Verb               DO-Comp    S-Comp    Other
DO-biased Target Verbs       0.55      0.23      0.22
SC-Prime                     0.12      0.41      0.47
DO-Prime                     0.84      0.00      0.16
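The averaging predictor described above can be made concrete with a short sketch. The direct object probabilities are the class means from table 19.2; treating them as the values for a single prime-target pair is an illustrative simplification (the actual regression used per-verb probabilities for each item):

```python
# Sketch of the item predictor: the mean of the prime verb's and the
# target verb's direct object (DO) probability. Class means are taken
# from table 19.2; the pairing here is illustrative.
DO_PROBABILITY = {
    "do_biased_target": 0.55,
    "sc_prime": 0.12,
    "do_prime": 0.84,
}

def garden_path_predictor(prime_do_prob, target_do_prob):
    """Averaged DO probability; higher values predict larger garden paths."""
    return (prime_do_prob + target_do_prob) / 2

do_pred = garden_path_predictor(DO_PROBABILITY["do_prime"],
                                DO_PROBABILITY["do_biased_target"])
sc_pred = garden_path_predictor(DO_PROBABILITY["sc_prime"],
                                DO_PROBABILITY["do_biased_target"])
# DO-primes yield a higher predicted garden-path effect (0.695) than
# SC-primes (0.335), in line with the observed pattern.
```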
Finally, note that given this experimental design, we do not know whether DO-primes increased the garden-path effect, SC-primes decreased it, or both contributed to the pattern. A second self-paced study was completed to answer this question. The same design was used; however, nonword primes (random letter strings) were also included in the design. In addition, we doubled the number of experimental items. The overall pattern was replicated: DO-primes showed large garden-path effects, and SC-primes showed little or no garden-path effects. Critically, nonword primes showed garden-path effects that were in between these two classes of verbs, suggesting that DO-primes increased the garden-pathing and SC-primes decreased the garden-pathing.

Summary

A clear picture is emerging about the syntactic aspects of word recognition and their impact on incremental interpretation. The earliest stages of word recognition include the parallel activation of possible argument structures, and it is this information that determines the initial availability of syntactic alternatives during ambiguity resolution. Perhaps more importantly, the data begin to explain how linguistic information is organized to provide rapid integration of different classes of information. Lexical information is arranged along partially independent stimulus dimensions (phonological, orthographic, semantic, and syntactic), which are relevant for the various ways that we use language. Each type of representation adheres to the same general processing principles; that is, information is made available in a probabilistic fashion and can be constrained by correlated information from other dimensions. The rapid effects of semantic constraints on syntactic ambiguity resolution are consistent with interactive processing mechanisms. However, the results regarding the availability and priming of argument structure suggest partially independent representation of syntax and semantics.
This theoretical description highlights a distinction between “modular representation” and “modular processing” (Garnham 1985;
Trueswell et al. 1994; Trueswell and Tanenhaus 1994). A modular encoding scheme can emerge when a system is faced with complex stimuli that contain partially independent regularities (i.e., information that can sometimes vary independently). For instance, the visual system has, to a first approximation, adopted this approach. Color and motion information can vary independently (red things can move up and down, for instance). It is therefore not surprising that this information is encoded along partially independent stimulus dimensions (color, motion, etc.). Within language comprehension, lexical items are also associated with distinct classes of information, which can vary independently and are only partially correlated. For instance, although it is clear that structures imply certain meanings, differences in meaning arise when these structures appear in particular contexts and with particular lexical items. It is therefore not surprising to find that the system has organized information along several dimensions. However, modular representation does not require modular processing. A system needs to develop consistent solutions across stimulus dimensions, and one efficient approach is to be highly sensitive to the correlations that exist and allow them to constrain ambiguous representations. Finally, it is interesting that these findings, which emphasize a close relationship between the grammar and the lexicon, tie in nicely with recent developments in computational linguistics, and in particular work here at Penn. As is the case in psycholinguistics, computational linguistics has seen an increased interest in lexicalized syntactic accounts (e.g., Bresnan and Kaplan 1982; Joshi, Vijay-Shanker, and Weir 1991; Pollard and Sag 1994; Steedman 1996) and a reemergence of statistical approaches to parsing (see Church and Mercer 1993; Marcus 1995). 
Lexicalized grammar formalisms include combinatory categorial grammars (CCGs), head-driven phrase-structure grammars (HPSGs), and lexicalized tree-adjoining grammars (LTAGs). It will be important in the upcoming years to bridge the (relatively small) gap between these linguistic formalisms and current psycholinguistic theories of sentence processing.

Closing Remarks

It is not surprising that this body of research has required the coordination of several different disciplines: psychology, linguistics, and maybe even a little bit of computer science. Interdisciplinary research has become the norm in the study of language—an everyday thing. Indeed, Lila and Henry Gleitman have been at the forefront of developing this interdisciplinary approach. But over the last two decades, we have seen essentially all the subdisciplines of psychology move in this direction. As Henry is fond of saying (quite dramatically, of course), “These are changing times.” Psychology is becoming more interdisciplinary, more “biological,” more “computational.” Many people are concerned about whether the broadening of psychology and the blurring of its boundaries will have a helpful or detrimental effect on the field. However, attending this tribute to both Henry and Lila, and listening to their students, has taught me a valuable lesson about this. We can worry a little less about these changes if we know that there are people involved in this process who care deeply about their students, their colleagues, the exchange of a good idea, and the exchange of a good joke, for that matter—people who think and care. Henry and Lila have done the field a great service by promoting these ideals in their students. The changing field called psychology is in better hands because they have made a difference in so many lives. Henry once told me that he didn’t feel that he had really earned his Ph.D. until a few years after he had received it. I know exactly what he means. I have had the opportunity of a lifetime by beginning my career at Penn. And, by watching Henry and Lila at work with their students, I have learned a great deal about what it means to be a teacher and a researcher. I thank them both for their hospitality, advice, and encouragement.

Acknowledgments

This work was partially supported by National Science Foundation Grant SBR-96-16833, the University of Pennsylvania Research Foundation, and the Institute for Research in Cognitive Science at the University of Pennsylvania (NSF-STC Cooperative Agreement number SBR-89-20230). I am grateful to Michael Kelly and Albert Kim for helpful comments on earlier drafts of this paper.

Notes

1. Example sentences tend to use the name “John”—a practice that I have grown tired of. I have therefore developed a program that randomly selects two names from a list, with the only constraint being that the names appear in alphabetical order in the sentence. Similarities to actual people and situations are purely accidental.

2. Four subjects were excluded from this analysis because postexperiment interviews revealed that they could identify the majority of the prime words. Interestingly, these subjects showed inhibitory effects of the prime’s argument structure (see Trueswell and Kim 1998).
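On one reading of note 1, the name-selection program draws two names at random and orders them alphabetically before inserting them into an example sentence. A minimal sketch of such a selector is below; the name pool and the sentence frame are hypothetical stand-ins, not taken from the original program:

```python
import random

# Hypothetical name pool; the original program's list is not given in the text.
NAME_POOL = ["Alice", "Bert", "Carla", "David", "Erin", "Frank"]

def pick_names(pool=NAME_POOL, rng=random):
    """Draw two distinct names and return them in alphabetical order."""
    first, second = rng.sample(pool, 2)  # sampling without replacement
    return tuple(sorted((first, second)))

def example_sentence(pool=NAME_POOL, rng=random):
    """Slot the alphabetized pair into an (invented) example-sentence frame."""
    a, b = pick_names(pool, rng)
    return f"{a} told {b} that the exam was postponed."
```

The alphabetical-order constraint means the earlier-sorting name always fills the first slot, which is what guarantees the property the note describes.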
References

Bates, E., Harris, C., Marchman, V., Wulfeck, B., and Kritchevsky, M. (1995) Production of complex syntax in normal aging and Alzheimer’s disease. Language and Cognitive Processes 10:487–544.
Bever, T. G. (1970) The cognitive basis for linguistic structures. In Cognition and the Development of Language, ed. J. R. Hayes. New York: John Wiley and Sons.
Boland, J. E. (1997) The relationship between syntactic and semantic processes in sentence processing. Language and Cognitive Processes 12:423–484.
Boland, J. E., Tanenhaus, M. K., Garnsey, S. M., and Carlson, G. (1995) Verb argument structure in parsing and interpretation: Evidence from wh-questions. Journal of Memory and Language 34:774–806.
Bresnan, J. and Kaplan, R. (1982) Lexical functional grammar: A formal system of grammatical representation. In The Mental Representation of Grammatical Relations, ed. J. Bresnan. Cambridge, MA: MIT Press.
Carlson, G. N. and Tanenhaus, M. K. (1988) Thematic roles and language comprehension. In Syntax and Semantics, volume 21, ed. W. Wilkins. New York: Academic Press.
Church, K. and Mercer, R. (1993) Introduction to the special issue on computational linguistics using large corpora. Computational Linguistics 19:1–24.
Cowart, W. (1987) Evidence for an anaphoric mechanism within syntactic processing: Some reference relations defy semantic and pragmatic considerations. Memory and Cognition 15:318–331.
Ferreira, F. and Clifton, C. (1986) The independence of syntactic processing. Journal of Memory and Language 25:348–368.
Francis, W. N. and Kucera, H. (1982) Frequency Analysis of English Usage: Lexicon and Grammar. Boston: Houghton-Mifflin.
Frazier, L. (1989) Against lexical projection of syntax. In Lexical Representation and Process, ed. W. Marslen-Wilson. Cambridge, MA: MIT Press, 505–528.
Frazier, L. (1995) Constraint satisfaction as a theory of sentence processing. Journal of Psycholinguistic Research 24:437–468.
Frazier, L. and Fodor, J. D. (1978) The sausage machine: A new two-stage parsing model. Cognition 6:291–325.
Frazier, L. and Rayner, K. (1982) Making and correcting errors during sentence comprehension: Eye movements in the analysis of structurally ambiguous sentences. Cognitive Psychology 14:178–210.
Garnham, A. (1985) Psycholinguistics: Central Topics. New York: Methuen.
Garnsey, S. M., Pearlmutter, N. J., Myers, E., and Lotocky, M. A. (1997) The contributions of verb bias and plausibility to the comprehension of temporarily ambiguous sentences. Journal of Memory and Language 37:58–93.
Hodges, J. R., Patterson, K., and Tyler, L. K. (1994) Loss of semantic memory: Implications for the modularity of mind. Cognitive Neuropsychology 11:505–542.
Holmes, V. M., Stowe, L., and Cupples, L. (1989) Lexical expectations in parsing complement-verb sentences. Journal of Memory and Language 28:668–689.
Joshi, A., Vijay-Shanker, K., and Weir, D. (1991) The convergence of mildly context-sensitive formalisms. In The Processing of Linguistic Structure, ed. P. Sells, S. Shieber, and T. Wasow. Cambridge, MA: MIT Press, 31–91.
Juliano, C. and Tanenhaus, M. K. (1994) A constraint-based account of the subject/object ambiguity. Journal of Psycholinguistic Research 23:459–471.
Levy, Y. (1996) Modularity of language reconsidered. Brain and Language 55:240–263.
MacDonald, M. C., Pearlmutter, N. J., and Seidenberg, M. S. (1994) The lexical nature of syntactic ambiguity resolution. Psychological Review 101:676–703.
Marcus, M. P. (1995) New trends in natural language processing: Statistical natural language processing. Proceedings of the National Academy of Sciences 92:10052–10059.
Marcus, M. P., Santorini, B., and Marcinkiewicz, M. A. (1993) Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19:313–330.
Marslen-Wilson, W. D. and Tyler, L. K. (1987) Against modularity. In Modularity in Knowledge Representation and Natural-Language Understanding, ed. J. Garfield. Cambridge, MA: MIT Press.
McClelland, J. L. (1987) The case for interactionism in language processing. In Attention and Performance XII, ed. M. Coltheart. Hillsdale, NJ: Lawrence Erlbaum Associates.
Mitchell, D. C., Cuetos, F., Corley, M., and Brysbaert, M. (1995) Exposure-based models of human parsing. Journal of Psycholinguistic Research 24:469–488.
Pearlmutter, N. J. and MacDonald, M. C. (1995) Individual differences and probabilistic constraints in syntactic ambiguity resolution. Journal of Memory and Language 34:521–542.
Pollard, C. and Sag, I. (1994) Head-Driven Phrase Structure Grammar. Chicago, IL: University of Chicago Press.
Rayner, K., Carlson, M., and Frazier, L. (1983) The interaction of syntax and semantics during sentence processing. Journal of Verbal Learning and Verbal Behavior 22:358–374.
Rayner, K. and Frazier, L. (1989) Selectional mechanisms in reading lexically ambiguous words. Journal of Experimental Psychology: Learning, Memory and Cognition 15:779–790.
Rayner, K., Pacht, J. M., and Duffy, S. A. (1994) Effects of prior encounter and discourse bias on the processing of lexically ambiguous words: Evidence from eye fixations. Journal of Memory and Language 33:527–544.
Rayner, K., Sereno, S., Lesch, M., and Pollatsek, A. (1995) Phonological codes are automatically activated during reading: Evidence from an eye movement priming paradigm. Psychological Science 6(1):26–32.
Schmauder, A. R. (1991) Argument structure frames: A lexical complexity metric? Journal of Experimental Psychology: Learning, Memory and Cognition 17:49–65.
Schmauder, A. R., Kennison, S., and Clifton, C., Jr. (1991) On the conditions necessary for observing argument structure complexity effects. Journal of Experimental Psychology: Learning, Memory, and Cognition 17:1188–1192.
Schwartz, M. F., Marin, O. S. M., and Saffran, E. M. (1979) Dissociations of language function in dementia: A case study. Brain and Language 7:277–306.
Sereno, S. C. (1995) Resolution of lexical ambiguity: Evidence from an eye movement priming paradigm. Journal of Experimental Psychology: Learning, Memory and Cognition 21(3):582–595.
Sereno, S. C., Pacht, J. M., and Rayner, K. (1992) The effect of meaning frequency on processing lexically ambiguous words: Evidence from eye fixations. Psychological Science 3(5):296–299.
Sereno, S. C. and Rayner, K. (1992) Fast priming during eye fixations in reading. Journal of Experimental Psychology: Human Perception and Performance 18:173–184.
Shapiro, L. P., Zurif, E., and Grimshaw, J. (1987) Sentence processing and the mental representation of verbs. Cognition 27:219–246.
Shapiro, L. P., Zurif, E., and Grimshaw, J. (1989) Verb representation and sentence processing: Contextual impenetrability. Journal of Psycholinguistic Research 18:223–243.
Steedman, M. (1996) Surface Structure and Interpretation. Cambridge, MA: MIT Press.
Taraban, R. and McClelland, J. (1988) Constituent attachment and thematic role assignment in sentence processing: Influences of content-based expectations. Journal of Memory and Language 27:1–36.
Trueswell, J. C. (1996) The role of lexical frequency in syntactic ambiguity resolution. Journal of Memory and Language 35:566–585.
Trueswell, J. C., Kim, A., and Shapiro, S. (1997) Using verb information during listening and reading: Semantic fit and argument frequency effects for the alternating dative. In preparation.
Trueswell, J. C. and Kim, A. (1998) How to prune a garden-path by nipping it in the bud: Fast priming of verb argument structure. Journal of Memory and Language 39:102–123.
Trueswell, J. C. and Tanenhaus, M. K. (1994) Toward a lexicalist framework for constraint-based syntactic ambiguity resolution. In Perspectives on Sentence Processing, ed. C. Clifton, K. Rayner, and L. Frazier. Hillsdale, NJ: Erlbaum.
Trueswell, J. C., Tanenhaus, M. K., and Garnsey, S. M. (1994) Semantic influences on parsing: Use of thematic role information in syntactic ambiguity resolution. Journal of Memory and Language 33:285–318.
Trueswell, J. C., Tanenhaus, M. K., and Kello, C. (1993) Verb-specific constraints in sentence processing: Separating effects of lexical preference from garden-paths. Journal of Experimental Psychology: Learning, Memory and Cognition 19:528–553.
Tyler, L. K. and Marslen-Wilson, W. D. (1977) The on-line effects of semantic context on syntactic processing. Journal of Verbal Learning and Verbal Behavior 16:683–692.
West, R. F. and Stanovich, K. E. (1986) Robust effects of syntactic structure on visual word processing. Memory and Cognition 14:104–112.
Contributors
Cynthia Fisher, University of Illinois
Susan Goldin-Meadow, University of Chicago
Kathy Hirsh-Pasek, Temple University
John Jonides, Department of Psychology, University of Michigan
Philip J. Kellman, University of California, Los Angeles
Michael Kelly, Department of Psychology, University of Pennsylvania
Donald S. Lamm, Norton Publishers
Barbara Landau, University of Delaware
Jack Nachmias, University of Pennsylvania
Letitia R. Naigles, University of Connecticut
Elissa L. Newport, Department of Brain and Cognitive Sciences, University of Rochester
W. Gerrod Parrott, Georgetown University
Daniel Reisberg, Reed College
Robert A. Rescorla, University of Pennsylvania
Paul Rozin, University of Pennsylvania
John Sabini, Department of Psychology, University of Pennsylvania
Elizabeth F. Shipley, University of Pennsylvania
Thomas F. Shipley, Department of Psychology, Temple University
John C. Trueswell, Department of Psychology, University of Pennsylvania
Index
Abrahamsen, Adele, 209 Adams, R. M., 316 Adaptive staircase procedure, 178 Adjectives, in verse, 313 Alphanumeric stimuli, 10 Alternating dative ambiguity, 334–335 American Sign Language (ASL) children’s acquisition of, 113 topicalized structures in, 116–117 Anderson, R. L., 235 Andrade, J., 141 Aphasia, 10 Appearance. See also Physical objects identity determined by, 79, 80 and object naming, 221 Argument linking, in syntax acquisition, 283 Argument structure fast lexical priming of, 336–340 of verbs, 333 and word recognition, 331 Armstrong, Sharon, 16 Articulation, concurrent and imagery, 148 and memory-span performance, 149 Asch, Solomon, 4 Aslin, Richard, 107 Auditory imagery limitations in, 146 and parallel phenomena, 154 and perception, 143, 144, 145 perceptual understanding in, 151 production of, 146–147 and subvocalized support, 141 and task-irrelevant noises, 141 Auditory images, creation of, 150 Awh, E., 89 Baddeley, Alan, 89, 141 Baker, E., 195
Battig, W. F., 317 Bauer, L., 319 Behavioral properties, in identification of physical objects, 77–80 Besag, J., 143, 155n.2 Blackmore, S. J., 295 Blind children language learning in, 14, 15, 210 objects named by, 214–216 spatial knowledge in, 15 visual verbs and, 16 Bloomfield, Leonard, 6 Boundaries in 3-D world, 170 interpolation, 158, 186 motion signals for, 301 Boundary assignment, 165 Boundary localization, and relatability, 178 Bound morpheme comprehension, study of, 197–206 Bound morphemes, 193 data analysis of, 205 and sentence processing, 202 Bower, Gordon, 239 Bowerman, M., 277 Bowes, Thomas, 233 Brain activation computer display of, 102 dose response curves for, 95–97 and memory load, 98 in neuroimaging experiments, 92–94 Brelstaff, G., 295 Brent, M. R., 283 Bright, Timothy, 233 Brown, R., 218 Bruner, J., 128 Burton, Richard, 234 Bush, Robert, 5, 27, 53 Bushnell, Emily, 7
Caregiver speech, 11. See also Motherese Carew, Richard, 233 Carey, S., 71, 72 Categories entrenched properties and, 73–76 membership in, 70 psychological essentialism of, 70–73 Category development in children, 73 establishing identities in, 291 Causal agent, in object naming, 220 Certainty distinction, and MSV acquisition, 250–251, 257–258, 266 Chalkley, M. A., 107, 194 Change motion-based model of, 301 perception of, 293–394 and perception of occlusion, 299 during saccades, 295 Charron, Pierre, 233 Cheese Seminar, 8, 191. See also Seminars Chiasson, L., 255 Children. See also Blind children; Deaf children acquisition of mental state verbs in, 245 bound morpheme comprehension in, 197–200, 202 categorization of objects by, 69 entrenched categories acquired by, 75–76 notions about insides of, 81–82 preschool experience of, 263–266 sensitivity to morphological cues of, 196 statistical learning in, 109–110 Children, hearing, gesture systems of, 123 Chinese culture development of gestural systems in, 129 mother-child interaction in, 128 Chomsky, Carol, 4 Chomsky, Noam, 4, 6, 275–276 Chronometric analysis, 87 Clark, E. V., 317 Clark, H. H., 108, 317 Clark, Kenneth, 1 Classification. See also Categories similarity as basis for, 215 and word learning, 216 Cognitive development and MSV acquisition, 254–255 similarities in, 215 Cognitive psychological approach, to mental verbs, 249–250 Cognitive science, 87
and creative language use, 312 foundation of, 213 Coloration, and adjectives, 219 Colwill, Ruth, 43–44 Combinatory categorial grammars (CCGs), 341 “Common Fate, Factor of,” 181–182 Complete objects, perception of, 182 Compound nouns, structure of, 6 Comprehension studies, 195, 205. See also Cognitive science Computational analysis, of visualizing objects, 163 Computational linguistics, 4, 341 Computer models, in study of cognition, 87 Computers advances in, 12 UNIVAC2, 3 Conjunctive phrases, contractions of, 319 Conjuncts, 320 Continuation principle, 163, 166 Continuity first-order, 164 Gestalt notions of, 165 Continuity errors, 295–296 Contours 3-D illusory, 170 and good continuation, 163 illusory vs. occluded, 160–161 occluded, 170 relatability, 174 Counting-out poems, 321–322, 324 Count noun context, for object naming, 219 Crabbe, G., 157 Crowder, R., 142 Cunningham, Douglas, 297 Darbishire, H., 316 Das, A., 186 Davidge, J., 255 Deaf children acquisition of language by, 14 Chinese, 129 gesture systems of, 123–124, 126, 133 language acquisition of, 113 language use of, 126 and parent-child interaction, 127, 128 Deception, emotional, 236 Decomposing, of complex cognitive processes, 87
Dell, Gary, 7 Deprivation paradigm in language acquisition, 245 pioneered by Gleitmans, 269 Depth relatability experiments, stimuli in, 170–171 Depth relationship, between objects, 160 Detachment gain, 155 Diachronic survey, 318 Diggle, P. J., 143, 155n.2 Discontinuity, first-order, 164 Discovery, image-based, 152 Distributional analysis, of language acquisition, 17–18 Distributional information, for language, 107 Donders, R. C., 91, 95–96 DO/S ambiguity, 336 Dot localization paradigm, 178–179 Down’s syndrome, language learning in, 14 Drama, H. Gleitman’s interest in, 231–232. See also Plays; Theater Dynamic occlusion displays, 299, 300, 301 paradigm, 183–187 Ecological analyses of perception, 181 Edge, occluding, 165–166 Edge classification, 165, 168 Edge continuation, and interpolation, 186 Edge-insensitive process, 181 Edge interactions, 187 Edge relatability and object perception, 181 and similarity, 179 spatiotemporal, 182–183 Edge-sensitive process, 182 Education, undergraduate, 54 Elyot, Sir Thomas, 233 Embarrassment, emotion of, 232 Emotion in Elizabethan psychology, 234–241 everyday meaning of, 240–241 folk psychologies of, 232 modern academic theories of, 241 modern psychology of, 240–241 Enacted images, 147 English language, lexical innovation in, 317. See also Language Entrenchment, concept of, 73 Equifinality, in language learning, 129–130
Ergative languages, 286, 287 Errors, patterns of, 87 Essences, psychological, 70–71 Essentialism, psychological, 70–73 Ethics, emotion and, 241–242 Events, in Elizabethan psychology, 237 Factivity, determination of, 248–249 Factivity dimension children’s understanding of, 253 in grade-schoolers, 252 and MSV acquisition, 254 in three-year-olds, 251 Factor of direction, 162 False beliefs, first-order vs. second-order, 269n.3 Familiar morpheme hypothesis, 204 Fast lexical priming, 337, 338 Feldman, Heidi, 7, 13, 14, 123, 210 Field, D., 168 Fisher, Cynthia, 220, 275–290 Form class, 193 grammatical morphemes in, 194 and spatial relationships, 213 Fowler, C. A., 313 Frank, R., 255 Free report procedure, 300 Function, 221 in object naming, 221–225 smooth, 162 Furrow, D., 253, 255 Gabor patches, 168 Galanter, Eugene, 3–4, 5 Gallistel, Randy, 15 Garden-path effects, 338 in language learning, 327–328 magnitude of, 339 Garnsey, S. M., 329 Geer, Sandra, 7 Gelman, R., 81–82 Gelman, S. A., 82, 195 Generalization, in object naming, 226 Gerken, L. A., 196, 197, 201 Gestalt principles good continuation, 188 good form, 174–177 law of proximity, 169 in object perception, 158 object segregation, 157 and unit formation, 161 Gestalt psychology
good continuation principle in, 161–165, 166 identity hypothesis and, 159–161 neural models of, 186–187 update of, 158 Gesture creation, and language development, 129 Gesture-speech mismatch, and readinessto-learn, 131 Gesture-speech system, integrated, 130–131 Gesture systems of deaf children, 123 effect on learning of, 132–134 environmental conditions for, 126–129 grammatical categories for, 125 and mother-child interaction, 128 sentence-level structure in, 124–125 word-level structure of, 125 Gibson, J. J., 293, 294, 296 Gilbert, C. D., 186 Gleitman, Claire, 6, 28 Gleitman, Ellen, 6 Gleitman, Henry academic style of, 29–30 educational philosophy of, 311–312 formal education of, 1 honors and awards of, 18–19 interdisciplinary approach of, 341–342 interest in drama of, 28, 35, 231 as mentor, 211 A Midsummer Night’s Dream directed by, 311 photos of, 35, 36, 37 published with students, 13, 14, 24, 41 at Swarthmore, 4–5, 23 on teaching, 49 teaching of, 191, 210 Gleitman, Lila R., 36, 38 cited, 192, 194, 195, 197, 202, 206, 209, 210, 218, 220, 247 collaboration with P. Rozin, 28 coteaching with H. Gleitman, 54–55 educational philosophy of, 312 honors and awards of, 19 interdisciplinary approach of, 341–342 as mentor, 211 Ph.D. work of, 6–7 Psychology reviewed by, 64 published work of, 11, 12, 13 teaching of, 191 at Univ. of Penn, 13
Gleitman, Phillip, 34 Gleitman family, 33 Gleitology, 57–65 Global instruction condition, 178 Global symmetry, 188 Goddard, David, 5 Goffman, Erving, 53, 232 Goldberg, A., 276 Goldfish, memory studies in, 9, 27 Goldin-Meadow, Susan, 7, 13, 14, 121–137, 210 Goldsmith, John, 7 Golinkoff, Roberta, 192, 197, 284 Good continuation and object perception, 188 principle, 161–163, 163–165, 166 and relatability, 166–167 updated notion of, 174 Good form principle of, 174–177 putative examples of, 175 Goodman, Nelson, 69, 73, 82 Grammar acquisition of, 288 discovery of, 192 in gesture systems, 125–126 learning, 204–205 lexicalized formalisms in, 341 and lexicon, 341 universal architectural principles of, 116–117 universal (UG), 276 Grammatical categories distributional view of, 194 extensions of, 317–318 “Great Verb Game,” 17 Grief, in Shakespeare’s plays, 237–238 Grouping basic problem in, 158–159 and notions of smoothness, 163 understanding, 187 Hall, G., 220 Hall, W. S., 254, 255, 264 Harris, Zellig, 3, 17, 112 Hayes, A., 168 Hayes, J. R., 108 Head-driven phrase-structure grammars (HPSGs), 341 Hearing children, gestures of, 133 Hess, R. F., 168 Hilgard, E. R., 39
Hirsh-Pasek, Katherine, 191–208, 284 Hiz, Henry, 6 Hoenigswald, Henry, 3, 6 Hoff-Ginsberg, E., 268 Holophrastic listeners, and grammatical morphemes, 195 Homophone judgment, 153 Housum, J., 313 Hull, 24 Humor, role of expectation and surprise in, 14 Hurvich, Leo, 5 Huttenlocher, J., 264
Informativeness, and poetic meter, 313 Insides, and determination of identity, 80–82 Instrumental learning, experiment in, 44–45 Instrument timbres, study of, 142–148 Intention, and object naming, 221 Intermodal preferential looking paradigm (IPLP), testing, 198–200 Interpolation, in perception, 182 Intransitives, subjects of, 286–287 Item recognition, study of, 9–10 Iverson, P., 143
Iambic pentameter, 312–313, 315 Identification. See also Object naming effects of insides upon, 80–82 entrenched properties and, 76–80 Identification judgments, of children, 82–83 Identity and perception, 292–294 psychological, 291 Identity hypothesis and global-local controversy, 176–177 in object completion, 159–161 Imagery. See also Auditory imagery auditory vs. visual, 152 binary definition of, 323 functioning of, 151 and memory, 322 Images interpretation of, 155n.4 nature of, 152 pathways for creation of, 142–148 Imagined sounds, multidimensional space for, 142 Inductive inferences entrenchment concept in, 73 and psychological essentialism, 72 role of entrenchment in, 74 Infants enactment tasks for, 195 function morphemes identified by, 196 language learning in, 6 morpheme sensitivity of, 201–202 statistical learning in, 109–110 understanding in, 195 Information, functional, and shape bias, 221–225 Information, verbal, working memory for, 89
Jackendoff, R., 281 James, William, 36, 37 Jameson, Dottie, 5 Johansson, G., 292 Johnson, E. C., 72 Jones, S. S., 71 Jonides, John, 7, 9, 13, 34, 87–103 Joshi, Aravind, 4 Jusczyk, Peter, 7 Kalish, C. W., 72 Kanizsa’s Demonstrations, 175 Kanizsa, G., 160 Kaplan, G., 294 Katz, N., 195 Katz, S., 89 Kauffman, Bruria, 4 Keil, F. C., 71, 81 Kellman, Philip J., 14, 157–190, 301 Kelly, Michael, 311–326 Kinetic screen effect, 294 Koeppe, R. A., 89 Koffka, K., 181 Köhler, Wolfgang, 4, 23, 61, 307 Krumhansl, C., 143 Lamm, Donald S., 5, 57–65 Landau, Barbara, 14, 71, 209–230, 288 Language ergative, 286, 287 and perceptual bias, 220–221 resilience of, 123–124 and space, 210–211, 227 Language acquisition, 275 in children with Down’s syndrome, 129 comprehension data in, 195, 205 conceptual structures in, 281 deprivation paradigm in, 245
Language acquisition (cont.) distributional analysis of, 17–18 grammatical morphemes’ role in, 192 and inconsistent linguistic input, 113–115 natural experiments in, 112 nature-nurture question in, 11–12 output grammar, 113 and partial information, 288 problem of, 105–106 role of syntax in, 218–221 transitional probabilities in, 108–110 understanding, 13 Language and Experience (Landau and Gleitman), 211, 214 Language comprehension incremental nature of, 327 lexical items in, 341 Language development language comprehension in, 195 sensitivity to morphological cues in, 196 Language learning, 122 equifinality in, 129–130 matching problem in, 291 object naming in, 225–227 research in, 6 resilient properties of, 124–126 of second language, 14 spatial experience in, 211 Learning delayed response, 41 gesture’s role in, 130, 132 segmentation in, 111 and teaching, 154 Leeuwenberg, E., 174 Lennard, Samson, 233 Levin, D. T., 296, 307 Lexicalized tree-adjoining grammars (LTAGs), 341 Lexical learning, 211–212 Lexical preference, and syntactic ambiguities, 331–336 Lexical priming technique, 337 Lexical stress, and English spelling, 312, 314, 315, 317. See also Stress Lichtenberg, Lila, 2–3. See also Gleitman, Lila R. Linguistic input building structure with, 116–118 inconsistent, 113–115 natural experiments of, 112–118 reshaping and restructuring of, 117–118 Linguistics
computational, 341 mental verbs in, 248–249 primitives and, 276 Liquid-crystal-diode (LCD) shutter glasses, 171 Literature, folk, 240 Location, and object naming, 220 Locke, John, 211 Lorenz, Konrad, 32 Luce, Duncan, 5, 6 Luminance masking, 295 MacMillan, Deborah, 7 MacNamara, J., 195 Malt, B. C., 72 Maratsos, M., 107, 194 Marin, Oscar, 10 Markman, E. M., 74 Material, and object naming, 220 Maternal speech mental verbs in, 247–248 and MSV acquisition, 255–256 May, Robert, 7 McDill, M., 301 McGill, Bill, 25 McIntosh, B. J., 196, 197, 201 Meanings for blind child, 212 of gestures, 125 of MSVs, 250–251 and presyntactic cues, 279–280 sentence structure and, 276–277 Memory goldfish, 8–9 investigations of, 5 visual, 296–297 Memory, working, 88–99 architecture of verbal, 89–99 defined, 88 differing subsystems of, 99–102 parametric studies of, 92, 95 spatial vs. verbal, 99–100, 102 studies in, 148 Memory control, in neuroimaging experiments, 89–90 Mental context, detachment from, 154 Mental image, understandings attached to, 150 Mental representations prototypes for, 16 shape bias in, 218 Mental state verb acquisition
experiments with, 258–263 and preschool experience, 263–266 and radical translation problem, 247–248 theory of, 254–256 Mental state verbs (MSVs) adult usage of, 264 characteristics of, 246–247 children’s acquisition of, 245–246 degree of certainty of, 250–251, 257–258, 266 developmental understanding of, 251–252 first uses of, 251 measuring comprehension of, 260, 261, 262 polysemy of, 248–251 and theories of mind (TOM), 252–254 Michotte, A., 157, 293, 296, 306 Miller, George A., 58, 59, 60 Milton, John, 315, 316 Mind. See Theory of mind Mintz, Toby, 107 Montague, W. E., 317 Moore, C., 253, 255, 257, 258 Morgan, J., 194 Morphemes, grammatical, 193. See also Bound morphemes acquisition of, 113 in form class assignments, 194 in language acquisition, 192–193 sensitivity to, 195 in syntactic development, 194 Morphology bound, 204–205 in language acquisition, 114 probability used, 118 Motherese, 11–12, 13, 105, 122, 196, 246 Mother’s speech MSVs in, 269n.4 studies of, 105 Motion, perceiving structure from, 292 Motion signals boundaries defined by, 302–306 to perceive moving boundary, 301–302 sequential pattern of, 303 Murphy, Gardner, 1 Music, segmentation in, 111 Nachmias, Jacob, 4, 5, 23–25, 40 Nagy, W. E., 254 Naigles, Letitia R., 245–274, 284 Nakayama, K., 165
Naming latencies, 335 Naming patterns, study of, 222. See also Object naming Nature-nurture questions, 112 “N-back” task, in working memory experiments, 89, 93, 96, 97 Necker cube, 151 “Neg-raising” defined, 269n.1 syntactic phenomenon of, 248–249 Neisser, Ulrich, 4, 23, 24, 296 Nelson, K., 295 Neural models, 186–187 Neuroimaging techniques application of, 102–103 experiments with, 89–99 in reaction time studies, 91–92 in study of cognition, 87–88 Newport, Elissa, 7, 10–11, 13, 105–119 Norton, 59 Nouns in child-directed speech, 192–193 compared with verbs, 317–318 contexts of, 219 first-syllable stress, 318 presyntactic primitives as, 283–285 prosodic properties of, 193 Object completion depth information in, 170–174 dynamic, 183–187 and edge interactions, 187 global notions of, 179 hypothesis, 172 identity hypothesis in, 159–161 similarity in, 179 Object constancy, occlusion and, 297–301 Object function, studies of, 226. See also Function Object names, for blind children, 214–216 Object naming children’s vs. adult’s, 225–227 form vs. function in, 223–224 generalization in, 226 mature, 225–227 role of syntax in, 218–221 and shape bias, 217–218 shape in, 216–217 Object perception. See also Perception models of, 188 multiple tasks in, 165 relatability in, 166–168
Object permanence, in occlusion displays, 307
Object recognition system, object naming in, 227
Objects
  categorization of, 69
  in Elizabethan psychology, 237
  (nonnaming) judgments about, 222
  persistence of, 293
  in word acquisition, 213
Object shape, 216–217. See also Shape; Shape bias
Object unity, in 3-D world, 170
Occlusion
  dynamic, 299
  and object constancy, 297
  sequential, 303, 304, 305
Opacity, of moving surfaces, 305
Optics, ecological, 163
Oral traditions, and study of memory, 323–324
O’Regan, J. K., 295
Orthography, English, variability in, 314, 317
Outcomes, in instrumental learning, 44–45, 47
Paivio, A., 322
Palmer, Evan, 183
Paquin, M., 255
Parental input, and mental verb understanding, 255, 263. See also Maternal speech
Parent-child interaction, and MSV distinction, 267–268
Parentheticals, MSVs in discourse structure of, 249
Parrott, W. Gerrod, 14, 231–244
Parser, development of, 4
Parsing
  alternatives to encapsulated, 328–331
  lexicalist approaches to, 337
Partial sentence representation (PSR)
  explained, 281
  and utterance structure, 285
Participle frequency, effects of, 333
Particular morpheme hypothesis, of morpheme sensitivity, 203–204
Paulesu, E., 90
Pavlovian conditioning, exploration of, 46
Peabody Picture Vocabulary Test (PPVT) Revised, 256
Peer language use, 268
Pennsylvania, University of. See also Psychology Dept.
  Gleitmans’ move to, 5
  H. Gleitman as chairman at, 53
  H. Gleitman’s arrival at, 60
  job-talk ritual at, 121–122
Perception
  and auditory imagery, 143
  and conscious experience, 307
  and continuity errors, 295–296
  depth relationship, 160
  ecological analyses of, 181
  in Elizabethan psychology, 237
  of form, 151
  in Gleitmans’ seminars, 157
  identity and, 292
  and internal representation, 293, 294–295
  and luminance masking, 295
  of object constancy, 297
Perception, visual, and object naming, 216–217
Perceptual biases, language and, 221
Perceptual constancies, 292
Perceptual development, and Gestalt principles, 158
Perceptual organization
  development of, 157
  understanding, 187
Perceptual processing, continuity in, 165
Perceptual unit formation, 291
Perceptual verbs, 250
Persistence, perception of, 293, 294
Petersik, J. T., 301
PET scanner, working memory experiments with, 101
Petter, G., 159
Petter’s effect, 160, 176
Philosophy, educational, 311
Phrasal stress. See Stress
Phrase and Paraphrase (Gleitman and Gleitman), 7
Physical objects. See also Object naming; Object perception
  importance of, 71
  psychological categories of, 69
  transformation of properties of, 71
Picture pointing tasks, limitations of, 198
Pinker, S., 193, 194
Pitt, M., 142
Place, in word acquisition, 213. See also Space
Plays. See also Drama; Theater
  directed by H. Gleitman, 29
  produced by H. Gleitman, 25
  psychology in, 232–234
  Shakespeare’s, 232–240
Poetic meter
  iambic pentameter, 312–313, 315
  information in, 313
Poetry, and study of memory, 323–324. See also Verse
Polat, U., 186
Prägnanz principle, 187
Predicates
  argument, 283
  spatial relationships as, 213
Preschoolers, vs. home-schooled children, 265
Preschool experience, and mental verb understanding, 263–266
Priming studies, 177
Primitives
  Chomsky on, 275–276
  nouns as presyntactic, 283–285
Problem solving, and working memory, 88
Prosodic bracketing, 194
Prosody, in language acquisition, 193
Prototype representations, 16
Proximity, Gestalt law of, 169
Pseudowords, disyllabic, 314
Psycholinguistics, 192, 206, 341
  and sentence processing, 341
  and word blends, 321
Psychology
  American, 13
  Elizabethan, 234–240
  in English Renaissance, 232–234
  folk, 232, 240–242
  identity problem in, 291
  Renaissance, 240–242
  teaching of, 27
Psychology course, introductory, 39–40
  recording of, 60–61
  syllabus for, 58, 59, 60
Psychology dept., Univ. of Penn, 5–6, 61
  rules of, 54
  teaching at, 154
Psychology (Gleitman), 311
  beginning of, 5
  completion of, 19, 65
  editions of, 65
  first chapter of, 62
  historical component of, 63
  origins of, 58, 59
  Renaissance equivalent of, 232
  reviews of, 63–64
  success of, 64–65
Psychophysiology, Elizabethans’, 235
Puppets, in experiments with children, 258–259
Quine, W. V. O., 214, 215, 216, 246
Radical translation, problem of, 246, 247
Radio Free Europe, 25
Rakowitz, S., 220
Rayner, K., 337
Reaction times, study of, 91, 94
Readiness to learn, gesture and speech as index of, 131–132
Reading
  ambiguity resolution during, 336
  studies of, 9
Reading acquisition, early study of, 27–28
Reasoning, and working memory, 88
Recall cues, effectiveness of, 322
Recognition from partial information (RPI), 176, 177
Reduced relative clauses, and participle frequency, 333
Rehearsal, memory, and concurrent articulation, 149
Rehearsal control, in neuroimaging experiments, 89–90, 91
Reisberg, Daniel, 139–156
Relatability
  in cases of minimal gaps, 169–170
  construction used to define, 167
  3-D, 170–174
  in edge interpolation, 186
  experimental evidence about, 168–169
  and localization of boundaries, 178
  in object perception, 166–168
  spatiotemporal, 182–183, 184
Relatable displays, accuracy of, 185
Relative clause ambiguity, 333, 336
Rensink, R. A., 295, 307
Repetitions, imagined vs. perceived, 149
Rescorla, Robert A., 39–47
Reynolds, H., 294
Rhyme judgment, 153
Rhyme patterns, in child verse, 321–324
Rice, M., 256
Rips, Lance, 7
Ritter, E., 277
Rock, I., 301
Rosch, Eleanor, 16
Rosen, S. T., 277
Ross, D. S., 115
Rozin, Paul, 8, 9, 10, 27–38, 42
Sabini, John, 49–56
Saccades, changes during, 295
Saffran, Jenny, 107
Sager, Naomi, 4
Sagi, D., 186
Same kind, in object naming, 225
Scherer, Martin, 1
Schmidt, Hillary, 14
Scholnick, E., 254
Schumacher, E. H., 89
Schwanenflugel, P., 250, 252
Schweisguth, Melissa, 192, 197
Second language learning, 14
Segmentation
  basic problem in, 158–159
  and notions of smoothness, 163
  understanding, 187
  word, 107–110
Selective interference experiments, 100
Self-control, as virtue, 239, 240
Self-splitting figures (SSOs), 159–160, 161
Semantic constraints, lexical frequency and, 331
Semantic information
  lexically specific, 329
  and verb recognition, 330
Semantics, and language acquisition, 193
Seminars
  cheese, 8, 191
  Gleitmans’ research, 7–8, 8, 24, 40, 55, 87, 103, 118, 139, 157, 188, 191, 221, 242
  L. Gleitman’s graduate, 54, 56
  weekly evening, 122–123, 209–210
Sentence interpretation, and sentence structure, 281–283
Sentence processing, psycholinguistic theories of, 341
Sentences, children’s gesture, 124–125
Sentence structure
  and conceptual structure, 282–283
  and meaning, 276–277
  presyntactic, 288
Sereno, S. C., 337
“Sesame Street,” 256
Shafer, V. L., 196
Shakespeare, plays of
  emotion in, 234–240
  psychology in, 232–234
Shape
  accurate identification of, 303
  artifacts used in studies of, 221
  and object naming, 216–217, 219
Shape bias, 75
  in adults vs. children, 223–224
  and functional information, 221–225
  and object naming, 217–218
Shatz, Marilyn, 7–8, 251
Shepard, Roger, 140
Shimojo, S., 165
Shipley, Elizabeth, 6, 8, 11, 12, 13, 69–85, 192, 194, 195, 197, 202, 206, 226
Shipley, Thomas F., 291–309
Shipley, Tim, 157, 179, 183
Shows. See also Theater
  directed by H. Gleitman, 29
  produced by H. Gleitman, 25
Shucard, D., 196
Shucard, J., 196
Siblings, effect on MSV acquisition, 268
Siegal, Muffy, 7
Sigman, E., 301
Silverman, G., 165
Similarity
  and edge relatability, 179
  and object naming, 214–215, 217
Simons, D. J., 81, 296, 307
Singer, D., 256
Singer, J., 256
Singleton, J. L., 116
Siskind, J., 282
Sloan Group, 211
Smith, Carlotta, 6, 11, 12, 13, 192, 194, 195, 197, 202, 206
Smith, E. E. (Ed), 88, 89
Smith, L. B., 71
Smoothness, notions of, 162–163
Smythe, P. C., 322
Solomon, Dick, 5, 41, 42
Sounds. See also Auditory imagery
  mental images of, 144
  mental representations of, 153
  ratings of perceptions of, 143, 145
Space, vs. language, 210–211, 227
Spatial representation, and object naming, 227
Speech
  covert, 155n.1
  prosodic principles of, 313
Spelke, Elizabeth, 14, 15, 157
Spelling, and stress, 314
Stability
  appearance of, 294
  change as information for, 293–294
  illusory, 296, 297, 300, 307
  motion-based model of, 301
  and motion signals, 305–306
  perception of, 292, 296
Statistical information, 107
Statistical learning, 106–107, 117
  in language acquisition, 111
  vs. nonstatistical learning, 111
  and word segmentation, 107–108
Sternberg, S., 92
Storytelling, and study of memory, 323–324
Stress. See also Verse
  first- vs. second-syllable, 318
  spelling and, 314
Stress patterns, noun vs. verb, 318
Students, psych 1, 51–52. See also Psychology course, introductory
Subjects, and transitive vs. intransitive sentences, 285–287
“Subordinate bias” effect, 332
Subrahmanyam, K., 220
Subtraction strategy, in cognition studies, 89
Subvocalization, 140
  and auditory imagery, 152
  detachment provided by, 153
  planning mechanisms for, 155n.3
Surface completion process, 180
Surface filling, 186
Surface interpolation, examples of, 158
Surface quality, spreading of, 179–181
Surfaces, illusory vs. occluded, 160–161
Surface texture, and adjectives, 219
Swarthmore College, 4–5, 7, 23
Syllable number, and memory, 322–323
Symmetry testing, 179
Syntactic ambiguities
  and lexically specific argument, 331–336
  and lexical preference, 331–336
  and lexical priming, 339
Syntactic bootstrapping
  defined, 277–279
  original proposal of, 288
  pioneered by Gleitmans, 276
  presyntactic mechanism for, 284–285
  sentence interpretation in, 283
  in verb learning, 280
Syntactic context, and object naming, 219
Syntactic diversity, 268
Syntactic evidence, child’s understanding of, 279–280
Syntactic information, lexically specific, 329
Syntax
  acquisition, 276
  argument linking in, 283
  early education of, 246
  and presyntactic division, 287
  utterances in, 282
  verb learning in, 280
Tanenhaus, M. K., 329
Tangent discontinuity (TD), 163
  continuity and, 165–166
  and good form, 174
Taylor, M., 195
Teacher-preschooler interactions, and MSV distinction, 267–268
Teaching. See also Seminars
  of graduate students, 154
  H. Gleitman on, 49
  of L. Gleitman, 54–55, 56
  of psychology by H. Gleitman, 27
Teaching assistants, 42
Teitelbaum, Philip, 5
Telegraphic listeners, and grammatical morphemes, 195
Television input
  and MSV acquisition, 270n.6
  and MSV understanding, 266
  and PPVT scores, 256
Textbooks. See also Psychology
  publishing of, 51
  writing of, 51
Theater, H. Gleitman’s love of, 28, 36. See also Plays
Theory of mind (TOM)
  developing, 252–254
  and preschool experience, 264, 268
Thines, G., 157
Thinking
  and mental verbs, 247
  as perceptual experience, 249
Thinking out loud. See also Auditory imagery
  reasons for, 154
  research on, 139–140
Thoughts
  effect of emotion on, 239
  externalized forms of, 140
Toddlers. See also Children
  bound morpheme comprehension in, 197–200, 202
  comprehension studies in, 196
  and sentence structure, 280
  understanding in, 195
Tolman, Edward Chace, 2, 47
Transformation and Discourse Analysis Project (TDAP), 3
Transitional probabilities, 108, 111
Transitive sentences
  vs. intransitive sentences, 287
  object arguments of, 286–287
Transparency phenomena, 158, 161
Troscianko, T., 295
Trueswell, John C., 327–345
Tunneling, demonstration of, 306
Unit formation, Gestalt principles and, 161
Unity, perceived, and relatability, 172
UNIVAC2 computer, 3
Universal Grammar (UG), 276
Utterances, grouping of words into, 281–282
van der Helm, P., 174
van Lier, R. J., 174
Verbal auxiliaries, position of, 13
Verbal memory, in neuroimaging experiments, 89–90
Verb game, 17
Verb meanings
  learning of, 16
  presyntactic structural cues to, 279–280
  understanding of, 18
Verbs. See also Mental state verbs (MSVs)
  argument structures of, 333
  compared with nouns, 317–318
  interpretation of, 278
  learning, 278, 288
  perceptual, 250
  prosodic properties of, 193
  relational meaning of, 277
  second-syllable stress, 318
  semantic structures of, 281
  and sentence structure, 277
Verbs, action, in syntactic context, 220
Verrekia, L., 315
Verse
  noun-verb stress difference in, 318–319
  rhyme patterns in child, 321–324
  rhythmic structure of, 312–314
Virtue, emotion and, 241
Vision, models of, 163
Visual imagery, and Gestalt principles, 150–151
Visual memory, 296–297
Visual recognition, study of, 10
Visual sequences, segmentation in, 110
Vives, Juan Luis, 233
Vocabulary
  of blind children, 211, 214
  and shape bias, 227
  and television input, 256–263
Vocabulary learning, and sentence structure, 282–283
Wallach, Hans, 4, 23, 292
Watson, John, 139
Wellman, H. M., 82
Wertheimer, M., 161, 181
Wheeler, K., 294
Whole units, words stored as, 202–203
Williams, Dave, 27
Word acquisition, for blind child, 212–213
Word blends, 319–321
  elements in, 320
  predictions of, 320–321
Wordgleits, 28
Word learning, and category membership, 216
Word order patterns, 319–320
Word recognition
  argument structures in, 336–340
  syntactic aspects of, 340
Word segmentation
  and statistical learning, 107–108
  studies, 108–110
World events, in language acquisition, 277
Wright, Thomas, 234
Writing, detachment from, 155
Yantis, S., 297
Yin, Carol, 180
Yuille, J. C., 322