INPUT AND EVIDENCE
LANGUAGE ACQUISITION & LANGUAGE DISORDERS
EDITORS Harald Clahsen University of Essex
Lydia White McGill University
EDITORIAL BOARD Melissa Bowerman (Max Planck Institut für Psycholinguistik, Nijmegen) Wolfgang Dressler (University of Vienna) Katherine Demuth (Brown University) Nina Hyams (University of California at Los Angeles) William O’Grady (University of Hawaii) Jürgen Meisel (Universität Hamburg) Mabel Rice (University of Kansas) Luigi Rizzi (University of Siena) Bonnie Schwartz (University of Durham) Antonella Sorace (University of Edinburgh) Karin Stromswold (Rutgers University) Jürgen Weissenborn (Universität Potsdam) Frank Wijnen (Utrecht University)
Volume 25
Susanne E. Carroll Input and Evidence The raw material of second language acquisition
INPUT AND EVIDENCE THE RAW MATERIAL OF SECOND LANGUAGE ACQUISITION
SUSANNE E. CARROLL Universität Potsdam
JOHN BENJAMINS PUBLISHING COMPANY AMSTERDAM/PHILADELPHIA
The paper used in this publication meets the minimum requirements of American National Standard for Information Sciences — Permanence of Paper for Printed Library Materials, ANSI Z39.48-1984.
Library of Congress Cataloging-in-Publication Data
Carroll, Susanne.
Input and evidence : the raw material of second language acquisition / Susanne E. Carroll.
p. cm. -- (Language acquisition & language disorders : ISSN 0925-0123; v. 25)
Includes bibliographical references and index.
1. Second language acquisition. 2. Linguistic models. I. Title. II. Series.
P118.2.C374 2000
401’.93--dc21 00-046767
ISBN 90 272 2493 5 (Eur.) / 1 58811 011 7 (US) (alk. paper) CIP
© 2001 - John Benjamins B.V.
No part of this book may be reproduced in any form, by print, photoprint, microfilm, or any other means, without written permission from the publisher.
John Benjamins Publishing Co. • P.O.Box 36224 • 1020 ME Amsterdam • The Netherlands
John Benjamins North America • P.O.Box 27519 • Philadelphia PA 19118-0519 • USA
To my Dad, J. Allan Carroll. Just for you.
Table of contents
List of tables
List of figures
Acknowledgements

Chapter 1 Questions, problems, and definitions
1. Objectives and research questions
  1.1 Objectives
  1.2 Research questions
2. Definitions: stimuli, intake, and input
  2.1 Stimuli and transducers: the first level of processing
  2.2 Modularity and the Comprehensible Input Hypothesis
  2.3 Processing in the Autonomous Induction Theory
  2.4 Positive and negative evidence, positive and negative feedback
  2.5 Competence and performance, skill and control, faculties, and abilities
3. A reformulation of the research questions
4. Summary

Chapter 2 Property and transition theories
1. Why do we need the Autonomous Induction Theory (or something like it)?
  1.1 What must an SLA theory explain?
  1.2 Problems with SLA applications of Principles and Parameters Theory
  1.3 Problems with the Competition Model
  1.4 A third approach: the Autonomous Induction Theory
  1.5 The limits of theism and deism as accounts of SLA
2. Summary

Chapter 3 The representational and developmental problems of language acquisition
1. Introduction
2. Principles and parameters theory (P&P)
  2.1 P&P theory: the core ideas
  2.2 Universal Grammar: substantive universals
  2.3 Universal Grammar: formal universals
3. Well, that looks good! So what’s the problem?
  3.1 UG and the problem of representational realism
  3.2 Can parameters be biological constructs?
  3.3 The SLA P&P theory has no model of triggers
  3.4 How does one set a parameter in the face of ambiguous stimuli from two different languages?
  3.5 The deductive value of parameters is now questionable
  3.6 What if there were no parameters in a theory of UG?
  3.7 What might it mean now to say that UG is “innate”?
4. Summary

Chapter 4 The autonomous induction model
1. Introduction
2. The language faculty in outline
  2.1 Representational Modularity: The hypothesis of levels
  2.2 The intermediate level theory of awareness
3. Induction and i-learning
  3.1 Basic properties of induction
  3.2 Induction as a search through a search space?
  3.3 Domains of knowledge as mental models
  3.4 Condition–action rules
  3.5 Competition among rules
  3.6 Clustering of effects?
  3.7 Generation of new rules
  3.8 What initiates and ends i-learning?
4. Summary of the Autonomous Induction Theory

Chapter 5 Constraints on i-learning
1. Form extraction, distributional analysis, and categorisation
  1.1 Prosodic bootstrapping and form extraction?
  1.2 Distributional analysis
  1.3 Categorisation
  1.4 Summary
2. Removing the straw man, or why induction needn’t produce “rogue grammars”
  2.1 Induction and the Autonomy Hypotheses
  2.2 Induction is not random hypothesis-formation
  2.3 Jettisoning the problem-solving metaphor
  2.4 The Coding Constraint
  2.5 The role of feedback
3. Summary

Chapter 6 The logical problem of (second) language acquisition revisited
1. Introduction
2. There is no logical problem of second language acquisition
  2.1 The form of the argument
  2.2 What is the logical problem of language acquisition?
    2.2.1 Three basic assumptions
    2.2.2 The linguistically innocent learner
    2.2.3 The cognitively innocent learner
    2.2.4 The Poverty of the Stimulus Hypothesis
    2.2.5 Summary
3. The empirical facts from first language acquisition
  3.1 From FLA to SLA?
  3.2 Input consists of more than strings of forms
    3.2.1 Meaning as input
    3.2.2 Feedback as input
    3.2.3 The Simplified Input Hypothesis
  3.3 The representational problem vs. the discovery problem
  3.4 Summary
4. The empirical problem of second language acquisition
  4.1 Preliminaries
  4.2 The Success Measure
  4.3 The adult’s other cognitive “equipment”
    4.3.1 How meaning solves the logical problem of language acquisition
    4.3.2 How feedback solves the logical problem of language acquisition
    4.3.3 The time element
5. Summary

Chapter 7 Input and the Modularity Hypothesis
1. Introduction
2. The Modularity of Mind Hypothesis
  2.1 Componentiality, autonomy and domain-specificity
  2.2 Modules are part of the “hardware”
  2.3 Modular domains exhibit a specific ontogeny
  2.4 The automaticity and speed of modular processing
  2.5 Modules produce “shallow” outputs
  2.6 “Cross-talk” is limited
  2.7 The Schwartz model of modular L2 acquisition
    2.7.1 K-acquisition vs. k-learning
    2.7.2 Summary
3. What’s wrong with Fodorian modularity?
  3.1 Problem one: We’re not talking about “language acquisition”, we’re talking only about “grammar acquisition”
  3.2 Problem two: Fodor’s concept of what a module is is too crude
  3.3 Problem three: The relationship between parsing and knowledge in proficient native speakers and acquisition in L2 learners is unclear
  3.4 Problem four: It cannot be true that all grammatical restructuring takes place as a direct consequence of a failure to process input bottom-up
4. Whither Linguistic Competence?
  4.1 What does a psychogrammar really consist of?
    4.1.1 Rules
    4.1.2 Meaning and form
5. Summary

Chapter 8 The evidence for negative evidence
1. Introduction
2. The empirical studies of indirect negative evidence, metalinguistic instruction, and feedback and correction
  2.1 Indirect negative evidence
  2.2 The metalinguistic instruction studies
  2.3 The feedback and correction studies
3. The -ing affixation/morphological conversion study
  3.1 The subjects
  3.2 Design and methodology
  3.3 Predictions
  3.4 Major findings
  3.5 Discussion
4. Summary

Chapter 9 Feedback in the Autonomous Induction Theory
1. Introduction
2. Focused attention and detectable errors
  2.1 Focused attention
  2.2 Detectable errors
  2.3 Error location, or the blame assignment problem
  2.4 Categorisation, i-learning, and feedback
3. Learning the double object construction
  3.1 The metalinguistic feedback
  3.2 The other forms of negative evidence
4. Summary

Chapter 10 The interpretation of verbal feedback
1. Introduction
2. How could feedback and correction initiate restructuring in the learner’s grammar?
  2.1 Feedback and correction are types of speech acts
  2.2 To count as feedback an utterance must be parsed and interpreted
  2.3 Irrelevance of linguistic feedback
  2.4 The Relevance of feedback depends on its informativeness
  2.5 The blame assignment problem
  2.6 Metalinguistic information, grammar teaching, and information which cannot be modelled
  2.7 The corrective intention and indirect forms of feedback
3. Summary

Epilogue
Appendix 1: Acceptability judgement task
Appendix 2: Experimental session
References
Subject index
List of tables
Chapter 6
Table 6.1: Complexity of language input (reconstructed from Morgan 1989: 352)

Chapter 8
Table 8.1: Means and standard deviations by group and session for the feedback items
Table 8.2: Results of repeated measures ANOVAS for the feedback items
Table 8.3: Between-group comparison of means of feedback items on the initial feedback session
Table 8.4: Between-group comparison of means of feedback items on Test 1
Table 8.5: Between-group comparison of means of feedback items on Test 2
Table 8.6: Means and standard deviations by group and session for the guessing items
Table 8.7: Results of repeated measures ANOVAS for the guessing items
Table 8.8: Between-group comparison of means of guessing items on the initial guessing session
Table 8.9: Between-group comparison of means of guessing items on Test 1
Table 8.10: Between-group comparison of means of guessing items on Test 2
Table 8.11: Between-group comparison of means of guessing items on initial guessing session (from Carroll and Swain 1993)
Table 8.12: Between-group comparison of means of guessing items on Test 1 (from Carroll and Swain 1993)
Table 8.13: Between-group comparison of means of guessing items on Test 2 (from Carroll and Swain 1993)
List of figures
Chapter 1
Figure 1.1: Transducing linguistic stimuli
Figure 1.2: The relationship of stimuli to input
Figure 1.3: A modular processing/learning model à la Schwartz (1993: 157)
Figure 1.4: The Comprehensible Input Hypothesis (Faerch et al. 1984: 187)
Figure 1.5: Types of possible inputs during speech processing
Figure 1.6: Positive and negative evidence

Chapter 4
Figure 4.1: The organisation of all levels of mental representation (Jackendoff 1987: 248)
Figure 4.2: The organisation of the grammar (Jackendoff 1997: 100)
Figure 4.3: Processing Hypothesis for perception (Jackendoff 1987: 101)
Figure 4.4: Processing Hypothesis for production (Jackendoff 1987: 109)
Figure 4.5: Problem-solving as the transition from an initial “problem state” to a “goal state”
Figure 4.6: Problem-solving as the transition from an “initial state” to an “intermediate state” and to an “end state”

Chapter 7
Figure 7.1: Fodor’s model of faculty psychology
Figure 7.2: Schwartz’ model of modular processing
Figure 7.3: Interactive parallel processing
Figure 7.4: Frazier’s modularity (Frazier 1990: 413)

Chapter 8
Figure 8.1: A model of Lexical Phonology showing the interaction of word-building processes and phonological rules (based on Jensen 1990: 92)
Figure 8.2: % Correct – Feedback items
Figure 8.3: % Correct – Guessing items
Acknowledgements
The origins of this book go back to the mid 1980s and my increasing dissatisfaction with the turn which generative work in second language acquisition was then taking. I decided to do something completely different. In the intervening years, I have instead been pursuing an agenda focused more on the integration of linguistic theorising with research on processing, perception, memory, and other aspects of cognition. My intellectual debt to Ray Jackendoff will be apparent from the first pages of this book. If Ray ever chooses to read it, I can only hope that he is pleased with the use I have made of his many ideas about linguistic cognition and meaning. Bonnie Schwartz and I have been discussing the specific themes treated here for years — over e-mail, at conferences, over the telephone. We don’t agree on the details but the discussions have always been stimulating and I expect them to continue to be. Merrill Swain was co-principal investigator of the project which led to the data set reported on in Chapter 8. This project was funded by the Social Sciences and Humanities Research Council of Canada (SSHRCC File no. 410–89–1484) and the Ontario Ministry of Education through its block grant to the Ontario Institute for Studies in Education. I would like to acknowledge the generosity of both of these funding agencies. I take pleasure also in acknowledging the assistance of Myriam Schachter, who carried out the experimental sessions with the subjects, Phil Nagy, who provided statistical advice, and Harmut Brasche who carried out the statistical analysis. I am grateful for this collaboration; I learned a great deal from it. Mike Sharwood-Smith read parts of the manuscript and made many important editorial suggestions. Kevin Gregg read two (only two?) complete drafts, commenting on everything from my interpretations of Chomsky to my spelling. The final version is vastly improved due to his pointed criticisms and suggestions. (And I still owe him a dinner!) 
Olaf Malecki proof-read the penultimate text and assisted with the bibliography and indices. Lydia White and Harald Clahsen are to be thanked for facilitating things at Benjamins. Kees Vaes of Benjamins was understanding about interminable delays in the delivery of the
manuscript (due to the use of three different versions of Word on three different computers in three different locations). I thank Cambridge University Press for permission to reproduce Figure 1.3 (also 7.2) taken from B. D. Schwartz “On explicit and negative data effecting and affecting competence and linguistic behavior” Studies in Second Language Acquisition 15(2): 157, Table 6.1, taken from J. Morgan “Learnability considerations and the nature of trigger experience in language acquisition” Behavioral and Brain Sciences 12(2): 352, and Tables 8.11, 8.12, and 8.13, taken from S. Carroll and M. Swain “Explicit and implicit negative feedback: An empirical study of the learning of linguistic generalizations” Studies in Second Language Acquisition 15(2); John Benjamins for permission to reproduce Figure 8.1, taken from John Jensen’s Morphology: Word structures in generative grammar; and MIT Press for permission to reproduce Figures 4.1, 4.3, and 4.4, which are all from R. Jackendoff’s Consciousness and the computational mind. Gyldendal were repeatedly approached for permission to reproduce Figure 1.4 from Learner Language and Language Learning (1984: 187), models of speech processing: Psycholinguistic and computational perspectives. If I have not lost sight completely of the many changes which have occurred within linguistic theory during my period of psychological reeducation, it is due to the members of the Linguistics Department of the Universität Potsdam. My colleagues Ria de Bleser, Damar Cavar, Gisbert Fanselow, Caroline Féry, Hans-Martin Gärtner, Ina Hockl, Barbara Höhle, Inge Lasser, Susan Powers, Doug Saddy, Matthias Schlesewsky, Peter Staudacher, and Jürgen Weissenborn, have created a particularly stimulating work environment for someone like me. It is hard to imagine a more interesting place to be if you’re interested in linguistic and psycholinguistic issues. 
If the relationship has at times seemed one-sided (all take and no give) I can only plead that the UP occasionally provides too many distractions. Ria de Bleser, Reinhold Kliegl, and Susan Powers have done me a thousand kindnesses in various ways at various times. The Parkes (John, Alexandra, Dennis, and Jessica), Diane Massam, Yves Roberge, Susan Ehrlich, Nancy McNee, Lynn Drapeau and Michael Rochemont have provided much e-humour and e-support over the years. To my two best friends Margaret McNee and Chantal Ramsay I owe many thanks for their generosity, affection, and ability to put everything into perspective when I can’t. Finally, I would like to thank my husband Jürgen M. Meisel for his encouragement. We have discussed aspects of this work over the years, and I have made extensive use of his ideas and his research. But JMM had a more important role to play in the genesis of this work. At some point, all discussions ended with: “Finish your book!” And so I did.
Chapter 1 Questions, problems, and definitions
1. Objectives and research questions
1.1 Objectives
This book has several objectives. One is to explore the logical relationship between the language that second language learners hear and interpret and the nature of the knowledge of a second language (L2) that they ultimately acquire. A second objective is to investigate the particular role that feedback and correction might play in principle in second language acquisition (SLA). A third is to investigate these issues against the backdrop of research on Universal Grammar (UG), seen as an attempt to constrain language learning theory to what is logically possible and empirically observable. Fourthly, I wish to locate input and evidence within a particular metatheoretical space by embedding discussion of learning within a theory of speech processing and speech production. Fifthly, I intend to add to the body of empirical evidence which supports the hypothesis that feedback and correction have a role to play in initiating second language acquisition and determining the nature of its course, although I will also show that this role is much more limited than many might at present think. Finally, I hope to draw attention to one of the most under-researched and under-theorised aspects of second language acquisition – the raw material from which learners learn – and to argue that it deserves far more attention than it currently gets.
(1.1) Objectives of this book
a. to explore the logical relationship between the language second language learners are exposed to (input) and the linguistic competence and skills they ultimately acquire;
b. to explore the function of feedback and correction in SLA;
c. to fit input, feedback and correction into a constrained theory of second language acquisition;
d. to embed models of second language acquisition in a theory of speech perception and speech production;
e. to add to the empirical evidence which shows that feedback and correction play a causal role in SLA, but to also show that this role is limited to specific kinds of language acquisition problems and to particular periods of development;
f. to demonstrate that feedback and correction cannot be major explanatory mechanisms in a general theory of language acquisition.
My motivation is quite simple. Although SLA researchers agree about very little when it comes to explaining the how and why of L2 acquisition, one point on which there is consensus is that SLA requires exposure to the second language. If you want to learn Ojibwa, you will have to listen to Ojibwa speech, read Ojibwa texts, and attempt to reproduce Ojibwa sentences. This much would seem to be a matter of logical necessity, since there is as yet no Ojibwa pill on the market obviating the need for language learning, nor are there any reported cases of anyone coming to know Ojibwa through telepathy. Learners still have to work at learning, and learning requires exposure to the sounds or written forms of the L2. From these sounds and written symbols, learners must construct grammatical representations of the speech they are hearing or reading and infer a meaning. This said, consensus ends, for there is no agreement on what kind or how much exposure a learner needs.1 Indeed, we know very little still about the kinds of linguistic exposure learners actually get. Are they bathed in the sounds of the L2 in a way which makes the words and structures expressing particular meanings transparent? Or do they spend their days immersed in their own thoughts against a background rumble of noises which make no sense to them, and emerge only occasionally to participate in speech exchanges mostly involving highly ritualised language? On the basis of what we currently know, we could argue for either scenario. We also do not know how learners process the sounds they hear although processing speech is a necessary first step in constructing grammatical representations. Do all learners experience the L2 initially as a “wall of noise” whose sounds cannot even be perceived as syllables? Certainly, this is the initial experience of many learners. 
Or do learners of particular L1s learning particular L2s already come equipped with the knowledge necessary to re-analyse the continuous speech signal into discrete strings of sound units? Extensive search of the SLA literature has failed to produce any studies shedding light on this particular question. Much of the early SLA literature discusses learning in general terms and presupposes that learners already have the ability to perceive and extract “words” from the speech signal. Such discussions can shed no light on the initial state or on how the learner leaves it. Even if we limit our attention to learners at a subsequent stage of
learning, still we do not know a whole lot about how they construct grammatical representations once they do perceive and encode “words”, although we have made some progress in the investigation of both global properties of interlanguage grammars and the specific details of a limited range of learning problems such as inflectional morphology, constituent order, reflexives, stress, syllable structure constraints, and the learning of phonetic segments. It is often suggested that learners discover form through meaning. But we know little about how learners make sense of the sounds they hear in context, that is to say, how they map meanings onto sounds. It is often implied that meanings are transparent whenever speech is used in context but observation of real language use suggests that many situations lend themselves to a number of mutually inconsistent interpretations. If meaning is to provide the way into grammatical form, we will require a sophisticated theory of meaning, appropriate for different groups of learners. In its absence, we must admit that we have no theory of input for SLA. A theory of input to learning is required for all sorts of reasons. We need it, first of all, to make sense of learnability research. Learnability studies ask how it is possible in principle to learn a language. One criterion is that learners get input, but no one has, as yet, explored the minimally necessary kinds of input needed to learn a single grammatical phenomenon. This is partly due to the complexity of even “simple” grammatical phenomena, partly due to the changing nature of grammarians’ views of how to define them. If we are to state how learners identify formal entities like “word”, “sentence”, or “passive construction” in the speech stream, then we will obviously need clear and unambiguous formal definitions of what they are. 
Secondly, it is not clear yet how to marry the idealisations of learnability theories to the psycholinguistic assumptions designed to characterise real learning in real time. Consequently, the claim that learners can learn on the basis of a single exposure to a given phenomenon is currently just as viable as the contrary claim that they need several hundred hours of exposure, correction, and practice. Indeed, there is not even agreement as to whether it really makes sense to claim that learners learn a language from exposure to the sounds of the L2. If second language acquisition is constrained by an innate language-specific knowledge system called Universal Grammar, as many have maintained, and if by the word learning we mean hypothesis-formation and hypothesis-testing of the sort evidenced in problem-solving and concept learning, as Schachter (1992) has argued, then it would be misleading, say some researchers, to describe with this word what language learners do.2 Krashen (1981) has attempted to draw a distinction between acquisition (defined as the unconscious processes of infants and children, whatever they might be) and learning (the conscious processes involving metalinguistic information in the form of grammar rules of the sort
fostered in many language classrooms). Krashen argued that learners acquire from exposure to the sounds of the L2 in meaningful communication, but this is not learning. While Krashen has provided no direct empirical evidence for the distinction, many now assume that it is correct.3 To summarise, the most basic questions about the relationship between grammar learning in SLA and the speech or meanings on which it is based have yet to be sorted out. In this book I shall ask some of these basic questions, and I hope to provide some equally basic answers.
1.2 Research questions
In pursuing my objectives, I will explore in detail four particular questions. The first is:
Question 1 What is the nature of the raw material on which particular grammatical generalisations are formed?
In contrast to what appears to be the standard position in our field (see Carroll 1999b, for a defense of this claim), I will argue here that the input to language learning cannot consist of the objective properties of the speech signal or the physical properties of the page of a book. Language learning requires the transformation of wave forms or patterns of light detected by the eye into mental representations of a particular sort. Only these mental representations can be the starting point for language learning. Exploring how the wave form (or the patterns of light) is converted into mental representations ties a theory of language learning irrevocably to research on speech processing.4 Unfortunately, contemporary research on speech processing has had little influence on mainstream SLA theorising which has contented itself largely with the investigation of ideas emanating from grammatical theory, in particular, Principles and Parameters Theory, and from first language acquisition research done within the same paradigm.5 While individual efforts can undoubtedly be cited, for example, Myles’ (1995) treatment of working memory in the parsing and acquisition of wh-constructions in French by English speakers, the on-going research in various centres in Europe (especially in The Netherlands) on the L2 and bilingual lexicon (see, e.g., de Groot and Nas 1991; de Bot, Cox, Ralston, Schaufeli and Weltens 1995; Pal 1999, 2000), Juffs’ (1998a, b) studies of sentence-based parsing in a principle-based parsing framework (see also Juffs and Harrington 1995, 1996), or Pienemann’s use of production models in his Processability Theory (Pienemann 1998a, b), such efforts have tended to be narrow in focus. Certainly they have had little influence on the way in which SLA tends to be conceptualised in the
textbooks we use to train our students. I hope this book may help to change this state of affairs. The second question to be asked is:
Question 2 What is the role that conceptual information plays in the learning of grammatical distinctions?
In contrast to some generative SLA research to be discussed later in detail, I shall argue that input to learning cannot consist solely of information entering the linguistic systems “bottom-up” from the sensory systems. Some grammatical learning requires linkages between concepts and morphosyntactic representations; some grammatical learning requires linkages between concepts and phonological representations. Both possibilities presuppose that conceptual distinctions are accessible to and accessed by learning mechanisms constructing novel grammatical categories or novel grammatical structures. At the same time, I hope to show that learning mechanisms make constrained use of conceptual information. This entails that learning mechanisms must make use of grammatical features, categories and structures of an autonomous sort. Grammatical categories are not mere convenient labels for conceptual categories, thus allowing us to dispense with all that formal nonsense so beloved of grammarians! Nor can the constraints which hold of interlanguage grammars be derived from conceptual functions. Consequently, a theory of SLA must show how grammatical categories and structures can be acquired given appropriate input. My view is that that input (including input of a conceptual type) must be defined in formal terms if it is to account for grammatical knowledge. The third question to be examined has to do with the role of feedback and correction in adult SLA as a motor of grammatical restructuring.
Question 3 What is the role of feedback and correction in adult second language acquisition?
It is a major thesis of this book that correction and feedback play a role in the creation of novel forms of grammatical knowledge. I intend to demonstrate that for correction and feedback to play a role in grammatical restructuring, what is said to the learner must receive a metalinguistic interpretation. In other words, she must understand that someone is talking about her speech, which in turn demands that she form a conceptual representation of her own utterance. The deployment of correction and feedback thus operates initially from within the conceptual system of the learner, where what has been said is interpreted and integrated into
a model of the on-going speech situation. This logically entails that meaning must be able to influence grammatical restructuring. Talk of meaning requires us to have a theory of meaning. This work will make extensive use of two distinct approaches to that subject. On the one hand, I will adopt Jackendoff’s theory of Conceptual Semantics, which is especially congenial to psycholinguistic argumentation and data, in contrast to much of the truth-theoretic formal semantics developed within the philosophy of language and pursued within linguistic theory.6 Nonetheless, it must be admitted that Jackendoff’s theory is one which attempts to describe and explain in principle how linguistic expressions can have the meanings they do. It is not per se a theory of inferencing or ratiocination. In order to explain how learners can derive information from feedback and correction as a prelude to learning some new part of the second language, I will need such a theory of inferencing. I shall, therefore, also draw heavily on Sperber and Wilson’s (1986) Relevance Theory. Both theories will be presented in some detail in subsequent chapters. They are necessary supports for a demonstration that feedback and correction have a role in any theory of adult SLA.7 Notice that such a demonstration will not arise just from amassing a large amount of empirical data showing that the provision of feedback coincides with behavioural changes among targeted groups of learners. In other words, I might have attempted to show merely that learners appeared to know things after having received feedback or correction, that they did not know before, and that they were able to act on the basis of that knowledge. Or, I might have shown that learners have somehow acknowledged the provision of feedback and correction. This has been the tack of the interactionist approach to input (see Pica 1994; Wesche 1994; Long 1996, for state-of-the-art reviews of a large number of empirical studies).
It is important to show that there is an empirical phenomenon to investigate. If the provision of feedback and correction never is conducive to seeing changes in learners’ behaviour, and we conclude that it has no effects on the learning of any grammatical distinction, then that would be the end of the matter. But the demonstration that feedback and correction sometimes are effective will not substitute for a theory of feedback and correction. It will be the theory which provides an explanation and not the empirical facts. Although I have said before that the possible effects of feedback and correction in SLA are a matter for empirical investigation and not one to be settled by theoretical fiat (Carroll and Swain 1991, 1993; Carroll 1995b), and although data, my own and that of others working on this topic, will be presented and reviewed in Chapter 8, I will not be content to show that feedback works with this individual or with that one, for this problem or for that problem. The long-term goal is to develop an explanation of learning on the basis of this kind of input.
The elaboration of such an explanation cannot proceed without participating in current debates about the causal factors in SLA. These debates have to do with the major theoretical question animating much SLA writing: What is the nature of the constraints on SLA?
Question 4 What is the nature of the constraints on SLA?
I shall also explore this question. In SLA research, Question 4 can be broken down into three subparts:
Question 4 (in more detail)
a. What role does existing knowledge, in particular knowledge in the form of linguistic universals, play in determining the form of interlanguage grammars and the course of developmental patterns?
b. What role does existing knowledge, in particular knowledge in the form of acquired knowledge from the first language, play in determining the form of interlanguage grammars and the course of developmental patterns?
c. How are novel forms of knowledge created?
Question 4a is the UG question: How does Universal Grammar constrain the processes that form grammatical categories and construct linguistic representations? Question 4b is the transfer question: How does knowledge of the L1 constrain the processes that form grammatical categories and construct linguistic representations? Question 4c is the creation question: How do we construct new grammatical categories and new linguistic representations? Can we do so uninfluenced by prior forms of knowledge? The answer to the last question, in this book, will be a firm “No.” Novel grammatical representations will consist of novel combinations of existing elements constrained in a variety of ways. I take very seriously the problem of constraining a theory of language learning to exclude forms, structures and generalisations which in fact no learner ever seems to acquire. The theory of language learning to be presented here is, moreover, embedded in a modular theory of the mind in the precise sense that UG provides a repertoire of representational primitives and combinatory operations which no learning mechanism can countermand or override. New representational primitives and new combinatorial operations cannot be acquired. Within these constraints and others to be mentioned as we go along, there is still much room for talk of learning. But before we turn to the theoretical debates, let us start with some definitions.
2. Definitions: stimuli, intake, and input
2.1 Stimuli and transducers: the first level of processing
It is standard practice in the SLA literature to refer to the speech that learners hear in meaningful contexts as input, borrowing a term from the Artificial Intelligence literature. Most writers, however, do not bother to define what they mean by the term. In some cases it is clear that what is intended is some kind of physical entity.8 Input in this sense consists of events affecting the visual and auditory perceptual systems. They can be understood to be acoustic-phonetic events, in the case of speech, or graphic objects, in the case of written text, produced by an individual for some purpose on a specific occasion. These events and objects are observable by third parties; they can be recorded, are measurable, analysable, and hence objectively definable. They are the “stuff out there” which we will need to refer to from time to time, although in my view this constitutes the least interesting aspect of a theory of input. Let me introduce the term stimuli into the discussion and refer to all such observable instantiations of the second language with this word, using the standard terminology of the psychology of learning.9 This means that when people say that learning a second language requires exposure to the L2, minimally what they intend to say is that learning requires exposure to L2 stimuli. If you want to learn Ojibwa, you need exposure to Ojibwa stimuli, but if you want to learn Plains Cree, you need instead exposure to Plains Cree stimuli. It has been well understood for some time, however, that language acquisition depends not so much on what transpires in the observable environment, but rather on what transpires in the unobservable mind/brain of the learner. Learners can, on a given occasion, attend to some stimulus in the speech environment, process it, and acquire some bit of knowledge about the L2.
On a different occasion, however, the same learners may not attend to the same relevant physical events, not process the relevant stimulus, and not learn anything about the language. From the learner’s perspective, these situations are quite different; you hear the L2 utterance in the first situation but in the second, you just don’t quite get it. To differentiate, therefore, between the observable stimulus and the information that the learner extracts from it, most researchers contrast input and intake (Corder 1967: 165, 1978: 82; Krashen 1981: 102; VanPatten 1993: 436). The minimal definition of intake is that it is some processed stimulus. By this, I mean that the stimulus has been mentally represented. We can contrast intake and stimuli accordingly by saying that intake is a mental representation of a
physical stimulus, as Corder’s original discussions make clear. This leads to the question: What kind of mental representation is intake? Much of the interaction literature claims that intake is comprehended speech (Gass, MacKay and Pica 1998). Comprehended speech in turn corresponds to a speech signal which has been successfully parsed and re-encoded in semantic terms. Under this view, all grammatical learning would involve concept learning, since the properties of grammars would be derived from information encoded as concepts. Thus, the raw material for grammatical learning would be concepts encoded in what I am calling here conceptual representations. However, if this were literally true, then it would be possible for learners to reflect about all aspects of their own grammar learning since conceptual representations are also the representations in which we think. Learners ought even to be able to project the various aspects of grammatical knowledge into conscious awareness, since our semantic encodings appear to be the basis of our phenomenological experiences (Jackendoff 1987). There is, however, not a single shred of evidence that learners can think about the sounds of the speech signal in terms of the constituents and properties which must be acquired if they are to master the phonology of the L2. Thus, we have no reason to think that as learners learn the sounds of the L2, they decide to extract syllables out of the speech signal, or to break down syllables into sequences of consonants and vowels. No one says: “I must learn to hear the consonants and vowels of this noise you’re making” and even if they were to say such a thing, it would have no consequences for the learners’ actual experiences. 
Even highly trained phoneticians and phonologists hear the speech signal of an unfamiliar L2 initially as just a lot of noise and not as sequences of syllables or “words.” Similarly, there is no evidence that as learners learn enough of the L2 to begin to extract syllables, they simultaneously can think about them as morphosyntactic features and structures. The learner does not say: “Now I am hearing a subject and that word must be an auxiliary” even if the successful interpretation of the speech signal requires the analysis of the signal as an NP Aux sequence. I conclude that the view that intake is comprehended speech is mistaken and has arisen from an uncritical examination of the implications of Krashen’s (1985) claims to this effect. I discuss Krashen’s Comprehensible Input Hypothesis in some detail below so I will limit myself here to pointing out that comprehending speech is something which happens as a consequence of a successful parse of the speech signal. Before one can successfully parse the L2, one must learn its grammatical properties. Krashen got it backwards! Obviously, before we can sort out how information gets from the “outside” in to the learning mechanisms, we must first understand something about the nature of speech parsing in knowledgeable and proficient hearers of a language.
In order to have a term which refers to minimally processed stimuli, I shall define intake in this book to be literally that which is taken in by the hearer, following Corder’s logic. Stimuli which make it into the system will be, moreover, characterised as transduced stimuli. Transducers are defined as analogue processors that convert stimuli available to the perceptual systems into neural signals that covary in a direct fashion with properties of the signal (Lowenstein 1960). In speech processing, for example, it has been hypothesised that speech signals are first coded in the peripheral auditory system when stimuli such as the bursts of noise associated with simple (or steady state) vowels and stop consonants occurring in consonant-vowel (CV) syllables trigger auditory-nerve activity. The sounds represented in phonetic transcription as /pa/ and /ba/ would be good examples of such simple syllables. The triggering events, here complex acoustic properties of the syllables /pa/ or /ba/, correspond in a direct way at this level of representation to the discharge patterns of the nerve fibres (Pisoni and Luce 1987: 27). So the discharge patterns of the nerve fibres instantiate the relevant acoustic properties. These analogue representations, however, are only the first stage in a series of complex transformations of the stimuli into discrete, structural representations. Keep in mind that the signal does not exhibit such discrete strings. Transduction and transformation of the signal are thus essential to linguistic representation of any sort. Once stimuli are mentally represented and “in the system”, other processors can extract additional information from them. See Figure 1.1.
specific acoustic properties of the linguistic stimuli → analogue representations characterised in terms of varying discharge patterns of the auditory nerve
Figure 1.1. Transducing linguistic stimuli
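The contrast between analogue transduction and later discrete encoding can be illustrated with a toy sketch. Nothing below is drawn from the text or from any model of the auditory system: the function names, the saturating rate function, and the threshold are all assumptions made purely for illustration. The sketch shows only that a transducer's output covaries directly with the signal, whereas a separate, later processor is needed to impose discrete categories.

```python
import math

def transduce(amplitudes, max_rate=100.0):
    """Hypothetical transducer: map each signal amplitude to a firing rate.

    The mapping is monotonic, so the output covaries directly with the
    input. No discrete categories are imposed at this stage.
    """
    return [max_rate * math.tanh(abs(a)) for a in amplitudes]

def categorise(rates, threshold=50.0):
    """Hypothetical later processor: impose a discrete two-way labelling."""
    return ["high" if r >= threshold else "low" for r in rates]

signal = [0.1, 0.4, 0.9, 1.5]   # made-up amplitudes, not real acoustic data
intake = transduce(signal)      # analogue representation (continuous values)
labels = categorise(intake)     # discrete representation built on the intake
```

The point of keeping the two functions separate is that the discreteness lives entirely in `categorise`; the transduced values themselves carry no category boundaries.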
Stimuli are thus converted into intake. Intake from the speech signal is, however, not input to learning mechanisms. Rather it is input to speech parsers. Parsers are mechanisms designed to encode the signal in various representational formats. They consist of various acquired procedures encoding grammatical distinctions and are tuned to the frequency of particular structures in the language the hearer hears. Good parsers are thus not only able to parse the sentence which has never been heard before, they are designed to operate very quickly on precisely those
structures which occur over and over again. Let us think about language processing in terms of the functional architecture of linguistic cognition. It is commonplace to picture the mind/brain in terms of “black boxes” which operate on mental representations of information transforming them into other mental representations. See Figure 1.2.
Conceptual representations
↑ (↓)
Processor n
↑ (↓)
Analysed and represented output
Processor 3
↑ (↓)
Analysed and represented output
Processor 2
↑ (↑)
Analogue representation of the stimuli
Processor 1
↑ (↑)
Potential stimuli
Figure 1.2. The relationship of stimuli to input
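The architecture in Figure 1.2, in which the output of each processor serves as the input to the next, can be sketched as a chain of functions. This is a hedged illustration only: the three processors and their toy representations are hypothetical, and the sketch covers just the bottom-up direction of flow, not the arrows in parentheses.

```python
from typing import Callable, List

Representation = dict  # toy stand-in for a mental representation
Processor = Callable[[Representation], Representation]

def processor_1(rep: Representation) -> Representation:
    # Hypothetical transducer: stimulus -> analogue representation
    return {"level": "analogue", "content": rep["content"]}

def processor_2(rep: Representation) -> Representation:
    # Hypothetical segmenter: analogue representation -> discrete units
    return {"level": "segments", "content": rep["content"].split()}

def processor_3(rep: Representation) -> Representation:
    # Hypothetical interpreter: units -> a (toy) conceptual representation
    return {"level": "conceptual", "content": len(rep["content"])}

def run_bottom_up(stimulus: Representation,
                  processors: List[Processor]) -> List[Representation]:
    """Feed each processor's output to the next and keep every level."""
    trace = [stimulus]
    for proc in processors:
        trace.append(proc(trace[-1]))
    return trace

trace = run_bottom_up({"level": "stimulus", "content": "my walk walk"},
                      [processor_1, processor_2, processor_3])
levels = [rep["level"] for rep in trace]
```

Every element of `trace` except the first is the output of one processor, and every element except the last is the input to one, which is the sense in which input is a mental construct at each level.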
Note that in a modular bottom-up only system, the direction of information flow would be in a single direction only. The arrows in parentheses in Figure 1.2 are designed to raise the possibility that information can also flow “downwards.” Models of speech processing which permit information to flow in both directions are commonly called interactive. The model adopted here will be partially interactive in that it treats the phonetic and phonological processing as bottom-up to the point of lexical selection. See below. Functional architecture pictures of this sort treat mentally represented information as the input and output of various processors. Input is therefore not something “out there” in the external environment; it is, like intake, a purely mental construct. Moreover, input need not be restricted to representations transduced from proximal stimuli. It also can consist of second-order representations
which convert intake into structural representations. And in interactive models of processing, input can also be mental representations stored in long-term memory (LTM) and activated to participate in some processing. In such models, input can also be mental representations created afresh from inferencing, problem-solving or ratiocination of the “I just sort of dreamt it up” variety. In Figure 1.2, each level of represented information serves as input to the next processor and is the output of the processor which created the relevant representations.
2.2 Modularity and the Comprehensible Input Hypothesis
Saying even this much about input, admittedly not much at all, requires making certain assumptions about language processing and language processors. I have framed the definition of input in terms drawn from classical structural theories of information processing. These theories claim that mental processes are sensitive to structural distinctions encoded in mental representations (see Carroll 1989b: 535–43, for more discussion from an SLA perspective). According to this approach to cognition, input is a mental representation which has structure. Classical connectionist approaches to linguistic cognition (Rumelhart and McClelland 1986, 1987; Gasser 1990; MacWhinney 1987a, b, 1992, 1997), on the other hand, explicitly deny the relevance of structural representations to linguistic cognition, claiming that linguistic knowledge can be encoded as activated neural nets instantiating, e.g. concepts, and linked to acoustic events merely by association.10 Classical connectionist approaches therefore allow talk of stimuli and intake (transduced information), processes, and even mental representations, but they do not allow talk of mentally represented structure.
Anyone who is convinced that the last 100 years of linguistic research demonstrate that linguistic cognition is structure dependent, and not merely patterned, cannot adopt a classical connectionist approach to SLA.11 I happen to be convinced that linguistic knowledge is structure dependent, indeed dependent upon just the sorts of structural distinctions that grammarians like to talk about (dominance, precedence, sisterhood, c-command, etc.). I will therefore assume throughout that language is mentally represented in terms of structures, and that constructing a model of language processing and language processors involves being explicit about the structures that are being built and modified. In the previous section I defined input in such a way that it is not restricted to information derived on-line during the bottom-up processing (from signal to construal) of a linguistic stimulus. See Figure 1.2 again. Schwartz (1986, 1987, 1993; Schwartz and Gubala-Ryzak 1992), in contrast, claims that input for second language acquisition is derived solely from the analysis of intake by the
linguistic processors. She provides the view of processing shown in Figure 1.3, based on an interpretation of Fodor’s (1983) Modularity of Mind hypothesis and his theory of faculty psychology.
Things out there in the world (e.g. that what we see or hear) → Vision module, Language module, etc. → Central processing systems
Figure 1.3. A modular processing/learning model à la Schwartz (1993: 157)
According to Schwartz’ view, the perceptual and linguistic systems analyse linguistic stimuli and intake independently of a mental representation of the ongoing speech situation, of world knowledge, of information inferrable through visual or non-language auditory processing of stimuli in the environment, and so on. Since there is considerable empirical evidence for the interaction of conceptual information and the analysis of stimuli, ranging from phoneme restoration effects to word recognition (see Jackendoff 1987, for a summary), Schwartz’ argumentation hinges entirely on the nature of on-line language processing at critical decision-points. Since Schwartz’ claims, if correct, would preclude any role for feedback and correction, or indeed for any metalinguistic information, in grammar acquisition, they must be examined closely. I will come back to her arguments in Chapter 7. Despite the fact that Schwartz (1987) claims to be re-interpreting Krashen’s proposals, quite a different view of input is provided by Krashen, who has articulated what he calls the Comprehensible Input Hypothesis:
… in order to acquire, two conditions are necessary. The first is comprehensible (or even better, comprehended) input containing i + 1, structures a bit beyond the acquirer’s current level, and second, a low or weak affective filter to allow the input “in.” This is equivalent to saying that comprehensible input and the strength of the filter are the true causes of second language acquisition. (Krashen 1982: 33)
Krashen refers to stimuli which are comprehensible, i.e. which the learner is capable of comprehending, or which are indeed comprehended by the learner. Comprehensibility is not the same thing as parsability. We may be able to provide a construal for a string like My walk walk longtime you see — perhaps “I walked a long time in order to see you” — without being able to parse it.
Indeed, we may infer that talk of input “beyond the acquirer’s current level” means “beyond what the learner’s parser will now parse.” Krashen therefore appears to be saying that the mechanisms responsible for grammar learning are operating on the basis of semantic representations despite the fact that the learner’s parsers cannot parse the speech the learner is hearing. The Comprehensible Input Hypothesis therefore requires an interactive, possibly non-modular mind, with information from the conceptual system determining the construction of linguistic representations in the grammar. The Schwartz model does not require that intake be comprehended in order for language acquisition to take place on the basis of an analysis of it. Indeed, given that comprehension is the end-point of processing in her model, it could not play a role in the acquisition of phonological or morphosyntactic knowledge. For Schwartz, the comprehension of the stimulus must be irrelevant to grammatical restructuring; for Krashen, however, the comprehension of the stimulus is a precondition for the acquisition of structures not currently part of the learner’s interlanguage grammar. A position like Krashen’s is spelt out explicitly in Faerch, Haastrup and Phillipson who appear to adopt both the distinction between input and intake, as well as the Comprehensible Input Hypothesis:12
Some of the input can be interpreted directly by means of the knowledge the learner already has of the foreign language (the learner’s “interlanguage” knowledge (…) This means that the psycholinguistic rules (…) which a native speaker has used in order to produce the language are matched by rules in the learner’s IL system. The learner can also interpret input by means of inferencing strategies (…) by making qualified guesses as to the meaning of input (“top-down processing”). The result may be total or partial comprehension, or there may be a residue of input which is incomprehensible. (Faerch et al. 1984: 187)13
Foreign language learning presupposes input which is made comprehensible by means of inferencing and not by the direct application of a psycholinguistic rule. To continue with our metaphor from computer science, we can refer to those parts of input which satisfy the conditions we have just specified for learning to take place as intake: intake is input which affects the learner’s existing knowledge of the foreign language.14 In Figure [1.4], (b) refers to intake. (Faerch et al. 1984: 188).
The picture they present is shown in Figure 1.4.
Comprehensible input:
a. can be interpreted directly by means of a learner’s existing IL knowledge
b. can be interpreted by means of inferencing procedures
Incomprehensible input:
c. cannot be interpreted
Figure 1.4. The Comprehensible Input Hypothesis (Faerch et al. 1984: 187)
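The three-way classification in Figure 1.4 can be restated as a small decision procedure. The sketch below rests on loud assumptions: the two boolean arguments stand in for whatever it means for an utterance to be interpretable through existing interlanguage (IL) knowledge or through inferencing, and Faerch et al. do not define either in computational terms. All names are hypothetical.

```python
from enum import Enum

class InputType(Enum):
    A = "interpreted directly by existing IL knowledge"
    B = "interpreted by means of inferencing procedures"
    C = "cannot be interpreted"

def classify(matches_il_rules: bool, inferable: bool) -> InputType:
    """Assign an utterance to category a, b, or c of Figure 1.4.

    Both arguments are placeholders (assumptions) for judgements that
    the source leaves informal.
    """
    if matches_il_rules:
        return InputType.A
    if inferable:
        return InputType.B
    return InputType.C

def is_comprehensible(kind: InputType) -> bool:
    # Categories a and b together make up comprehensible input.
    return kind in (InputType.A, InputType.B)

def is_intake(kind: InputType) -> bool:
    # On the quoted definition, only category b counts as intake.
    return kind is InputType.B
```

For example, `classify(False, True)` yields category b, which is comprehensible and is also intake, whereas `classify(True, True)` yields category a, which is comprehensible but not intake on this definition.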
According to this view of learning, it must be the case that conceptual information can interact with the learner’s grammar. Finally, the assumption that meaning can determine the course of acquisition is a necessary part of the Interaction Hypothesis of Long (1980), who argues that learners’ grammars change as a result of conversational interactions in which native speakers are forced to simplify their speech (to accommodate the limited comprehension of the learners) while learners are obliged to elaborate their speech in the face of their conversational partner’s failures to comprehend them.15 There are two ways to interpret the claims of this literature. One is to say that comprehension of the meaning of a stimulus is sufficient for linguistic development. If you understand the meaning of an utterance, then you can learn its form and grammatical organisation. A stronger claim which emerges from this literature is that comprehension is necessary for acquisition. If you don’t understand the meaning of an utterance, then you cannot learn its form and grammatical organisation. Under either interpretation, it must be the case that representations of meaning can interact with those parts of the acquisition mechanism responsible for learning the sound system and the morphosyntax of a language.16 If modularity can be shown to hold in the strong sense of Schwartz, then the Interaction Hypothesis must be rejected. If modularity can be shown to hold in the weaker sense that I argue for, then the Interaction Hypothesis will require some serious emendation. I shall have more to say about both Schwartz’ version of The Modularity of Mind Hypothesis and the Comprehensible Input Hypothesis/Interaction Hypothesis in subsequent chapters so I will reserve making detailed comments about them until then. 
Suffice it to say here that the contrast between them brings out one important fact, which is that we will not understand the nature of SLA by studying the putatively objective properties of the stimulus. What matters for
language acquisition is how stimuli are analysed. This processing is not directly observable. It cannot be directly measured. At best we can make inferences about it on the basis of other observable events such as the utterances that learners produce, the interpretations they assign to utterances they hear or read, the time it takes them to interpret an utterance, their judgements of the acceptability of utterances, etc. These events must be construed as the outward consequence of internal events which we can only study in the context of a theory of language processing. The scholarly disagreement between Schwartz, on the one hand, and Krashen, Faerch et al., and Long, on the other, does not hinge on logical properties of the SLA problem. It hinges on matters of fact, namely, on how linguistic stimuli are processed. Consequently, to explore the relationship between language acquisition and the speech events in which learners find themselves we will necessarily have to spell out what language processing is like. Much of this book is dedicated to doing just that.
2.3 Processing in the Autonomous Induction Theory
In the theory being expounded here, language processing is both autonomous and interactive. The various representational systems needed to encode acoustic, articulatory, phonological, and morphosyntactic representations in interpreting the speech signal or in articulating our ideas are all distinct and not reducible to any other. Speech processing is both bottom-up and top-down. In contrast to the above-mentioned authors, I use the term input to refer to any mental representation which is analysed by a processor. A possible picture of input is shown in Figure 1.5. The first thing to do in developing an explanatory theory of input is to make a distinction between these inputs to speech processing and input to learning mechanisms.
In the theory of speech processing adopted here, the input to a processor (either on the signal-to-interpretation side of things or on the concept-to-articulated speech side) is any representation which can be “read” by the procedures of that processor. Procedures encode acquired grammatical distinctions and so are the by-product of language acquisition. In the theory of acquisition to be elaborated in this book, novel encoding of information is triggered by on-line events related to the processing of a speech stimulus. In particular, I want to claim that the novel encoding of information occurs when parsing fails. Input to the parsers is therefore necessarily different from input to the acquisition mechanisms and we must be clear about which one researchers are referring to.
[Figure 1.5. Types of possible inputs during speech processing. The figure pairs input types with processing mechanisms. Input types: linguistic stimuli; phonological representations; word class features, number, gender, and other morphosyntactic features, subcategorisation; conceptual category, theta roles (maybe); s-structure representation; lexical conceptual representations of the words of the s-structure; argument structure; default conceptual structures provided by various Mental Models. Processing mechanisms: auditory nerve fibres; two unspecified stages (marked “?”); lexical activation; syntactic parser; conceptual interpreter.]
2.4 Positive and negative evidence, positive and negative feedback

I can now rephrase Questions 1–3 as: What role is played by input (to learning mechanisms) derived from stimuli, input (to learning mechanisms) derived from inferencing, and by input (to learning mechanisms) derived from feedback and correction in SLA? In order to answer this question, we must make a distinction between positive and negative evidence, and positive and negative feedback. The term positive evidence occasionally is used to refer to physical instantiations of the L2. This usage fails to do any work and should be given up. Sometimes it is used to refer to analyses of linguistic stimuli in the sense that I have so far been using the term input, as is the term primary linguistic data. In other
words, the stimuli give rise to a particular form or meaning as a consequence of parsing. Positive evidence in this usage refers to the output of the parsing system, and not to input to acquisition mechanisms. If one thinks about it, however, the term evidence refers to proof of something. Positive evidence should therefore be thought of as stimuli which provide evidence for a particular analysis. Researchers who work on language processing often talk of cues in precisely this way. In German, the occurrence of an AdvP at the left edge of a sentence is a cue for the presence of a verb in Comp, e.g. the sequence Gestern at the left edge of a matrix clause will be a cue that the next constituent may be a tensed verb, as in Gestern habe ich meine Pfingstrosen geschnitten (‘Yesterday have I my peonies cut’), and not an NP or a pro-NP, e.g. *Gestern ich habe meine Pfingstrosen geschnitten (‘Yesterday I have my peonies cut’). If we say there is no evidence for the learner for a particular analysis, then, we are claiming that there are no stimuli which can reasonably be analysed in a specific way. I have said that language acquisition will take place when parsing fails. Therefore the absence of evidence for a particular parse will coincide with the triggering of the acquisition mechanism. What is positive evidence for SLA? Since we have said that acquisition begins when parsing fails, the evidence for a novel analysis cannot come from the speech signal. At the moment of parse failure, the parser is attempting unsuccessfully to complete a bottom-up analysis. Thus, positive evidence cannot be literally in either the speech signal (it doesn’t instantiate grammatical constructs) or in the partially parsed representation the parser has attempted to build.
At best, what can be meant is that once the learning mechanisms have somehow arrived at the appropriate information needed to successfully parse the previously unparsable, then that entity can be construed as the element which caused grammatical learning. This hardly advances our understanding of acquisition, but it does allow us to distinguish between stimuli or input (to a parser) and the stimuli or input which cause parsing failure. Let’s agree to refer to the latter with the term positive evidence. The term negative evidence is usually used to refer to information derivable from strings which are not part of the grammar of the language which the learner is to learn. Since the strings do not occur in the language, the potential for confusing input to parsing and evidence for learning does not arise. Ill-formed strings are not parsed by definition. How do such strings provide evidence for the acquisition mechanisms? Since I am interested in how information could be derived from such ill-formed strings, it won’t do to merely construct examples. The term negative evidence will be used here to refer to representations which can be defined neither as an intake representation of a stimulus nor as a parse of a given bit of intake (since the string is not part of the language), but which
nonetheless convey information about structures of the language to the acquisition mechanisms. The sentence in (1.2a) can motivate some positive evidence about word order in a learner of English whose grammar cannot yet analyse NP V infinitival complement structures. The sentences in (1.2b–c) are the basis for deriving negative evidence about English word order.

(1.2) a. Jessica likes to go to circus school.
      b. *Likes Jessica to go to circus school.
      c. En anglais, le sujet logique d’une proposition ne doit pas normalement, dans une phrase déclarative, suivre le verbe principal à temps fini.
         ‘In English, the logical subject of a proposition in a declarative sentence normally does not follow the tensed main verb’
The sentence in (1.2a) instantiates the information that in a declarative sentence in English the logical subject of the proposition , namely , instantiated by a form Jessica, may appear to the left of the form likes to go to circus school, which instantiates the predicate.17 Grammarians usually shorten this complex statement by saying that the subject (or S) in English precedes the verb (or V). Under current generative assumptions, it is assumed that the subject NP is the daughter of a Tense Phrase or an Infl Phrase (see Pollock 1989), occupying a so-called specifier position. If the learner’s grammar does not have any representations of sentential structure, and the learner’s parser has no parsing mechanism which will permit the analysis of Jessica as an NP-subject which must be attached as a left branch of a Tense Phrase, then the parser will break down in attempting to analyse Jessica. See (1.3).18

(1.3) Current input and current parse
      Input: Jessica
      Analysis: NP
      Attachment rules: None
      Addition to the parser which must be acquired by the learning mechanism:
      “Attach NP-subject as left branch of Tense Phrase”
      [Tense Phrase [NP [N Jessica]]]
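The logic of (1.3) can be pictured with a small toy program. This is only an illustrative sketch under strong simplifying assumptions, not a formalisation of the theory: the names `ToyParser`, `ParseFailure`, `learn_from_failure` and `process` are hypothetical, a parsing procedure is reduced to a single table entry, and the hypothesis about where the NP-subject attaches is simply handed to the learning step, since how the learning mechanisms arrive at it is exactly what remains to be explained.

```python
from dataclasses import dataclass, field


class ParseFailure(Exception):
    """Raised when the parser has no procedure for attaching a constituent."""


@dataclass
class ToyParser:
    # Hypothetical store of attachment procedures: category -> host phrase.
    attachment_rules: dict = field(default_factory=dict)

    def attach(self, form: str, category: str) -> str:
        """Attach `form` under its host phrase, or fail if no rule exists."""
        if category not in self.attachment_rules:
            raise ParseFailure(f"no procedure to attach {category} '{form}'")
        host = self.attachment_rules[category]
        return f"[{host} [{category} {form}]]"


def learn_from_failure(parser: ToyParser, category: str, host: str) -> None:
    """Toy learning step: add the missing attachment procedure to the parser."""
    parser.attachment_rules[category] = host


def process(parser: ToyParser, form: str, category: str, host_hypothesis: str):
    """Parse if possible; only on parse failure trigger learning, then reparse.

    Returns the analysis and a flag saying whether learning was triggered.
    """
    try:
        return parser.attach(form, category), False
    except ParseFailure:
        learn_from_failure(parser, category, host_hypothesis)
        return parser.attach(form, category), True


parser = ToyParser()  # as in (1.3): attachment rules: none
first = process(parser, "Jessica", "NP-subject", "Tense Phrase")
second = process(parser, "Jessica", "NP-subject", "Tense Phrase")
print(first)
print(second)
```

On the first encounter the parse fails and the procedure is added; on the second the same stimulus is parsed without any learning being triggered. This mirrors the claim that a stimulus counts as positive evidence only relative to a parser that could not yet analyse it.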
Once the learning mechanism has successfully added the correct parsing procedure to the parser, it can be said that (1.2a) provides positive evidence to the learning mechanism that the subject precedes the verb in English. A successful parse of the sentence will in fact deploy this information (once it is represented in a parsing procedure). This sentence, of course, instantiates many other things as well, and hence could provide the learning mechanisms with evidence for many other distinctions. Sentence (1.2b) is an artefact. It is not a naturally occurring utterance used in normal communicative situations.19 It has been created by placing the tensed verb in the first position of the sentence. The asterisk “*” indicates that the string is not a possible sentence of English (accepting the cautionary remarks of endnote 19). The string plus the intended meaning given in the proposition together with the conventional interpretation of the asterisk, and the comparison with sentence (1.2a) provide the negative evidence that the verb and the subject cannot appear in the positions given. Thus, a learning mechanism would need to combine all of this information to represent the facts. Let me emphasise this by rephrasing the point: the negative evidence that in a normal declarative sentence in English the verb cannot precede the subject arises as an inference from the form-meaning correspondence *Likes Jessica to go to school/ , the interpretation of the asterisk to mean , and the systematic comparison of the starred form and the well-formed string Jessica likes to go to circus school. The string (1.2b) has no intrinsic properties making it evidence for anything; the evidence only arises through the described inferencing. The string in (1.2c) also permits the same inference, for anyone who can comprehend it. Once the string has been interpreted, it can be used to make inferences about the position of the subject and the tensed verb in English sentences.
It thus instantiates in French information about word order in French (and thus might provide a mechanism learning French with positive evidence) but it expresses information about word order in English. It licenses inferences about French grammar as positive evidence, and licenses quite different inferences about English grammar as negative evidence. To sum up, positive evidence is evidence for a particular analysis of a stimulus which the learning mechanism uses to encode a category, a parsing procedure and/or an interpretation of linguistic stimuli. It is the entity which requires restructuring the grammar and/or the creation of new production schemata. The evidence will often not be directly encoded as physical properties of the speech signal, rather it can emerge in any of the mediating levels of
processing, or via the conceptual system. Negative evidence, in contrast, cannot be derived bottom-up by the parsing and interpretation of a linguistic stimulus since, by definition, the information would have to be derived from sentences which are either not parsable or not interpretable. Rather, negative evidence emerges as a result of a complex inferencing process which must work top-down and which involves a systematic comparison of an analysable string to an unanalysable string, or some other inferencing process which leads to the conclusion that some property is not part of the grammar of the language. Since, by definition, the systematic comparison of representations cannot be based on the on-line bottom-up analysis of a string, it is important to ask just where and how these representations are being generated. If the story here is correct, they must be generated as conceptual representations, raising the problem of how the conceptual system could accurately encode the grammatical properties needed by the performance systems. While it might be the case that conceptual structures could encode as concepts information directly relevant to the contents of a separate mental grammar (understood as sets of rules and categories), there is no reason to assume that conceptual structures could encode the procedural information needed to parse a representation. The term feedback will be used to characterise particular kinds of utterances which are produced by, for example, native speakers, but which are used by learners to derive positive and negative evidence about the second language. The term is designed to draw our attention to the fact that feedback occurs as a response to something the learner has said or done and only counts as feedback for the purposes of language acquisition if the learner so construes it.
Feedback is a general category which includes repetitions of things the learner has said, anticipating and “filling in” things that the hearer expects the learner to say, questions about intended meanings, statements of a failure to understand, and so on. Since the constructs positive evidence and negative evidence have focused on the role of information in the learner’s processing system, it would be useful to have some terminology to distinguish what the respondent is reacting to. Feedback can be directed at pronunciation, the appropriateness of a word choice to express a given idea, word order, morphological well-formedness, or any other aspect of grammatical organisation that people are capable of reacting to and commenting on. It is also produced as a consequence of interactions which have language at their centre. Since language can convey information about anything that we can communicate about, it can also be used to make commentary about itself. Interactions in which one speaker makes comments to another about that person’s use of language can be said to provide that person with feedback about his language use. They can be an attempt to confirm that some expression or
sequence is well-formed, or else they can be an attempt to state that some string produced by the learner is not well-formed. I will call the first type of feedback positive feedback, and the second type will be called negative feedback. I give an example in (1.4).20

(1.4) a. A: is it [flanz6nkatal"g]?
      b. B: [katalo˜g® ] nicht [katal"g], we don’t have ["]. It’s so easy.
      c. A: Yes, but is it [flanz6nkatal"g]/[katalog]
      d. B: /[katalo˜k] … yes
      e. A: I mean is it plural in this sort of compound or what?
      f. B: [pflanz6n] yes
The L2 learner in example (1.4) wanted to know what the correct form was of the German version of plant catalogue. She had been looking at a British seed catalogue and had noticed that British English differs from her North American variety in using a plural first element, e.g. plants catalogue, drugs problem, weapons trade, etc.21 This raised the possibility that German compounding might follow either the British model or the American one. Her initial question was designed to confirm that German compounding follows the British model. The learner was thus attempting to elicit feedback about morphological structure. Her interlocutor, however, couldn’t get beyond her non-native pronunciation and decided to correct that instead. He offers two corrections of the initial pronunciation (1.4b, d), explicitly mentioning her faulty pronunciation of the final vowel in (1.4b), and implicitly correcting her pronunciation of the final consonant. He thus provided (negative) feedback that her pronunciation of the compound is not permissible. The confirmation that the first element of the compound is indeed plural in the interlocutor’s final utterance (1.4f) can also serve as an instance of feedback — positive feedback this time, telling the learner that her surmise was indeed correct. To summarise: negative feedback is an utterance which is normally intended to communicate the information that something is not possible in a language. Positive feedback is intended to license the inference by the learner that something else is. Ordinarily, negative feedback signals trouble in the communication while positive feedback is rarer.22 We also use positive feedback to confirm that we are attending to a conversation, e.g. although our heads are hidden behind a newspaper or our ears are covered by headphones. In other words, we provide positive feedback in precisely those cases where our interlocutor has reason to believe that communication might not be successful. 
There are certain environmental settings where positive feedback is commonplace. The language classroom is one case. It is well-known that language teachers will respond “Yes” to
ill-formed utterances and to wrong answers as part of a didactic practice to elicit speech from L2 learners (Chaudron 1977). Frequent use of positive feedback is thus one of the characteristics of teacher talk.

[Figure 1.6. Positive and negative evidence. Elements shown: evidence from memory, evidence from stimuli, evidence from visual perception, evidence from inferencing, etc., together with Analysis & Representation.]
Developmental psycholinguists have developed a separate terminology for stimuli that purportedly are both positive feedback and positive evidence: models and recasts. A model is any utterance offered by the native speaker which the learner is supposed to construe as positive evidence for some analysis and as the proper way of saying something. A model is therefore not merely positive evidence, it is also a stimulus with metalinguistic implications. The learner must infer that the interlocutor is saying “This is the way to say ‘X’ …” Note that in the normal course of conversation any utterance might lead to the same inference but typically does not. We normally do not construe utterances as models because we normally do not consciously process them for their mediating form but rather for their meaning.23 Models are utterances which have been assigned a role as exemplars of specific form-meaning pairs. Unacceptable sentences can be modelled. In the linguistics literature, they frequently are. A recast is a reformulation of a learner’s faulty utterance in response to its ill-formedness. It can be seen as a particular subtype of modelling. A recast is also to be construed as saying “This is the way to say ‘X’ …” In addition, it is supposed to license the construal “Don’t say ‘X’ the way you just said it!” It can be thought of, therefore, as positive evidence or negative evidence depending on what construals the learner actually makes. It can be construed as positive feedback and negative feedback depending on what intention the native speaker has. In (1.4) above, B’s first utterance (1.4b) is a recast, offering a model of the proper pronunciation, and explicitly rejecting A’s pronunciation. That A draws the correct inference can be seen from the fact that she repeats her L1-based pronunciation of Katalog, hears it, recognises that she has made an error and immediately corrects her pronunciation herself. See (1.4c).
B’s second utterance (1.4d) is also a recast, attempting to draw A’s attention to her faulty pronunciation of the final
stop consonant which is devoiced in German. In this instance, A heard the utterance, understood the utterance as a recast, but chose to ignore it, since she already knew that German voiced stops devoice in syllable final position. This second recast therefore has no effect.

2.5 Competence and performance, skill and control, faculties, and abilities

I have emphasised that I am interested in the analyses that the learner assigns to particular stimuli but that to study them we are obliged to make inferences about them on the basis of learner behaviour. To facilitate the discussion, I should introduce further terminological distinctions here. The first is the now traditional distinction between competence and performance. The term competence is used to mean a number of quite different things in the linguistics literature. It is important for us to be clear about which ones are useful for SLA. On the one hand, competence is sometimes defined as the knowledge that an individual has in some cognitive domain. Linguistic competence is thus said to be knowledge of a language however it happens to be represented. This knowledge might correspond to a psychogrammar, defined here as the grammatical information used in parsing and producing a language. Or else, linguistic competence might be any representation of linguistic information, regardless of how, or indeed if, it is put to use. Chomsky has thus argued in a variety of places that an individual can possess linguistic competence in this sense even if he could never express the knowledge in his behaviour due to some physical impairment, for example. There is also another quite different use of the term, however, also found in the linguistics and philosophical literature. Chomsky has used the term to refer to the knowledge of an ideal speaker/hearer living in a homogeneous community who knows his language perfectly (Chomsky 1965: 3).
Since such individuals do not exist, it is impossible to directly observe competence in Chomsky’s sense. Competence of this sort is not a construct equivalent to some individual psychological system, which may be why Chomsky subsequently introduced the term I-language to designate that system. In this book, therefore, I will reserve the term competence for Chomsky’s idealised notion, or when quoting others, and use the term I-language to refer to all aspects of the individual’s knowledge of the language, regardless of how idiosyncratic that may prove to be.24 Note that the term I-language is still neutral as to how and where the knowledge is represented. It will certainly include the psychogrammar, but it could also include metalinguistic information encoded in the speaker/hearer’s conceptual system. Thus, when an individual examines the string (1.2b) and makes a
classification judgement to the effect that (1.2b) is not a possible sentence of English, the conceptual representation encoding that judgement is encoding knowledge of English. In the case of L2 learners, I will also occasionally follow traditional practice and refer to the I-language of the L2 as the interlanguage. When I want to focus on specific aspects or components of the I-language, I will refer simply to the relevant knowledge system, e.g. the phonological system, the morphosyntactic system, and so on. Performance is normally defined as the actualisation of the linguistic knowledge system in concrete situations, that is, what the individual actually does, subject to shifts in attention and interest, memory limitations, etc. (Chomsky 1965: 3). This construct has been much criticised as inadequate to account for the relationship between knowledge and behaviour. Its purpose originally was to legitimise talk of knowledge, to distinguish between knowledge and the behavioural manifestations of that knowledge, and to make the case that cognitive psychology should have as its object of study and explanation the knowledge systems, rather than behaviour tout court. See Chomsky (1980: Ch. 1–3) for relevant discussion. Elbers and Wijnen (1992), however, note that a distinction must be made between a learner’s actual behaviour at any one time, and the abstract processing mechanisms which underlie or potentiate performance. In order to comprehend speech, to read a text, to speak, or to write a sentence, knowledge must be put to use. Putting knowledge to use requires the development of complex procedures which select and integrate representations before actualising them in motor or interpretation programs. The deployment of knowledge of language may be expert or inexpert, mature or immature, in addition to being subject to shifts in attention, interest and memory limitations. This observation has led to the creation of constructs designed to capture behaviour-based knowledge.
One well-known construct is procedural knowledge or the implementation of knowledge “on-line” in working memory (Anderson 1983: 10–11). A slightly different way to think about procedural knowledge is to define it as the encodings of grammatical information in parsing and production systems.25 Consider the case of an L2 learner of German. One thing that must happen is that the person develops procedures to analyse representations of embedded sentences with verb slots in final position. This is not merely a requirement for production, although we know that putting tensed verbs in the right position is a problem for L2 learners (Clahsen, Meisel and Pienemann 1983; Clahsen 1984, 1988; Meisel 1991). It is also a minimal requirement for comprehending German speech. A speaker of a subject (S) verb (V) object (O) order language like English who transfers parsing procedures to German, a language which is SVO in matrix sentences but SOV in embedded sentences will
have comparatively little problem processing German matrix SVO sentences even in the early stages of acquisition (as long as those sentences contain known vocabulary). However, working memory will often not be sufficiently large to accommodate a parser building a representation of an embedded sentence word by word. This is what must happen when the learner initially hears an SV daß SOV sentence (= SV that SOV). It is well known that working memory is quite limited in the number of unrelated items it can hold and process (see Eysenck and Keane 1995: Ch. 6). In Miller’s (1956) famous paper on memory, he estimated that working memory could deal with between 5 and 9 unrelated items. Processing sentences or other input of greater length is assumed to involve “chunking” or grouping elements into larger units in such a way that capacity in working memory is freed up. Chunking is precisely what the unskilled hearer cannot do. Alternatively, we might think of the difference between expert and inexpert parsing as the ability to activate pre-compiled chunks with which to analyse incoming material. Novices must construct the parsing routines in real time. Now, analysing a verb is essential to finding out who did what to whom in a sentence. If a learner cannot chunk the lexical items comprising subjects and direct objects, and if the verb is further away than the magical number 7 (plus or minus 2), then the learner will simply “lose” the incoming message stored briefly in verbal memory. The memory trace of the analysed and represented sounds (phonetic and phonological encodings) simply fades away before the sound sequences of the final items can be reanalysed as syntactic and semantic units.26 This ability to put knowledge to use Elbers and Wijnen call skill, following well-established traditions in the psychology of learning. I will adopt this terminology too, keeping the term performance for the individual, context-specific behaviours.
Bialystok and her colleagues have drawn a distinction between components of skills, namely analysis or knowledge and control.27 Analysis here does not refer to processing in general (the way I use the term) but rather to the specific analysis of information into conceptual representations (or propositional or declarative knowledge). Unanalysed knowledge is knowledge which is not so re-represented in conceptual terms.28 Unanalysed knowledge can be put to use in speaking and hearing, but tasks such as making acceptability judgements, performing grammatical analysis consciously, or writing sonnets, so the claim goes, require analysed knowledge. The distinction that Bialystok is making is an important one, although the choice of terminology is not especially felicitous. To avoid confusion, I will adopt a terminology drawn from Karmiloff-Smith (1992) and talk about representational redescription. Karmiloff-Smith (1986, 1992: 15) argues that the same information can be represented in a variety of representational formats and that skill development involves a cyclical process by which
information stored in special-purpose (autonomous) representations is made available progressively to other cognitive systems by increasing the level of explicitness of the encodings. From the perspective of Representational Modularity, representational redescription amounts to information in autonomous representations of the parsing and production systems being gradually put into correspondence with representations in a conceptual representation, where it can be put to use for conscious reflection, performing particular kinds of metalinguistic tasks, and so on. Alternatively representational redescription means that information is reencoded into phonological representations which can be projected into conscious awareness (Jackendoff 1987). In the theory of functional architecture adopted here, it is these phonological representations which serve as the basis for thinking about language and which are probably the point of departure for acceptability judgement tasks. In the Bialystok and Karmiloff-Smith models, there are no obvious constraints on how representational redescription occurs. In the model developed here, in contrast, it is assumed that the architecture of linguistic cognition imposes quite severe constraints on which autonomously represented knowledge can be redescribed. It is claimed, for example, that the featural contents of conceptual and phonetic representations are not re-representable in phonological representations so that we are never consciously aware of either semantic or phonetic features of speech.29 Whether the model adopted here is compatible with Bialystok’s assumptions is not clear and will not be pursued.30 Control in Bialystok’s model (Bialystok 1986a, b) involves the ability to focus attention on various dimensions of a task in order to do it. The development of expertise necessitates the development of control in this sense. 
Producing a simple sentence in an L2 involves organising a message, selecting suitable lexical items to express that message, grouping those lexical items into a surface structure (a linear and hierarchical syntactic structure), mapping the syntactic structure onto prosodic structures, and articulating the motor-articulatory schemas of the phonetic representations. Beginners cannot do all of these things at once, although they may possess all of the necessary information encoded in partial representations, and must attack the problem of production bit by bit, practising, making errors, being corrected, and practising some more. It is an interesting question whether control is involved in the development of parsing and comprehension skills. Clearly, there are differences in the way beginning and advanced learners process stimuli, which involve, in part, their ability to chunk and deploy analysed information. There is no evidence, however, that focused attention is involved in improving one’s ability to do this. Rather, something happens during development, outside of our awareness, beyond our
control, such that stimuli that were formerly incomprehensible suddenly become comprehensible. To the extent that the Analysis and Control model has nothing to say about this, it is incomplete.31 Moreover, as a model of language learning, it is also fundamentally flawed since it provides neither a theory of I-language nor a theory of how I-languages change. Why these weaknesses are so basic will become clearer in Chapter 2. It seems to me, however, that the basic distinctions Bialystok is making, based on the idea that control varies in terms of task demands, will be an essential part of any theory of skill development. Indeed, it is a robust finding of much psycholinguistic research that behaviour fluctuates in terms of the demands of the task being done. It is well established in speech production that the same learners may manifest different degrees of control of syntactic properties of English depending on whether they are performing a written manipulation task or speaking spontaneously. Research on speech perception reveals similar task-specific variability. Janet Werker and her colleagues have demonstrated that the ability to make certain acoustic-phonetic discriminations disappears with age. Indeed, although children appear to be born with the ability to make certain linguistically relevant featural discriminations, and quickly manifest the ability to make other (possibly all) linguistically relevant perceptual distinctions (Eimas, Siqueland, Jusczyk, and Vigorito 1971; Streeter 1976; Jusczyk 1985, 1992), they fine-tune their perceptual systems to the specific properties of the stimuli within the first year of life.
This means that some distinctions which are discriminable at birth, or shortly after, are no longer discriminated later on (Werker and Tees 1984a; Werker and Lalonde 1988; Werker and Pegg 1992).32 Nonetheless, the ability to make discriminations among sounds not part of the L1 does not necessarily disappear for all or even for most types of features (Tees and Werker 1984; Werker and Tees 1984b). When adults are required only to discriminate between exemplars exhibiting L1-non-distinctive phonetic features, they can often do it. Interestingly enough, this ability is task-specific; embed the same sounds in a word recognition task and they cannot discriminate the relevant features (see Wode 1992 for further discussion). Therefore, the well-documented problems that adult learners have in mastering and accurately producing new segmental categories in the L2 (Flege 1991, 1992) cannot be explained entirely on the basis of differences in perception and in the absolute representability of sounds. Rather, the ability to perceive and represent acoustic-phonetic distinctions is also dependent upon the level of processing carried out — processing in what Werker and Logan (1985) call “speech mode” interferes with the ability to make certain discriminations which can be made nevertheless when processing in a less demanding “phonemic mode.” Processing in phonemic mode involves processing
sounds qua sounds rather than, e.g. processing sounds in order to recognise words.33 These differences between native speakers and L2 learners can thus be characterised in terms of differing abilities to analyse and deploy the information in the stimulus relative to other types of information available in specific perceptual tasks. Other confirmatory evidence comes from research on memory and learning which shows systematic dissociations on distinct memory tasks. This research has led to what is known as the processing account of memory. According to Morris, Bransford and Franks (1977), memory for some particular bit of information improves to the degree that the type of processing needed to perform a given memory task is the same type of processing needed to encode or learn the information. See also Roediger (1990). The basic idea is that there is a transfer from the initial encoding to the performance on the memory task. When there is a dissociation between the initial encoding and the processing needed to perform a memory task, then the information must be re-represented to perform the task. There is a substantial literature showing patterns of dissociation on memory tasks performed by normal subjects, reviewed in Blaxton (1992). She then tested amnesic patients suffering from temporal lobe epilepsy on a variety of tasks. In one experiment, subjects were tested on 4 different memory tasks involving words, half of which had been previously studied, half of which were novel items. The first test was a word-fragment completion task in which subjects had to fill in the missing letters on stimuli such as _UT_E_FL_ (target = butterfly). The second test was a test of conceptual knowledge related to the selected semantic class, e.g. “What kind of insect are both the Monarch and the Queen Alexandra?” Answering such a question involves drawing on the knowledge that the Monarch butterfly is a sub-class of the superset . 
The third test involved semantic cued-recall in which subjects heard previously unstudied words (in the context, these were new stimuli), semantically related somehow to items previously studied, e.g. moth as a cue to butterfly. The fourth test was a graphemic cued-recall test in which subjects saw words graphemically like those studied, e.g. buttery. The hypothesis behind the design was that reading a list of words is a data-driven information-encoding task, and that generating a word on the basis of conceptual knowledge or semantic cues is a conceptually driven task. The memory tests can also be analysed in this way. Thus, the word-fragment completion test is hypothesised to be an implicit data-driven task, answering questions based on conceptual knowledge is assumed to be an implicit but conceptually driven task, the semantic cued recall is an explicit task which is conceptually driven and the graphemic recall task is explicit and data driven. The research question was: Given the nature of the encoding tasks, does it make any
difference what the memory task is? The answer turned out to be positive. Amnesic patients performed best when there was a match between data-driven processes at learning and at test time. Their worst performance occurred when there were conceptually driven processes at both learning and at test time. Normal controls exhibited, as expected, similar dissociations. Subsequent experimentation showed that the amnesics were having particular difficulty with the generation tasks, and that there are important differences in the performance of left-temporal-lobe and right-temporal-lobe epileptics. One way to interpret these results is to say that performance on a given task type (data-driven or concept-driven) is enhanced when the memory task requires the activation of the same type of representations of the encoded information. The processing account of memory leads us to the conclusion that information can be encoded and activated in a task-specific way. Matching across learning and memory tasks thus involves “transfer” from the learning task to the memory task, which is just another way of referring to facilitation effects seen when the task-based encodings were the same. In Blaxton’s experiments, the processing account of memory forces a rethinking of such terms as “amnesic”, for her experiments clearly show that subjects could access memory traces on some tasks. We might expect that such research will also help us to understand the differences between expert and novice behaviour, since experts re-encode information in different ways as expertise develops. Novices, in contrast, appear to have single encodings of information. This point has obvious implications for SLA, since proficient speaker/hearers are none other than expert users of the language. Finally, recall that the hypothesis that information is first encoded in autonomous encodings is explicitly assumed in the theory of representational redescription. 
This idea can now be interpreted in terms of the processing account of memory. I conclude that discussion of control and the nature of encoding of information on specific tasks has a place in any theory of second language acquisition. The processing account of memory is a promising theoretical approach to the facts. I will continue to use the term ability as a way of talking about skills in abstraction from the specific demands of a task. This usage follows normal usage as when we say that Gisbert has the ability to play the piano or Alex has the ability to ride a bicycle. Finally, I will use the term faculty to talk in a general way about both the representations and mechanisms involved in deploying language. We will thus talk about the language faculty in contrast to the faculty for number, face recognition, or music.
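The two-by-two design behind the four memory tests discussed above can be laid out as a small data structure. What follows is only an illustrative sketch: the class and function names are hypothetical and do not come from Blaxton's work, and the code encodes nothing more than the classification of the tests given in the text (implicit vs. explicit, data-driven vs. conceptually driven), together with the processing account's prediction that a test is facilitated when its type of processing matches the type of processing used at encoding. It does not reproduce any experimental results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryTest:
    """One memory test, classified on the two dimensions named in the text."""
    name: str
    retrieval: str   # "implicit" or "explicit"
    processing: str  # "data-driven" or "conceptually driven"


# Classification of the four tests exactly as described in the text.
TESTS = [
    MemoryTest("word-fragment completion", "implicit", "data-driven"),
    MemoryTest("conceptual-knowledge questions", "implicit", "conceptually driven"),
    MemoryTest("semantic cued recall", "explicit", "conceptually driven"),
    MemoryTest("graphemic cued recall", "explicit", "data-driven"),
]


def matching_tests(encoding_processing: str) -> list[str]:
    """Names of the tests whose processing type matches the encoding task.

    On the processing account, these are the tests on which transfer
    (facilitation) from the encoding task is expected.
    """
    return [t.name for t in TESTS if t.processing == encoding_processing]


if __name__ == "__main__":
    # Reading a word list is treated in the text as data-driven encoding.
    print(matching_tests("data-driven"))
    # Generating a word from conceptual cues is conceptually driven encoding.
    print(matching_tests("conceptually driven"))
```

Laid out this way, it is easy to see that the implicit/explicit dimension and the data-driven/conceptually driven dimension are fully crossed, so that a match or mismatch with the encoding task can be assessed independently of whether the test is implicit or explicit.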
3. A reformulation of the research questions
Now that we have some basic terminology, we can rephrase the research questions under investigation. I began by saying that I would be exploring the relationship between the language that learners hear (or read) and the knowledge of the L2 that they acquire. I can now reformulate this as (1.5):
(1.5)
a. What role is played by positive and negative feedback in forcing the learner towards a specific analysis of intake and intermediate input representations during language learning?
b. How does feedback interact with positive evidence during the analysis of grammatical representations?
c. What processing mechanisms must exist in order for feedback to be utilised?
d. What is the relationship among the various processing mechanisms in a given faculty and the information they manipulate?
e. How are representations of stimuli computed on-line?
SLA, indeed psycholinguistics in general, is not so far advanced that I can hope to answer these questions. Nevertheless, by exploring each one, I hope to cast input and evidence in a novel light, pushing us towards a different sort of explanation from those which are currently popular.
4. Summary
In this chapter I have laid out the broad lines of the investigation into evidence, feedback and correction which will follow. I will examine the hypothesis that feedback and correction can cause the restructuring of interlanguage grammars, within a theory of acquisition called the Autonomous Induction Theory, embedded in a partially interactive theory of processing. I have also introduced some terminology necessary to conceptualise evidence, feedback and correction in information processing terms. In particular, I have pointed out that current Corderian terminology is inadequate, distinguishing only between some undifferentiated stuff “outside” the mind/brain in the environment and some undifferentiated stuff “inside” the mind/brain. I have instead distinguished between three broad categories of information in the information processing systems I shall invoke to discuss second language acquisition. The first broad category concerns representation types to the parsers, where I have distinguished (i) stimuli (= information entering the perceptual systems), (ii) intake (= transduced stimuli or
analogue representations), (iii) input (= structured representations entering particular processors). The second broad category concerns information to the learning mechanisms. Here I have distinguished between two types of information. The first type is derivable from analyses of speech stimuli during parsing or inferences based on parsing some string which is part of the language (positive evidence). The second type is derivable via complex inferencing from conceptual representations encoding information about non-existent strings or unacceptable strings, or inferencing from feedback and correction (negative evidence). The third broad category concerns a distinction between the nature of interactions between learners and native speakers and the nature of the responses of native speakers to the errors and infelicities of interlanguage. Here I distinguished between positive feedback (confirmation that some form or string or interpretation is possible in the language) and negative feedback, which is intended to communicate that some form, string or interpretation is not possible in the language. Finally, I have been careful to distinguish between linguistic knowledge (I-language), the encoded information which is directly relevant to the parsing and production systems (the psychogrammar), skills (task-based encodings of linguistic information in parsing and speech production), and performance (what one does on a given occasion). With these distinctions in mind, I would now like to show why the Autonomous Induction Theory is needed.
Notes 1. To simplify the presentation, I will assume throughout that we are discussing adult learners who live and work in a community where the L2 is the language of official communication. I happen to believe that at the micro-processing level that I intend to explore here, the important situational differences which characterise language-to-(im)migrants and language-to-classroom-learners, differences which still need to be explored empirically and theoretically, will not matter much. 2. In my view, describing SLA as hypothesis-formation and hypothesis-testing is a misleading metaphor. In learning new categories of the grammar, new structural arrangements in on-line parses and hence new parsing procedures and new production schemata, learners are not learning new concepts. 3. At the very least one might ask for some solid psycholinguistic evidence to support it. In the mean time, many of us find it unhelpful to arbitrarily reallocate the terms acquisition and learning in the way Krashen does without ever defining what the processes actually consist of. In my view, Krashen’s terminological distinctions simply obscure analysis of the difference between representation and process. 4. Although reading texts in a foreign language plays an important part in classroom learning, I will have nothing further to say about it in this book. The significance of literacy for adult learners should not, however, be underestimated for many reasons, among them being that writing practices can provide many clues to word boundaries which the speech stream does not
(thus giving the literate learner a leg up on constructing form-meaning associations), or that reading is often linked to vocabulary expansion. Through reading the learner can also better control the amount of exposure he gets; starting up and maintaining conversations for the purpose of just hearing more speech is not nearly so easy. 5. This remark should not be construed as saying that no other theoretical influences can be observed in SLA. Nothing could be further from the truth. However, to the extent that functionalists have contented themselves with descriptions of interlanguage varieties, and set aside the problem of constructing a transition theory, as we shall shortly call it, their use of semantic models cannot be described as “mainstream theorising” in the intended sense. Similar remarks can be made about the application of neurological theory in SLA. Finally, while the exciting research of the last decade on cross-linguistic speech perception (to be discussed elsewhere) has certainly influenced phoneticians and phonologists working in SLA, the significance of this work for characterising general aspects of SLA has not been seen. 6. This theory has been developed over many years, beginning with work which predates the current label for the theory, namely Jackendoff (1972). The first exposition of the theory appears as Jackendoff (1983), which was developed and extended in Jackendoff (1985, 1990b, 1992, 1995, 1996a, b, c). 7. I intend to demonstrate that the role of correction depends on a metalinguistic interpretation of a given stimulus. Since the ability to re-encode and conceptualise language in metalinguistic terms appears to depend upon cognitive developments which may, in turn, be linked to maturation, I will not discuss the role correction may play in either the simultaneous acquisition of two first languages (that is, child bilingualism) or in early child SLA. This book is about adult learners and learning by adults. 
I will simplify the exposition by omitting specific reference to adults except where I intend to contrast child and adult language acquisition. 8. Gass and Selinker (1994: 197), for example, refer to the speech stream as “input.” Towell and Hawkins (1994: 249) refer to “authentic data.” VanPatten (1993: 436) defines “input processing” as the conversion of linguistic data to that subset which is comprehended and attended to in some way through the mediation of various (unspecified) linguistic and cognitive factors. All of these discussions appear to regard input as outside of the cognitive systems which encode language. 9. In the Competition Model (see Chapters 2 and 3), cues are stimuli since they meet this definition of being objectively definable properties of the environment. In my view, cues should more properly be defined as properties of internalised representations, relatable to stimuli via parsing procedures. 10. I might add here that we can talk for eons about association without necessarily knowing what we are talking about. Studies of animal learning have clearly demonstrated that animals show a remarkable sensitivity to probabilistic relations among events. As Kelly and Martin (1994: 113) have pointed out, these demonstrations reveal very little about the mechanisms underlying association or about what is actually represented. 11. This does not preclude hybrid approaches which resort to connectionist architecture to capture the lowest levels of information-processing but which also incorporate structure. I will have nothing further to say about this issue beyond saying that my approach to SLA is not in principle inconsistent with such hybrid functional architectures. See Goldsmith (1992) for discussion. 12. I am drawing conclusions here since the authors fail to cite either Corder or Krashen in their discussion. 13. It should be clear that Faerch et al.
are not using the term “input” as I do to mean internal mental representations of structures but rather to mean “stimuli.”
14. As in note 13, Faerch et al. are attributing a different meaning to their technical terminology from my definitions. Intake in this citation is intended to mean what I call input. 15. See also Long (1981, 1983, 1996) for additional discussion. 16. To anticipate discussion which follows, let me say here that both of these claims are false unless qualified. 17. I will capitalise semantic information and constituents following a practice introduced in Jackendoff (1983). Linguistic forms are italicised. 18. Parsing procedures will have to be redefined in Minimalist approaches to conform to the properties of the operation Merge which replaces X-bar theory. To simplify discussion, proper names will be analysed as NPs. DetPs will be used only where definiteness is critical to the discussion. 19. I sidestep a difficult problem here which is the definition of the term “normal language.” The fact that such strings regularly appear in linguistics texts serving as examples of what is not possible in English, and can be identified by knowledgeable readers of such texts in the intended manner, might be sufficient to define them as instances of naturally-occurring English used in normal communicative situations. 20. This example comes from a corpus that I have been collecting since 1992 of spontaneously provided cases of feedback and correction, occurring during normal conversation between adult second language learners and native speakers. In the example in (1.4), the learner is a native speaker of English learning German and the corrector is a multilingual native speaker of German. 21. These are attested examples. Lardiere (1995: 28), in a treatment of the Level-Ordering Hypothesis and the “No regular plurals inside compounds” claim, provides a critique of the American-based morphological literature which claims that such forms are impossible in English. 
Three years of watching SKY News have convinced me that they are not only possible but productive, with no idiosyncratic interpretation, just as Lardiere suggests. (Although Mike Sharwood Smith disagrees and has other intuitions based on his English-English variety.) This points to a standard problem that psycholinguists face, namely defining the object of study. No one can have intuitions about the “English language” because no one speaks that. Rather, each of us speaks a particular variety of it. A more cautious approach to grammatical description would see grammarians limiting their claims to the specific varieties which they do speak, and buttressing broader generalisations with facts drawn from corpus analysis. To the extent that ESL subjects are learning their English in an American context, relying on such descriptions may not be a problem for psycholinguists, if the language of the community happens to match the descriptions provided in the literature. This needs to be verified of course. It stands to reason that researchers whose subjects are acquiring English outside of the U.S. need to be doubly cautious in interpreting the facts of the “target” language, and are well-advised to establish local norms. In the case where subjects have little exposure to native-speakers this may involve testing the students’ instructors or collecting baseline data from language classrooms. 22. Since some researchers apparently treat all instances of use of the Gricean Cooperative Principle as “feedback” (see Allwood 1993), I should stress here that I do not. While there may be occasions where we would want to distinguish a response to a question from a topic shift, I will not treat answers to questions, responses to commands, etc. as feedback. My use of the term throughout is restricted to its role in language acquisition. 23. Notice that this allows two different ways for input to influence grammatical organisation.
On the one hand, unconscious processes of parsing can operate to feed a sound sequence, a word, or a structure into the grammar. On the other hand, the same elements can be treated by the
conscious system as a model, directing the learner’s attention to the formal properties of the stimulus. It is not clear to me whether the results will differ. 24. The psychogrammar is also an idealisation, inferred necessarily from the observation of behaviour. We attribute to it a direct causal role in explaining, say, Aunt Helen’s ability to comprehend and produce sentences like We went there for to visit Lee. Whether Chomsky’s concept of competence plays any direct causal role in explaining this ability is another matter. 25. I do not want to suggest that this is Anderson’s view. I am, in particular, not making reference to declarative knowledge since this is merely information stored in long-term memory, undifferentiated as to type or level of analysis. This distinction is critical for me. In addition, Anderson makes no assumptions of modularity of processing. 26. I can attest to the reality of the phenomenon, having lived through it myself. This anecdote is not, however, a substitute for an empirical investigation of the English-L1 learner’s parsing of verb-final German sentences at various stages of acquisition. 27. See Bialystok (1982, 1987, 1994), Bialystok and Bouchard Ryan (1985), and Bialystok and Sharwood Smith (1985). 28. The reader should keep in mind that in my framework it is incoherent to talk of “unanalysed knowledge.” Structural knowledge, by definition, is contained in a representation which is the product of analysis. 29. It does not follow that acoustic-phonetic information cannot be directly encoded in conceptual representations so that, for example, I can use aspects of it to imitate a British accent, write down a phonetic transcription of sounds I hear in the alphabet of the International Phonetic Association, or reproduce on demand for my German-speaking students the difference between my own Canadian pronunciation of the written form grass and the pronunciation indicated in the Oxford English Dictionary. 
In Jackendoff’s theory, conceptual structures are the level of representation at which information decoded from language meets information decoded from the perceptual and motor systems. Since we can talk about what we see, hear and do, it must therefore be the case that conceptual representations can encode phonetic representations, articulatory-motoric representations, 2 ½ D representations and so on. But since we cannot always think about the contents of these representations, the claim in the text means that I do not encode in my interpretation of sound strings a phonetic analysis of the phonetic features which make them up. 30. The psychological literature on learning and memory makes available a variety of models potentially of relevance here, e.g. data-driven vs. conceptually-driven processing models, implicit vs. explicit learning and implicit vs. explicit memory, declarative vs. procedural knowledge. See Eysenck and Keane (1995: Chs. 6 and 7). Ultimately, one would hope to see the constructs of analysis and control linked explicitly to some such model. 31. Hulstijn (1990) has mounted a more general critique of Bialystok’s work. 32. It follows that there are necessarily real representational differences between first and second language acquisition. On these grounds alone, we can reject the simplistic notion “L1 = L2” (Dulay, Burt, and Krashen 1982). 33. The full story on adult L2 perceptual abilities and their effects on the ability to acquire an authentic accent will therefore have two parts: the first part will detail the absolute loss of perceptual discriminatory abilities during the first year of life, the second part will show how processing speech to isolate lexical items interferes with the ability to detect sub-segmental features exhibited in those words.
Chapter 2 Property and transition theories
1. Why do we need the Autonomous Induction Theory (or something like it)?
1.1 What must an SLA theory explain?
I have been careful in Chapter 1 to frame the discussion in terms of knowledge and behavioural change. A theory of second language acquisition must explain both. It takes behavioural changes into account as part of its observables. We are interested in describing and explaining how the ability to understand and produce the L2 changes over time. While it might be true that learners regularly undergo changes in their internal knowledge states, this will be of no consequence for our field unless those changes cause the learner to do things that previously were not doable. We are interested in internal states because they can explain behaviour. However, at the same time, we are committing ourselves here to a cognitive story: it is the nature of and changes in internal state that are the objects of analysis and theorising. Gregg (1993), citing Cummins (1983), states that explanation in SLA requires us to have two different but related sorts of theories: a property theory and a transition theory.1 The property theory will tell us how knowledge of an L2 is instantiated or represented in a learner, while the transition theory describes changes in what is represented, and also, when connected to some causal mechanism, explains how one knowledge state develops into another. No SLA theory will be explanatory without possessing both subparts, which is why in the face of numerous models, hypotheses and proposals, Gregg can maintain that we still have no real theory of SLA. On the one hand, much work in the past has failed to recognise the centrality of a theory of grammar as a property theory of SLA. Since the early 1980s, this problem has been addressed by researchers adopting the Principles and Parameters Theory (Chomsky 1981b, 1986: 243). However, Principles and Parameters theorists in SLA have made little effort to develop a transition theory, and consequently, their results can only tell us part
of the story, namely what the property theory has to look like. To put the matter more succinctly, talk of parameter-resetting is a description of the fact that one mental grammar has been restructured into another; it is not an explanation of how restructuring occurs. The challenges facing a theory of parameter-resetting are considerable, as we shall see below.2 On the other hand, only one serious proposal has been made to date regarding a transition theory, namely connectionism, more specifically as it is instantiated in the Competition Model (see Bates and MacWhinney 1981, 1987; and the papers in MacWhinney and Bates 1989) and explored in an explicitly SLA context by Gass (1987), Harrington (1987), and Kilborn (1987). The Competition Model, however, lacks a property theory rich enough to be taken seriously as an account of linguistic competence. Neither paradigm in its present form can be said to meet the requirements of providing both a property theory and a transition theory. Gregg can be read as claiming that once we have a theory of knowledge states (e.g. a theory of mental grammars) and a theory of how those knowledge states change over time, our work is over. We can then relax and go for a beer. I part company with Gregg on this point in that I think property and transition theories are still only one-half of the story. I have said above that a transition theory explains changes in knowledge states when connected to some causal mechanism. In my view, making that connection requires us to link a theory of representation to a theory of speech processing, and therefore ultimately to language behaviour. Another part of the explanation story therefore links the knowledge states and their changes to information coming from the environment or somewhere else in the cognitive system outside the learner’s grammar. 
Input to learning mechanisms somehow has to be brought into the equation and this cannot be done in the absence of processing theories which connect the knowledge states to speech in the environment or to information (stored elsewhere) which is applied to a linguistic problem. So, we require now three things: a theory of mental representations (i.e. of mental grammars), a theory of how mental grammars can restructure, and a theory of how learning mechanisms get access to stimuli or to conceptual information to reanalyse and encode grammatically during the learning process. But even this will not be the full story. We also need a theory of learning in addition to property, transition and performance theories. The learning theory will explain how restructuring takes place given the availability to the learning mechanisms of new information at some moment in time. Once we embed discussions of second language learning within a theory of speech processing, learning must be defined in several quite distinct ways: as the initial encoding in memory of a specific representation, as the entrenchment of that initial encoding so that it becomes easily activated, as the realignment of
previously encoded cues to an existing category, or, as what happens when parsing with extant parsing procedures fails. Speech processing theory cannot provide an answer to the question: What happens next? Only a theory of learning can do that.
(2.1) What a theory of SLA must consist of:
a. a theory of linguistic knowledge (a theory of mental grammars);
b. a theory of knowledge restructuring (how the representations of a mental grammar can change in principle and, equally importantly, how they cannot);
c. a theory of linguistic processing showing how input gets into the system from the speech signal (bottom-up) or from the conceptual system (top-down) thereby creating a learning problem;
d. a theory of learning which shows how novel information (not brought into the system from outside the grammar by the parsers) can be created to resolve the learning problem.
Feedback and correction are sometimes presented as central elements of a general theory of learning designed to serve the functions of both a property and transition theory. Since a general theory of learning is intended to do the work of a particular theory of second language acquisition, we may conclude that feedback and correction are intended as central elements of a theory of SLA. Such a theory names induction as the principal mechanism responsible for both the properties of the L2 knowledge systems and for the changes observed in an interlanguage grammar when it restructures from one state to another, say, from one which cannot encode complement sentences to one which can.3 I intend to argue that induction is indeed needed to explain how truly novel information gets into the language processing system (2.1d above). I shall, however, take the position that induction, with or without feedback and correction, cannot properly explain the nature of the basic organisation of linguistic knowledge which manifests certain abstract universals unique to linguistic systems. Alone it cannot tell us, for example, why L2 linguistic knowledge shows sensitivity to constituents like phones, syllables or prosodic words.4 None of these constituents are present as objective properties of the speech signal. Rather, they must be attributed to the signal as part of a re-representation of it during the course of speech processing. But speech processors must somehow come to perform these re-representations which means that learners must come to mentally represent phones, syllables and prosodic words as part of a representational format. Induction cannot explain where the format comes from. Induction alone also cannot show why predication, one of the basic semantic relations, is constrained
by c-command which is a structural relation defined over morphosyntactic representations. (See Chapter 3, note 33, for a formal definition.) Alone, induction cannot explain any of the abstract constraints which appear to be true of grammatical relations.5 However, given a suitable representational system, one constrained by properties of Universal Grammar, induction, coupled with a theory of input (including feedback and correction) along with the relevant performance mechanisms, can give a constrained and correct account of interlanguage grammars, of how one interlanguage grammar changes into another, and of how learners’ ability to understand and produce speech changes over time. I think that such an approach can give a better account than the current alternatives (the Principles and Parameters theory of Universal Grammar and the Competition Model) as they are currently applied in SLA research.
1.2 Problems with SLA applications of Principles and Parameters Theory
The Principles and Parameters theory, henceforth P&P theory, suffers from two major faults, as it has been imported into SLA studies: It is incomplete as a theory of grammatical encoding, and it has been treated by many as a theory of transition. Let me take up the incompleteness argument first. By focusing on the agenda of UG research within grammatical theory, which is to whittle away language-specific details in order to zero in on the universal, P&P research in SLA runs the risk of simply ignoring most of what one would like to call knowledge of a second language. Variation is what P&P theory is designed for, but it limits itself to those aspects of variation which are not the result of learning. It therefore can only deal with part of the job. There are three distinct types of knowledge which P&P theory largely ignores.
First of all, there is what we might call irregular variation, which derives from the fact that grammars of particular languages consist of information which is unsystematic or only partly systematic, information which simply does not lend itself to a characterisation in either general or parametric terms. As we know from descriptive studies of individual languages and cross-linguistic comparative research, languages vary sometimes considerably from one another in terms of their repertoires of constituents, the ways in which those constituents are organised syntagmatically and in how constructions correspond across levels of analysis. Some of this variation is just variety-specific. In his own writings, Chomsky has always acknowledged the necessity of describing and explaining the flotsam and jetsam of specific language systems (see Chomsky 1981a: 8). This type of linguistic knowledge often gets referred to as the grammatical periphery but this term is not well-defined. Certainly we have no reason to equate “peripheral” with
“marginal” or “uninteresting” or “trivial.” The flotsam and jetsam must also be acquired and a language learning theory must therefore encompass mechanisms to do that job.

Secondly, once we focus our attention on the fact that psychogrammars are the by-product of individual experience, we must grant that much linguistic knowledge varies from speaker to speaker.6 The theory of UG has nothing to say about idiosyncratic knowledge. We need a theory which will explain that I say [tʰəˈmeːɾo] and you say [tʰəˈmɑːtʰo] while somebody I know from Kitchener, Ontario says [tʰəˈmæɾo]. Psychogrammars, the grammatical systems which we use in parsing and producing speech, are rich in such particulars. Only some of this variation will be explainable in terms of the norms and speech practices of the groups in which the language user holds membership. Some of it is just truly unique.

Thirdly, research in sociolinguistics reveals variation in language use within a given language community of a rather different sort. Much of that variation is patterned and shared and can be explained in great detail in terms of the social contexts in which language is acquired and deployed, as well as in terms of the social symbolism features of language take on for members of a given community.7 P&P theory has nothing to say about sociolinguistic variation either. Moreover, certain kinds of sociolinguistic variation raise fundamental difficulties for grammatical theories which assume that mental grammars are homogeneous systems. Mufwene (1992) has argued that they are not, on the basis of variation in the use of the copula in African-American Vernacular English (AAVE). Labov (1998) has taken up this issue, trying to sort out the logic of traditional assumptions.
He points out (Labov 1998: 117) that the speakers of AAVE who are also skilled users of “other American dialects” (OAD, or, basically, the white varieties heard in the electronic media, and used in the school systems), might be seen to be in one of several distinct states. Thus, AAVE and OAD could be seen as separate languages or dialects (or grammars) and variation in use could be analysed as code-switching. Roeper (1999) has argued for something along these lines for all users of a language. Variation within a learner’s language of the sort which might allow a child learning English to master and use productively verb-raising in a very narrow set of contexts (say the language of storybooks but nowhere else) would then be accounted for in terms of “bilingualism.” The child would have two distinct “grammars” and would “code-switch” between them. Understood properly, however, it is clear that the child might have a large number of grammars, each distinguished by its irreducibility to some other parameter-setting. In the absence of clear situation-based definitions of the impetus for “code-switching”, this makes a mockery of the construct. On the basis of an understanding of perhaps a broader range of cases of variation, Labov points out
the necessity of distinguishing other types. AAVE and OAD might be seen as distinct but interdependent systems (grammars). On this view, one of the systems may be incomplete and incapable of producing a full range of utterances, and is used merely to enhance the other. Speakers of English who systematically produce German names like Strauss with a palatal fricative (instead of a coronal fricative) might be subsumed under this type of variation. In this case, there is no fundamental incompatibility among the rules used to articulate English spellings. Some items might simply be tagged [+German] and pronounced in a way that English-speakers think sounds German. Or, AAVE might be seen as a “mesolectal” or intermediate stage and OAD as the “acrolect” or highest stage on a creole continuum characterising the variation between the most creole-like grammar and the most standard-like grammar. In this case, one would expect to find implicational relationships among the varying phenomena. Finally, AAVE could be viewed as a de-creolised dialect of English with certain creole-like features remaining as historical residue. This view reduces AAVE copula variation to the flotsam and jetsam type. In his analysis of AAVE, Labov argues for parallel and co-existent systems.

Observe that if UG were assumed to be the whole sum of grammatical theory, and one sometimes gets the impression from the SLA literature that UG is all there is, then that theory would fail as a property theory, regardless of whether it is applied to monolinguals or L2 learners. UG does not provide a way of accommodating variation. Those who blithely make assertions that linguistic knowledge is not to be described in terms of rules, learned exceptions to rules, or idiosyncrasies of other sorts because current conceptions of UG have no place for such constructs, have either not understood this basic point, or else they have decided not to participate in the development of a theory of second language acquisition.
It requires a theory of linguistic competence which can describe everything that a speaker/hearer knows. For further discussion of these issues in the context of first language acquisition, see Meisel (1999) and Valian (1999). If it is legitimate to ask: What are the properties of interlanguage grammars? — and it is — then we need a property theory rich enough to describe and explain all of them. Psychology and psycholinguistics have every reason to be interested in not only variation but also variability and idiosyncratic knowledge since they show how language acquisition is guided (or not guided) by properties present in the environment. Researchers who are interested in how languages are acquired have the right to demand a property theory in which one can talk without embarrassment about context-dependent learning. Linguistic theory has been forced to grapple with the issue of language variation, first in terms of marked and unmarked rules, subsequently in terms of parameter-setting. But
given the generative focus on the universal, it is hardly surprising that generative theory has tended to ignore forms of knowledge which are context dependent. Although many would readily grant that a theory of language acquisition needs learning mechanisms to acquire these aspects of linguistic competence, there has been little attempt within UG-based second language acquisition theory to develop independent models of learning mechanisms capable of explaining how learned properties of grammars get learned or even could in principle be learned. In the first language acquisition literature, such independent models exist and are acknowledged even by those whose primary interest is the study of UG and what first language grammars might have to say about it. We return to such models in subsequent chapters. However, no such independent models exist yet in SLA.8 The L2 literature has focused exclusively on variation deemed to be under the control of UG (the parameter-setting part of the P&P story) and left apparently no room for the characterisation of the acquisition of instances, rules, paradigms, or exceptional phenomena.9 To the extent that P&P theorists in SLA have offered us UG as a property theory, we can legitimately regard it as incomplete. Note that given the way in which UG and mental grammars are characterised as autonomous and modular systems, it is not obvious how to repair this inadequacy. It will hardly do to suppose that a general theory of learning can be “stuck on” to P&P theory as a way of dealing with what has been acquired. General theories of learning are fundamentally incompatible with the P&P framework.
An alternative solution must be found which respects the autonomy of the grammatical systems and allows for a fit between those aspects of linguistic competence deemed to be provided by UG and those aspects which are not.10 In addition, it is not obvious that parameter-setting has been good at resolving the central task it has set itself, namely explaining cross-linguistic variation. Theoretical work on grammatical variation has been reinterpreted in SLA as allowing for a restatement of the transfer problem (see Question 4b of Chapter 1). If the grammars of two distinct languages can be said to be characterisable in terms of different parameter-settings, then the knowledge systems of an individual who knows the one language and is learning the other can also be fruitfully characterised in terms of different parameter settings. The state of L1 knowledge can be said to consist of setting M, and perhaps that is what the learner transfers to the L2, but when he has indeed acquired the L2, he will be in a state of knowledge consisting of the distinct setting N. So the learner has to leave state M and enter state N; this is parameter-resetting. Lifting the curtain on the jargon, however, shows that this is just another way of saying that when the learner has successfully acquired the L2 he knows something different from what he knew before. P&P theory in SLA tells us nothing about how the change occurs;
it can only inform us as to the properties of the states of knowledge themselves. And just what does it tell us? In my view, not very much when one looks closely at the relevant literature. For one thing, there is an on-going problem in the definition of the central theoretical constructs. Despite considerable activity in grammatical research trying to delimit a range of central parameters capable of explaining the discoveries of typological research, there is no agreement on even a handful of proposals.11 Every putative parameter with deductive potential has been picked apart and rejected as empirically and conceptually inadequate. This has become something of an embarrassment for SLA studies as researchers committed to the P&P paradigm have seen otherwise decent empirical SLA research undermined by that nagging critical voice which pipes up: “But no one believes in that proposal any more!” This puts the onus on SLA P&P researchers to continually reanalyse the same data set to make it theoretically acceptable to grammarians. See White (1995) for relevant discussion.12 One might instead ask: Why should grammatical theory be setting the research agenda of SLA? More problematic is the logic itself of P&P theory. It currently consists of lists of stipulations of ways in which languages can arbitrarily vary from one another. Grimshaw (1985) warned about the danger of calling every instance of linguistic difference a “parameter.” She pointed out that parameter-setting theory was interesting for acquisition purposes precisely because it attempted to explain clusters of effects in terms of a single abstract parameter. The clustering provided the theory with “deductive” consequences (Nishigauchi and Roeper 1987). In the intervening years, this constraint on the theory of parameters has disappeared; now any arbitrary distinction gets labelled a “parameter.” Why should we view this as an improvement on earlier versions of generative grammar?
In the “old days”, linguists wrote rules with lists of stipulations of ways in which the rules could arbitrarily vary from one another. With time, it was recognised that this approach was not principled. The stipulations were replaced with more abstract and more general formulations of linguistic constraints. Stipulations of arbitrary lists of ways in which individual languages differ from one another are equally unappealing. It is inevitable that they will be rejected too. In recent publications, we can see the logic of Chomsky’s research agenda at work; arbitrary “parameters” are being replaced by more abstract and more general formulations of universal principles. In the most recent versions of generative theory, Minimalist Theory, and, more obviously, Optimality Theory, the formulations are absolutely universal — which raises some doubts about the future of parameter-setting within the generative paradigm. Again, White (1995) acknowledges the difficulties for SLA researchers; unfortunately, she offers no principled solution.13
Lest I be misconstrued, let me point out what criticism I am not making of P&P theory or its proponents. I am not criticising those researchers solely interested in the question of how SLA research can contribute to the refinement of a theory of Universal Grammar. UG research has been extraordinarily rich and insightful and has made a major contribution to our understanding of human cognition. To the extent that acquisition research can support this approach, it is useful. More specifically, to the extent that SLA research can show that universals are manifested in interlanguage grammars even when the learning conditions of first and second language acquisition are radically different, it will have added significantly to our understanding of linguistic competence.

To summarise, I claim that P&P theory fails as a property theory for SLA because it is incomplete and conceptually inadequate for our purposes. Developmental psycholinguistics requires a property theory of universal and of language-specific phenomena. The more universal the formulation of parameters becomes, the more obvious it will be that UG will not provide the right property theory.14 It is unreasonable to expect those devoted to the pursuit of UG to provide us with one. Descriptive linguistics, first language acquisition studies and research in language parsing are more likely sources of the necessary information we need to construct reasonable models of the specific grammatical systems of individual speakers. We need such models to characterise the adult learner’s initial state (the L1 grammar). If we choose to define “successful L2 acquisition” as the acquisition of a grammar identical to that acquired by the monolingual L1 speaker/hearer, then such models would also be needed to make that notion precise.15

Now let me turn to a consideration of P&P theory as a transition theory in SLA. It fails in this role too for a number of reasons.
First of all, it fails to provide any account of why language learning begins. This is because none of the current P&P SLA theorising is connected to a theory of perception and language processing. It therefore also fails to explain why some phenomenon at a given point in time, and no other, gets acquired. Secondly, aside from some limited discussion of the Subset Principle (see, e.g. White 1986a, 1989b), no one in SLA has bothered to articulate a model of logically prior parameter-(re)settings for a given language such that a learner would need to set P1 before setting P2.16 Thus, the logical relations among parameters have yet to be worked out for the acquisition of any given language. Thirdly, no one in SLA has bothered to articulate the triggers for given parameters.17 Triggers are the logically necessary input to parameters, the minimal type of input for resetting to occur. Unless we know something about triggers, we cannot understand parameter-resetting.18 Triggers are to P&P theory what cues are to the Competition Model, but while proponents of the latter have worked out some of the details of cue extraction,
cue validity, cue reliability, and cue conflict, no comparable research program has emerged in the SLA P&P literature.19 It will not do to cite learnability studies or research in first language acquisition as if they were a reasonable substitute for SLA activity. SLA researchers have to demonstrate the relevance of treatments presupposing monolingualism to a characterisation of acquired or incipient bilingualism. I suspect that defining “the same parameter” for distinct languages and isolating the relevant triggers for that parameter will turn out to be a complicated affair.20 Fourthly, although interesting results have emerged from research investigating developmental orders, SLA P&P theory says nothing at all about why the things which are first learned are learned first. Moreover, SLA research in this paradigm fails to explain why two different learners can follow distinct developmental paths, what Valian (1999) calls microvariation in development. This would require dealing with the variability issue which, as previously noted, P&P researchers cannot do. Finally, P&P theory also has no account of why language learning stops. There is a deep and difficult problem at the root of this question which involves explaining how learners can be simultaneously sensitive and insensitive to variation in the speech they hear. They must be sensitive enough to cognise, to use Chomsky’s (1980: 69) felicitous term, that what they currently know will not accurately represent what they are hearing. They must be insensitive enough not to detect and encode other types of variation in the speech signal once they have arrived at the right analysis. Otherwise, they would be constantly revising their grammars and would never settle on a final analysis for a given phenomenon. 
To mention a simple example, a learner of English whose L1 Spanish has null subjects has to detect and encode that subjects occur with extraordinary regularity in English in simple sentences, with weather verbs, presentational verbs and verbs like seem. But the learner cannot be so sensitive that she notices data like those in (2.2) and construes them as evidence that English, like Spanish, has null subjects in the relevant sense.

(2.2) Q: Goin’ out?
      A: Dunno!
If the learner resets whatever parameter is responsible for explaining the difference between English and Spanish with regard to subjects (be it dubbed the Pro-Drop Parameter, the Morphological Uniformity Principle, or the Morphological Richness Parameter, its name is of no consequence to my argument), she should not re-reset the same parameter when she encounters (2.2), as she will if she is exposed to normal vernacular English. This difficult problem, dubbed the catapult problem of endless parameter-resetting by Randall (1990, 1992), has been recognised in the first language
acquisition literature (see also Valian 1990a, b). One proposal to deal with it is that there simply is no parameter-resetting (Bley-Vroman 1990; Clahsen 1990/1991; Müller 1994). Proponents of P&P theory in SLA have largely ignored this critical issue. It is complicated in SLA because we must factor in somehow the problem of the learner’s incipient bilingualism. How does an L2 learner decide on the evidence for a given analysis of a parameter? One issue we must address is the validity of what appears to be the default assumption in much of the SLA theorising that as far as parameter-setting is concerned, the learner’s grammars are always in separate “boxes” which never communicate with one another, at least not where parameter-setting is concerned. How else could one put aside the thorny question: How does a single mind/brain establish contradictory settings for the same parameter? SLA researchers appear to believe that the psychogrammars for the L1 and L2 are distinct and separate (in other words, that they are cognitively and functionally autonomous), so that parameter-resetting in the L2 cannot possibly negatively influence the parameter-setting in the L1. Consequently, the acquisition of a contradictory parameter-setting in the L2 does not cause loss of the parameter-setting in the L1. At the same time, however, we are required to believe that parameter-settings can transfer from one grammar to another, and that the grammatical systems can mutually influence one another in additional ways, permitting borrowing, code-switching, lexical interference, the readjustment of linguistic category boundaries, language attrition, and so on. How grammars can be autonomous and permeable to cross-grammar influences at the same time is not clear.21 Further consideration of this issue is surely in order. Secondly, it is not obvious why the learner’s prior experience should be necessarily irrelevant for the analysis of the L2.
If Finnish is defined as morphologically “rich”, how could a Finnish learner of French, German or English not find these languages morphologically “poor” in their manifestations of morphological marking of morphosyntactic categories? And would a Finnish learner of all three languages notice subtle differences among them which would be sufficient to account for differences in their use of subjects? This seems unlikely. It seems far more plausible to hypothesise that the explanation of the correlation between morphological complexity and the existence of expletive subjects and “obligatory” use of pronouns in non-focus contexts will require a more abstract solution than those proposed in the P&P literature. Finally, consider that in the more recent interpretations of parameter-setting, parameters have been hypothesised to be associated with the heads of particular morphosyntactic categories, more specifically with functional categories such as tense, agreement, or comp (Ouhalla 1991). Somehow, in ways which are not at all clear since paradigm learning is at stake, setting these parameters is said to take place
“in the lexicon.” The exact properties of the lexicon are open to debate within linguistic theory. What matters for syntactic discussion is simply that the lexicon lies outside of derivational paths for particular sentences. So, parameter-setting is outside of the derivation of a specific sentence. At the same time, generative theory asserts that the lexicon is a memory store, in particular a storage place for the idiosyncrasies and irregularities which could only result from learning, in the conventional sense of the term. It follows that the mechanisms which are responsible for setting lexical parameters cannot operate until (possibly) domain-general mechanisms responsible for the induction of individual expressions from the stimuli available have done their work. This proposal makes parameter-setting and parameter-resetting dependent upon learning mechanisms in ways which have as yet not been explored in SLA and which can hardly be called trivial. Above we raised the issue of the functional architecture of the bilingual linguistic system. We can raise it again in thinking about parameters which are lexically expressed. One of the most vexed issues in treatments of bilingualism is the question of whether bilinguals have one lexicon or two, and how the lexical entries interact. The evidence that they do is compelling. See Pal (2000) for discussion. What the implications of interacting lexicons are for the analysis of functional categories and the re-setting of parameters connected to them has yet to be discussed in the SLA literature.

All of the questions raised above are legitimate ones to be asked of a P&P theory parading as an explanatory theory of acquisition. Unfortunately, the questions are being asked by the critics of the theory and not by its proponents. It is the critics who have made plain that much current work in P&P theory has confused Gregg’s two types of theories, treating a property theory as if it were a transition theory.
An acquisition theory must explain what causes grammatical restructuring and must describe its course. This the P&P theory does not do. I have concluded that parameter-(re)setting is an interesting metaphor, but ultimately one which will not stand up to close scrutiny. I see no reason, therefore, to contest claims by Bates and MacWhinney (1989: 71) that their Competition Model is the only theory which can claim to be a theory of language acquisition.

1.3 Problems with the Competition Model

This said, I should add that being the only game in town does not mean that the game is thereby a winning one. The Competition Model suffers very much from its unwillingness to take seriously research into the nature of linguistic knowledge. Indeed, one may question the extent to which the “Competitors” understand contemporary theoretical linguistic research. On the one hand, one looks in vain
in the Competitors’ publications for signs that structural constructs play a role in processing and acquisition, although it is now abundantly clear that no account can be given of linguistic knowledge without them. On the other hand, the Competitors refuse to question the relevance of the form- and meaning-related constructs they do adopt, as if it were somehow obvious that words and morphemes have psychological reality while c-command and the No Crossing Lines Constraint do not.22 It is as if the theoretical notions of American structuralism (constructs like phoneme, morpheme, word, or sentence) have ceased to have theory-dependent status simply because psychologists have heard of them (all other more recent theoretical developments being, in contrast, suspect). This is hardly a sensible approach to the characterisation of language. In addition, it is apparent that the Competitors gloss over important linguistic details in experimental work in an attempt to make induction and a structured environment the major explanatory factors.23 It is therefore obvious to many of us that structural representations have no explanatory role to play in this theory of language use. Since the Competition Model also purports to be a theory of acquisition it follows that the structural nature of grammatical representations can play no causal role in explaining the transitions between knowledge states. This remains true despite the rather frivolous response that Bates and MacWhinney (1989: 7) give to their critics.24 The reason is obvious: the Competition Model claims that the mappings between acoustic stimuli and semantic representations are direct. This means that there are really no psycholinguistically relevant mediating phonological or morphosyntactic representations. Any kind of grammatical or semantic primitive must be characterised in terms of patterns of activation of connected nets of nodes standing in for neurons. 
At best, therefore, grammar is a metaphor for an emergent or distributed knowledge fully characterised in terms of patterns of excitation of the auditory nerve, distributed patterns of node activation in the neural nets doing the cognitive work, and the weighted connections among them. I reject this vision of linguistic knowledge.25 Finally, at least some versions of the model (McDonald 1986; McDonald and Heilenman 1991) explain language development by assuming that incorrect interpretations are a triggering force. Incorrect interpretations are detected either when a given interpretation contradicts the state of affairs perceptible through other means than language, or when the learner is informed that his interpretation is incorrect. We have no evidence yet that this particular type of negative evidence has any role to play in the acquisition of the full range of types of grammatical knowledge.
1.4 A third approach: the Autonomous Induction Theory

I intend to take a route different from both of these approaches. On the one hand, I shall go beyond my original claim that feedback and correction have a role to play in adult SLA, to show how it might work. On the other hand, I shall elaborate an independent model of SLA called the Autonomous Induction Theory. It is based on a theory of cognition developed by Ray Jackendoff (1987, 1996a), called Representational Modularity. The Representational Modularity model posits that cognitive universals play a fundamental role in explaining what we know about language and how we come to know it. These universals include Universal Grammar, but UG is just one of several types of universals in the arsenal of the human cognitive system. It also starts from the assumption that the mappings between stimuli in the environment and meanings are crucially mediated by representational “languages of the mind” which have significant structural properties (Jackendoff 1992). These languages are quite distinct from one another, and cannot be reduced to a single general type. Representational Modularity is therefore incompatible with the view that there is a general theory of learning in which the same processes and constraints operate across all cognitive domains. The Autonomous Induction Theory adopts Representational Modularity. This assumption alone makes the Autonomous Induction Theory different from all other current proposals, including the Competition Model, and proposals by Bley-Vroman (1990), Clahsen and Muysken (1986, 1989), and Schachter (1992). It follows from the assumption of Representational Modularity that not everything that is representable in our conceptual systems can be encoded in our phonological or morphosyntactic systems, and vice versa. This means that there will be severe constraints on how conceptual information can interact with information encoded in the specialised representational systems.
Consequently, there are important limitations on what feedback and correction can accomplish in initiating grammatical restructuring. However, the theory elaborated is compatible with the hypothesis that feedback and correction, mentally represented in conceptual representations, can have an effect on grammatical restructuring. This makes the Autonomous Induction Theory different from theories claiming that there is no negative evidence in SLA (Schwartz 1986, 1987, 1993; Schwartz and Gubala-Ryzak 1992) or no language acquisition based on metalinguistic information (Truscott 1998b). I take the view that the investigation of exactly what the effects of feedback and correction are can shed light on the nature of modularity and information processing. Indeed, I will argue that the study of adult SLA and, in particular, the study of the nature of the input and evidence to L2 grammatical
acquisition constitutes an especially interesting domain for the study of psycholinguistic processing and representation. All of these ideas will be developed at length in other chapters. Here it suffices to note that the modular representational systems hypothesised provide the basic codes in which linguistic knowledge is discovered, represented, computed and stored. There is interaction among them but it is highly constrained.

Despite its emphasis on universals, the Autonomous Induction Theory is nevertheless designed to provide a role for induction in explaining L2 development. To offer up a transition theory, it must focus in part on induction. Induction has not been a fashionable topic in second language acquisition research of late. Understanding why involves dealing with the so-called “logical problem of language acquisition.” This problem is based on a deceptively simple question: How is it possible in principle to acquire a language? For many years it has been suggested that if one is to answer this question there simply is no alternative to UG-based approaches to linguistic cognition. There is no alternative because induction does not adequately constrain language acquisition. The constraints problem then forces UG on us. This happens to be a statement that I agree with. But one can adopt a universalist/rationalist position with respect to the constraints problem without thereby rejecting induction and committing oneself to the claim that UG is a mechanism of language acquisition. Everything else being equal, one can adopt a universalist/rationalist position without claiming that UG is a unique mechanism of language acquisition. The solution is to carefully distinguish between UG’s function in the property and transition theories.

1.5 The limits of theism and deism as accounts of SLA

Gregg analyses current views about UG in SLA, which he labels theism and deism.
According to Gregg, theism corresponds to the position that “UG functions in SLA just as it does in L1 acquisition” (Gregg 1994a: 3723). Theism is not directly attributed to anyone, but since it is connected with those defending a P&P approach to SLA we may draw the inference that P&P theorists, and presumably also Gregg himself, are all theists.26 Such individuals hold the position that UG is “accessible” to adult second language learners and all conceptualise UG within the P&P framework. Deism, in contrast, corresponds to the view that “UG ceases to function once an L1 is acquired in childhood” (Gregg 1994a: 3723). A more concrete example of deism is the Fundamental Difference Hypothesis of Bley-Vroman (1990, 1994). The Fundamental Difference Hypothesis states that UG does not “operate directly”, and knowledge of
UG, specifically knowledge of principles and parameters, is available only through the particular instantiations of principles and parameters in the L1. A similar position can also be attributed to Clahsen and Muysken (1986, 1989), to Clahsen (1988), to Meisel (1991, 1997), and to Schachter (1988, 1990, 1991). These scholars have all stated that first language acquisition can be characterised in terms of parameter-setting but adult SLA grammars do not conform to UG; therefore, UG is not constraining SLA and other mechanisms must be at work. Clahsen and Muysken go so far as to argue that interlanguages are not natural languages. A slightly different position has been articulated by Felix (1985, 1986), who claims that adults have “access” to UG but that this access is in competition with the operation of inductive mechanisms so that adult SLA must take a different course than L1 acquisition. In my view, Bley-Vroman’s influential papers have successfully muddied the learnability waters by confusing the constraints problem with the question as to why L2 learners know different things than L1 learners (the Ultimate Attainment Issue). The Ultimate Attainment Issue is a very complex problem which has received only superficial treatment in the SLA literature. Part of the difficulty in understanding the issue is that many scholars appear to take it for granted that adults can never successfully acquire an L2. The position has apparently become a piece of dogma, especially among American researchers, for which no evidence is required. Thus, one need not define the construct “successful acquisition.” Some appear to believe that successful acquisition can only be defined in terms of the knowledge states and behaviour patterns of monolingual native speakers; this is not the only option and hardly a reasonable one unless one can demonstrate that L2 learners get exactly the same kind of stimuli.
In contrast, successful acquisition can be defined with respect to the amounts and types of linguistic stimuli and derived input that learners actually get. Learners have successfully learned if they have constructed specific sorts of grammars on the basis of the input available to them — complete grammars if the input is complete, partial or deviant grammars if it is not. But one looks in vain for information about the nature of the L2 exposure of learners in studies investigating the Ultimate Attainment Issue (Do they hear 5 hours of the L2 a day? a week? a month? Are they communicating largely with monolingual native speakers or with other L2 learners? Do they hear embedded clauses? Topicalisations? Double object constructions? If so, how often and under what circumstances?) Nor are we informed about the bilingual speech practices of the “unsuccessful” learners so we do not know if they are using the L1 or the L2 or even an L3 much of the time. Until these questions are answered and systematically compared with careful analyses of what learners actually know and can do, no conclusions are
possible about the potential ultimate attainment of adult learners. The fact that an immigrant can live for 30 years in a country with an official language and still speak it with an accent, or manifest imperfect control of the morphology of the L2 is hardly compelling evidence that adults are incapable of learning an L2. Indeed, serious research on the topic, to be discussed later, is turning up individuals who are both highly knowledgeable and highly proficient in the L2. Finally, we might ask if it is reasonable within a plausible theory of bilingual cognition to define successful acquisition as meaning “knowing exactly and only what monolinguals know.” The differences observed between monolingual and L2 learner speech practices could be attributed to the fact that L2 learners are bilinguals. Or that they have had far less input, or far less practice of particular speech patterns, or have less highly automated retrieval of the L2 production schemata. There are countless ways to explain the fact that the “perfect bilingual” is a rarity.27 We need not resort to fundamental representational differences in FLA and SLA to account for that observation. To make a learnability argument, and to make the case for a fundamental difference in the nature of L1 and L2 linguistic knowledge, one needs to demonstrate that the grammars of L1 learners and adult L2 learners are indeed fundamentally different. In my view, there is no convincing evidence for such fundamental representational differences, although the evidence for difference is increasing.28 That so many have been willing to jump into the water after Bley-Vroman shows that the issues are complex. We need not, however, feel compelled to discuss SLA in these terms.
There is a third stance that one can adopt within a universalist/rationalist perspective, the stance I shall defend, which is that the “accessibility” debate between theists and deists is a debate about not much at all since UG is not a learning mechanism.29 Therefore, while it is possible to ask if interlanguage grammars manifest properties of UG, it is not possible to ask if adult SLA learners “access UG in exactly the same ways that infant L1 learners do.” These questions are not equivalent. The debate between theists and deists has been, however, only tangentially about whether interlanguage grammars exhibit properties of UG and has mostly been about whether the processes and stages of development are the same. In my view, the latter discussion misconstrues the nature of UG. UG can be understood as providing certain initial limitations on the representational systems used to encode language prior to linguistic experience. This means that for neonates about to acquire a first language, we can think about UG as the initial state of their knowledge.30 UG is not the initial state of knowledge for anyone who has had any kind of exposure to a language and is about to acquire another. Moreover, it is not a box in the mind/brain that the
learner consults when building a grammar, as the access metaphor would have it. I like to think of it as an emergent property of all mentally represented grammars. The representational system of the newborn rapidly develops into a language-specific, multiply-differentiated system, closely attuned to certain properties of the stimuli which comprise the linguistic environment. This language-specific, multiply-differentiated system is consistent with analyses of UG. Rather than UG, it is the language-specific, multiply-differentiated representational system which is the initial state of adult L2 acquisition. Let me rephrase this hypothesis: The initial state of adult L2 acquisition consists of the L1-specific parsing procedures and production schemata, the L1-specific contents of the mental lexicon, and possibly some performance-independent knowledge store that we call the mental grammar of the L1.31 Consequently, this representational system is also consistent with analyses of UG. It is wrong-headed to characterise the differences between first and second language acquisition by suggesting that children “access” UG while adults do not. Neither group of learners is “accessing” UG in the sense that they are activating stored information in a UG-black box when they have to learn to represent, e.g. which lexical item in an incoming string is the tensed verb. When we abandon the dynamic vocabulary which theists and deists alike use, we no longer can ask: Do adult L2 learners access UG? This question simply makes no sense since the biologically-given initial representational capacity is not located somewhere to be accessed. The only linguistic representational systems which can be initially accessed are those in which the L1 is encoded. If we abandon the misleading access metaphor, we are suddenly free to ask a different question: Do adult learners in fact encode L2 stimuli in the same representational systems as those used to encode their L1? 
Or do they encode L2 stimuli in some other non-linguistic systems? The debate about the Fundamental Difference Hypothesis notwithstanding, I believe that the answer to these questions elicits some rare consensus in the field: “Yes!” This is my interpretation of the widely accepted hypothesis that adults transfer knowledge of the L1 grammar to the tasks of parsing and producing the L2. Transfer is just the name we give to the fact that L2 speech is encoded in terms of the same categories and patterns of the L1. Transfer appears to require as a logical necessity that the encoding systems of the L1 and the interlanguage be of the same type because transfer is the analysis of a phenomenon X as being an instance of category Y. I fail to see how such equivalence relations could be established if, for example, one were somehow to attempt to encode syllable structure patterns in terms of the dynamic properties of event structures, or the temporal properties of musical cognition. The very idea boggles the mind. In fact, although there is
much talk about SLA involving other non-language-specific systems, no one has made any serious attempt to argue that phonological and morphosyntactic phenomena in interlanguage grammars indeed manifest the properties of other codes. Thus, the existence of transfer, indeed its prevalence in characterisations of L2 behaviour and L2 knowledge provides a strong prima facie argument for the hypothesis that interlanguage knowledge must be encoded in the same “languages of the mind” as those used to encode the L1. Having established that the same representational systems are used in parsing and producing L2 speech, precisely because transfer is so much in evidence, we can then ask: Do these representational systems manifest the same basic properties as those deployed by monolingual native speakers of the L2? The answer to this question must, in my view, also be “Yes!” The UG-based research cited above shows satisfactorily that interlanguages exhibit hierarchical as well as linear order, structure dependency reflecting knowledge of syllables and feet, maximal and intermediate morphosyntactic phrases, c-command, subjacency or other “island effects”, and other universal properties of natural language systems. No one has had great difficulty in applying the constructs of contemporary linguistic theory to L2 data, presumably because L2 grammars are fairly well-behaved, despite what Clahsen and Muysken (1986) assert. In short, there is an alternative position to the theist and deist positions sketched by Gregg which is consistent with the universalist/rationalist position.32 On the one hand, I deny that we will see evidence of UG in the interlanguage grammars only insofar as structures of the L1 are transferred into the interlanguage grammar, which is essentially Bley-Vroman’s position. On the other hand, I also deny that UG operates in L2 “as it does in the L1” because to claim this is to grossly misunderstand the putative biological basis of UG. 
At the same time, I deny the assertion that UG “ceases to operate” after childhood in the sense that language acquisition must then necessarily be caused by domain-general problem-solving mechanisms completely unrelated to linguistic cognition, and that interlanguage grammars are therefore to be expected to be rogue grammars. There is no evidence to date that domain-general problem-solving can explain any of the above-mentioned properties of interlanguages. Most claims that L2 acquisition takes place on the basis of problem-solving are not supported by any serious investigation of what these general problem-solving mechanisms are, or of how they would work. The one counter-example is provided by the various researchers associated with the ZISA project.33 The ZISA team applied to SLA proposals by Slobin (1973, 1977, 1979; Slobin and Bever 1982) made for primary language acquisition. The purpose of these proposals was to show how a learner could construct morphosyntactic
representations either from a primitive semantic system or from a primitive perceptual system. In other words, Slobin was interested in developing a transition theory. Since Slobin rejects the hypothesis that linguistic cognition is grounded in Universal Grammar, which would delimit the basic nature of the child’s grammatical representations, his problem was to explain how the learner could create a grammatical mode of representation from some other non-linguistic mode. He developed a number of pre-theoretical strategies or procedures, his operating principles, which were designed to capture generalisations about the nature of early child language, while making minimal assumptions about the child’s a priori cognitive system, and which would amount to mappings from the perceptual and the conceptual systems onto morphosyntactic representations. Slobin originally believed these proposals would be domain-general but has since granted that they are domain-specific since they mention units which exist only in linguistic representations, e.g. “Pay attention to the ends of words” (Slobin 1985a: 1243). Their status in the current debate between theism and deism has therefore become unclear, but this is ultimately unimportant. Slobin’s proposals are in the end nothing more than descriptive generalisations desperately in need of a theory of semantics — to motivate the correspondences between the syntax and meaning — as well as a theory of sound structure and speech perception — to motivate the correspondences between the speech signal and the syntax. Since Slobin has no theory of structure, meaning or the perception of sounds, he has no theory of domain-general processes relevant for language acquisition. Neither do the researchers who resort to the same domain-general strategies to describe SLA. 
Thus, while the research which has emerged and continues to emerge from the ZISA project makes a major contribution to our understanding of L2 development, it cannot be used to assert that domain-general learning processes explain SLA (see White 1991b; Epstein, Flynn and Martohardjono 1996, for other criticisms, specifically of Clahsen and Muysken 1986, 1989).34 In addition, the argumentation of Clahsen and Muysken that interlanguages are not natural languages because they appear to require movement rules which are not structure-preserving, root or local transformations in the sense of Emonds (1976), while clever, is ultimately unconvincing. Clahsen and Muysken’s argumentation hinges solely on the properties of rules explaining the distribution of constituents in the learners’ speech across developmental stages. In other words, they claim that in order to explain transitions from one interlanguage grammar to another, one must postulate rules which move constituents to the right in a derivation, deriving a novel morphosyntactic representation (in something like the traditional notion of a surface structure) from some other more basic morphosyntactic representation (the canonical word order). Such rightward
movement rules are excluded from current versions of UG. Therefore, so the argument goes, such grammars are not natural-language grammars. These properties, however, have to be attributed to the learners’ grammar only if one is also forced to grant the necessity of the assumptions displayed in (2.3–2.5):
(2.3) that the data Clahsen and Muysken have at their disposal (spontaneous production data) is a direct reflection of the learners’ grammar and not a hybrid reflecting both properties of the grammar and the operation of the production system.
Since Clahsen and Muysken did not investigate what the learners could parse, we do not know if there were significant differences between what the learners could cognise and what they could control in production. We need not assume that production data is necessarily a faithful reflection of the learners’ cognised grammatical knowledge since there is independent evidence that learners learn specific phenomena (that is, mentally represent them) before they attempt to produce sentences containing those phenomena and long before they control the phenomena automatically in production (see Sharwood Smith 1986, for discussion). Moreover, there are good reasons to hypothesise that what learners can produce at a given moment in their linguistic development is severely constrained by the nature of the production system (see Pienemann 1998a, b).
(2.4) that the data at their disposal informs us uniquely about the learners’ morphosyntactic systems.
Clahsen and Muysken simply assume that because they are interested in morphosyntax, their data is informing them about morphosyntactic representations and morphosyntactic operations in the interlanguage grammar. Another interpretation is, however, possible, namely that when learners are at the stage where they are consciously trying to correctly order lexical items in a sentence, they are manipulating prosodic constituents, namely prosodic words in prosodic phrases and intonational phrases. It could be argued, consistent with Jackendoff’s (1987) position that we project prosodic representations into conscious awareness, that what learners align when they consciously construct output strings which are different from the over-learned patterns which the L1 production system makes available, are prosodic constituents. In other words, when learners are consciously trying to put the verb in the right place, what they are manipulating are prosodic constituents in the initial construction of a phonological production schema. C-command is irrelevant to prosodic structure, and prosodic constituents have a much flatter organisation than syntactic structure (Nespor and Vogel 1986). It remains to be seen if realignments of prosodic constituents violate principles of UG.
(2.5) that the operations used to describe post hoc changes to an interlanguage grammar can be equated with the derivational operations of the mental grammar.
This assumption is essential to buttress Clahsen and Muysken’s conclusion that the data from their L2 learners provide evidence of rules which are not permitted by UG, namely rightward movement rules. To make this argument, we must assume that learners are attempting to establish Slobin’s canonical word order, that is to say, a prototypical “basic sentence.” This construct is, however, not obviously compatible with the grammatical theory which Clahsen and Muysken adopt. There certainly is no reason to assume that the learner’s mental grammar at the “verb-final” stage comprises a subject verb object order to which some kind of rightward movement rule must apply to get the constituents into the correct surface order. Indeed, there is no reason to assume either that learners create a canonical word order to which leftward movement rules apply to derive actual utterances.35 It is just as plausible to assert that the developmental change involves adding an additional representation to those already encoded which directly reflects verb-final order. Under this view, the learner constructs a variety of representations for each sentence type and none need be regarded as more “basic” in the sense that other sentences are derived “on-line” from it. The learner thus maintains all previous encodings but increases over time the accessing of those which more accurately match the L2 input.36 To summarise, one need not grant any of the points which the Clahsen and Muysken analysis requires. Therefore, we need not assent to their claim to have shown that interlanguage grammars are non-natural grammars. Here is just one instance where the failure to be precise, in psycholinguistic terms, about the exact relationship between grammatical theory, psychogrammars, the functional architecture of the language faculty, and one’s data set leads to problems of interpretation. What ought to strike even the casual observer is that Clahsen and Muysken’s argumentation is really too clever.
They do not attempt to claim that interlanguages do not exhibit truly fundamental properties of natural languages such as those mentioned above, although there is no reason to expect such properties to emerge in interlanguages through the application of general problem-solving mechanisms or Slobin’s operating principles. Similar remarks can be made of Meisel (1991, 1997). Meisel’s careful analyses demonstrate quite clearly that child L1 learners and adult L2 learners are constructing different representations and are sensitive to different sorts of properties in the input. There can be no doubt, given Meisel’s analyses, that L2 acquisition is not “just like” L1 acquisition — a conclusion which will be further substantiated elsewhere in this book. Moreover, Meisel’s conclusion that parameter-resetting gives
a poor account of the L2 data appears to me to be correct. It does not follow, however, that UG has no part in a theory of SLA. It follows even less that general problem-solving mechanisms can provide a better explanation. In short, I think the critique of universals in SLA is baseless. The real problem is the invocation of parameters to characterise linguistic change and to serve as the sole or major mechanism in a transition theory. We need to rethink all of these claims and to return to a consideration of the basic problem Slobin was attempting to explain in the context of first language acquisition: How does the learner acquire linguistic knowledge on the basis of the speech in the environment? Or, what are the answers we must give to our Questions 1 to 4? Let us return again to Question 1:
Question 1: What is the nature of the raw material on which particular grammatical generalisations are formed?
My answer to this question is that the raw material depends entirely on the nature of the grammatical phenomenon to be acquired and on the nature of the representational system in which that phenomenon will be encoded. The raw material of learning is not, therefore, the objective material of the speech signal.
2. Summary
In this chapter I have been concerned with the essential properties of an explanatory theory of SLA. Following Gregg, I have argued that we need a property theory which defines for us the essential properties of grammatical systems and a transition theory which tells us how grammars can change. In addition, I have argued that we need a processing theory which will tell us how stimuli and input can enter the linguistic system. Finally, we will need a theory of learning to tell us how information becomes accessible to the learning mechanisms. Neither the P&P theory nor the Competition Model captures all four aspects. The Autonomous Induction Theory does.
Notes
1. See additional discussion in Gregg (1994a, b, 1996).
2. The same problems face the avatars of P&P theory, namely Minimalism and Optimality Theory.
3. A second mechanism linked to learning models is association, exploited in SLA research largely by connectionism, embodied in the Competition Model (Bates and MacWhinney 1981; Bates, McNew, MacWhinney, Devescovi and Smith 1982; Bates, MacWhinney, and Kliegl 1984; and especially, MacWhinney 1987a, b, 1989, 1994, 1995) and in Anderson’s Adaptive Control of Thought model (Anderson 1983, 1985). The Competition Model has spawned some very interesting investigations into preferred analyses of sentences by bilinguals under informationally reduced conditions. Some of this research will be discussed in more detail later in the book. The ACT model gets occasional mention in the SLA literature, see Towell and Hawkins (1994), or Johnson (1996) but to my knowledge has been largely ignored in empirical studies. I will have nothing further to say about it except to point out that no theory of language acquisition based on simple association will ever explain the complexity of the linguistic competence of L2 learners. Anderson (p.c., 1995) grants that a priori knowledge must play an important role in explaining linguistic competence. As he put it so succinctly: “Language is different!”
4. See Brière (1968), Selinker (1972), Tarone (1972), Flege (1987), Flege and Munro (1994), and Carroll (1992) among other studies for empirical evidence as to this sensitivity.
5. It is still an open question as to what the most basic properties of interlanguage grammars are, and what constitutes convincing evidence for knowledge of language-specific universals. At the earliest stages of linguistic production, for example, interlanguage grammars have been said to provide strong evidence for a paratactic organisation involving, among other functional properties, adjacency but specifically not embedding, see, for example, the discussion in Sato (1990: 29 et passim).
Despite claims to the contrary, however, (see Perdue 1996), L2 learners do not, in fact, produce strings clearly contravening the c-command constraint on predication. Strings like in my apartment no good are not multiply ambiguous and, in particular, are not construed — and are not intended to be construed — as saying any of ; , ; or , (or reference to some other contextually desirable but not physically present object). In this string, the learner correctly maps a onto a PP, followed by a negation intended to deny the principal predication. Modifiers operate locally (itself a notion requiring constituency) and predication applies (as required by English) to the left, not to the right. All of this suggests knowledge of the relevant underlying structural distinctions. Equally important, there is no evidence that if learners heard such a sentence as in my apartment no good, they would attempt to construct any of the interpretations just given. Learners’ ability to make use of local relations in constructing interpretations of utterances must also be explained. In addition, what has been glossed over in most of the literature discussing paratactic organisation, is that such strings preserve the prominence patterns of English prosodic phrases (prominence final). We can just as readily conclude that at this stage of development, speech production is constrained by what the learner can simultaneously encode syntactically and map onto a Prosodic Utterance. It may be that syntactic complexity gets sacrificed in the interest of prosodic well-formedness. This might be either a problem arising from inadequate linguistic encoding or it might be a performance problem arising from the learner’s inability to simultaneously process several levels of encoding within the constraints imposed by working memory. Only appropriate empirical studies will shed light on this issue.
Finally, I should say something about subjacency in interlanguage grammars since both Schachter (1990) and Johnson and Newport (1991) have argued on the basis of data involving acceptability judgement tasks that L2 learners do not cognise subjacency in the L2. I am presently sceptical of the validity of these conclusions because acceptability judgement tasks are metalinguistic in nature (Birdsong 1989; Schütze 1995) and, therefore, as will be argued here, involve information encoded in the conceptual system. Since learners are not producing strings which violate subjacency, I take this as evidence (admittedly weak
evidence, but evidence nonetheless) that interlanguage grammars do not violate subjacency. Alternatively, some scholars (Berwick and Weinberg 1984; Hawkins 1994, 1996) have argued that subjacency is a property of the morphosyntactic parsers. Whether subjacency then belongs to the learner’s interlanguage grammar will then depend upon how this term is understood.
6. Some researchers have taken variability to be an essential characteristic of interlanguage grammars (Tarone 1988). Others have argued (Singh, D’Anglejan, and Carroll 1983) that there is no evidence that interlanguage grammars are more variable than monolingual ones.
7. See Labov (1963, 1966, 1972a, b), Trudgill (1972, 1974), L. Milroy (1980), J. Milroy (1992) and Chambers (1995) for much discussion and empirical evidence on the subtle forms of acquired linguistic knowledge and the richness of the systematic variation.
8. Truscott (1998a) makes a stab at integrating Instance Theory into SLA. It is far too early to predict what effects this effort might have on the field.
9. Attempting to sort out the variation problem from the variability problem in UG theorising is not easy. Liceras (1988) wades through the conceptual confusions arising from the markedness literature, and the attempt to characterise the core grammar of a specific language, arising strictly from the constraints of UG, as opposed to the peripheral grammar of that same language, arising from making analogies, relaxing constraints of UG and learning. Unfortunately, this distinction, which could do some real work for SLA theorising, has virtually disappeared from public discussion.
10. Golinkoff (1999) makes the same point in connection with a discussion of how children learn the words of their first language.
11. On typological universals, see Greenberg (1966), Ferguson and Moravcsik (1978), Comrie (1981), Hawkins (1983, 1988a).
12.
I might add here, as a syntactician trained in generative linguistics, that I find the general level of critical evaluation of new proposals in linguistic theory, made by SLA generativists, pretty low, much lower than that witnessed within grammatical theory itself. Grammarians demand that novel proposals explain at least as much data as the old theory; SLA theorists ought to do the same.
13. Optimality Theoretic approaches to universals, optionality, divergence and variation are in their infancy in SLA, see Sorace (in press), Hancin-Bhatt and Bhatt (1997). I will set aside any discussion of their relevance to the characterisation of universals and L2 acquisition.
14. Shifting the locus of variation to the “interfaces” between syntax and meaning or syntax and phonology, as is proposed in Minimalism does not dull this critique, for to the extent that parameter-setting must interact with, or even be dependent upon, concept or form learning, it ceases to be deterministic and tightly constrained. The problem of making UG compatible with induction becomes acute in such an approach.
15. I think this is not the appropriate definition of successful L2 learning which should, instead, be defined in terms of stimuli and input.
16. The Subset Principle is a principle formulated in L1 acquisition to ensure that learning can proceed on the basis of essentially normal speech to the child, without feedback and correction. Given analyses of input which could be generated by either of two grammars meeting the Subset Condition, the learner automatically adopts the most restrictive one. The Subset Condition thus holds whenever two grammars describe the same set of sentences but one of them (the superset grammar) will also describe additional sentences. Note that there is no evidence that the Subset Principle needs to hold in SLA since learners do learn linguistic information from feedback and correction (the negative evidence of Chapter 1). White (1989b: Ch.
6) re-interprets the Subset Principle as a principle holding of parameter-setting
62
INPUT AND EVIDENCE relations and discusses a number of empirical studies which show that it does not in fact operate in SLA even in this narrower interpretation. White (1989a) examines adjacency relations in interlanguage grammars and comes to the same conclusions.
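The selection step described in note 16 can be made concrete with a short sketch. It is an illustration only, assuming that a grammar can be modelled as a finite set of sentences; the function name `most_restrictive_grammar` and the toy sentences are hypothetical and are not drawn from any of the works cited.

```python
def most_restrictive_grammar(candidates, observed):
    """Return the smallest candidate grammar compatible with the input.

    Assumption of this sketch: a grammar is a finite set of sentences, and a
    grammar is compatible with the input if it contains every observed sentence.
    """
    compatible = [g for g in candidates if observed <= g]
    # Prefer a compatible grammar that is a subset of every other compatible one.
    for grammar in compatible:
        if all(grammar <= other for other in compatible):
            return grammar
    return None  # no compatible grammar, or no subset relation among them


# Toy pair meeting the Subset Condition: the superset grammar generates
# everything the subset grammar does, plus one additional sentence.
subset_grammar = frozenset({"John eats", "Mary eats"})
superset_grammar = subset_grammar | {"eats"}

# Input compatible with both grammars: the restrictive one is adopted.
chosen = most_restrictive_grammar(
    [superset_grammar, subset_grammar], observed={"John eats"}
)

# Input that only the superset grammar generates: positive evidence alone
# is enough to move the learner to the larger grammar.
revised = most_restrictive_grammar(
    [superset_grammar, subset_grammar], observed={"eats"}
)
```

On this toy model the learner never needs correction to retreat from an overly general grammar, because the overly general grammar is never adopted without positive evidence for it. That is the property the principle was designed to guarantee for L1 acquisition.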
17. Although Archibald (1998: 66) has pointed out that parameter-resetting can only be understood in the context of a theory of cues appropriate to a parameter. Raising the trigger-problem tends to induce a lot of hand-waving in the direction of the first language acquisition literature, as if proposals from that domain can slip seamlessly into SLA. They don’t. For one example of appropriate research, see Liceras (1999).
18. This has obvious consequences for claims like those in Saleemi (1992) that there is a frequency threshold to be associated with parameter-setting. In the absence of a definition of cues or triggers, it is impossible to test such an hypothesis.
19. See Kail (1989), MacWhinney (1978), MacWhinney, Pléh, and Bates (1985), MacWhinney (1989), and McDonald (1984, 1986, 1989).
20. Archibald’s adoption of the Dresher and Kaye (1990) trigger-based approach to the resetting of parameters relevant to stress is a case in point. Archibald merely assumes that the system can be taken over without change, meaning that the perception of stressed and unstressed syllables, rhythm, and “edges” of prosodic units has to occur independently of the learner’s L1 grammar in order for parameter-resetting to take place. But there is no reason to believe, and indeed good reason not to believe, that L2 learners perceive rhythm and stress independently of the L1 grammar, as Archibald (1994) has himself shown. If learners come to be able to perceive new patterns of stress, it is precisely because something else in the system has already been acquired.
21. One proposal, namely that the contents of the L1 grammar, including parameter-settings, are copied into the contents of the L2 “box” does not of itself deal with this issue since we still need to explain why the contents of the L2 box can restructure given a bit of data but the contents of the L1 box cannot. At the very least, we might need to assume that the learner’s learning is guided by some mental model of what language the stimuli belong to, but this would appear to tie parameter-resetting to metalinguistic knowledge in a way which no P&P theorist is likely to accept.
22. C-command is a relation operative across nodes in morphosyntactic representations which constrains a variety of relations holding of sentence structures and correspondences between conceptual representations and their syntactic instantiations. There are various versions of the relation; a common one is “A node x c-commands a node y iff x and y do not dominate one another and the first branching node dominating x also dominates y” (Reinhart 1976, 1983). The No Crossing Lines Constraint is a constraint holding of autosegmental representations in the phonology which requires tiers to be strictly linearly ordered relative to one another. Constituents on different tiers would not be strictly ordered if the association lines linking them were allowed to cross one another. See Kenstowicz (1994: 317).
23. Most of the experimental work done within the Competition Model has examined various cues for semantic role, in particular agentivity. Anyone hearing a sentence which contains a verb assigning Agent and Patient semantic roles will decide who did what to whom. How he goes about doing it is the processing question. What the Competition Model attempts to show is that language users vary in the kinds of information or cues they exploit since languages vary on such matters as constituent and phrase orders, case-marking, agreement, and so on. The observation is sound and the conclusion probably justified. However, we are not required to accept the Competition Model’s account of how this happens. The experiments done consistently say that various word orders are investigated; NVN, NNV and VNN are usually listed.
However, in many of the experiments conducted, this description is inaccurate for subjects see or hear noun phrases. Now noun phrases have their own internal structure, and there is no reason to assume, indeed the typological literature suggests otherwise, that the internal structure of the noun phrase is completely independent of the internal structure of the clause, as the Competition Model researchers all apparently assume. I am suggesting, in other words, that anyone learning a language whose NP is head initial has a good cue that the VP order will also be head initial. Similar remarks can be made about the relationship between the internal structure of PPs or other major categories. There is no reason to assume that the only cues for how to relate noun phrases to verbs must be found in the order of nouns to verbs, or in the mappings between the possible semantic roles played by noun phrases and the semantic roles assigned by those verbs. This patterned relationship has received extensive discussion in a variety of grammatical theories, and indeed can be seen as the fundamental idea behind the X-bar theory of phrase structure (Jackendoff 1977), and some of the parameters of the P&P theory (Koopman 1984; Travis 1984, 1989). In ignoring the theoretical linguistic literature, the Competitors have deprived themselves of an important source of ideas about the nature of cues and their relationship to parsing decisions. Pye (1989: 129–30) makes the same point.
24. “Other critics have, in all sincerity, accused us of not believing in grammar at all! Of course we believe in grammar, and in grammatical diversity.”
25. The Competition Model has also been criticised for not being explicit about how stimuli become input to learning mechanisms, see Gibson (1992).
26. Talking of “isms” and “ists” obscures the finessing and adjustments which emerge from careful readings of research over the years. More specifically, we would need to infer that early White (White 1984, 1985a, b, 1986, 1987a) — but perhaps not later White (White 1989a, 1990/1991) — Suzanne Flynn (Flynn 1983, 1987, 1989a, b; Flynn and Espinal 1985), Hilles (1986) — but not Hilles (1991) — Marianne Phinney (Phinney 1987), Bonnie Schwartz (Schwartz 1992), Lakshmanan (1993/1994) — but not Lakshmanan (1989, 1994), and Margaret Thomas (1989, 1993) are all theists. Whether those who stand accused would admit their guilt is a separate matter. Other researchers who pursue a P&P agenda, e.g. Finer (1991), Eubank (1991, 1992) or Thomas (1995) are hard to classify in Gregg’s terms.
27. Including the idea that the construct itself is seriously flawed. See Grosjean (1985, 1989). Notice that the question of whether there is a critical period for the acquisition of specific features of grammars is a separate, although related, matter.
28. See also the critique in Epstein, Flynn, and Martohardjono (1996: 681) which focuses on the ways in which Bley-Vroman confounds representational and processing issues better kept apart.
29. Gregg (1996: 73) and Meisel (1991) make the same point. Gregg informs me (p.c., 1995) that it does not make the theist/deist debate meaningless. I disagree because I am convinced that when SLA P&P researchers actually take the trouble to define the object of their theory — UG — in psycholinguistic terms, i.e. in terms of the relation of the relevant constructs to speech perception, language processing, storage, retrieval, and language production, it will become apparent that principles and parameters are (i) not a representation put to use in a parsing system, (ii) not a mechanism of a learning device, and (iii) not a compiling process in a production system. If they are elements of a representational system which somehow informs the parser — and they are universal — then they can have no role to play in characterising change in the learner’s interlanguage, i.e. in characterising change from the particularities of one grammatical state to the particularities of another.
30. This is inaccurate since we now know that neonates can hear their mother’s voice before birth and have learned to distinguish at least some of its acoustic properties by birth.
31. To simplify the presentation, I will assume that linguistic competence consists of a performance-independent mental grammar — a distinct “box” in the functional architecture of the mind. It would not matter much, however, if what we call “linguistic competence” were to consist of grammatical distinctions implicit in a variety of types of procedural knowledge lying behind our ability to parse speech, to produce it, to read and write text, to carry out metalinguistic tasks like putting an active sentence into the passive, to make acceptability judgements, and so on. See the discussion of the processing account of memory in Chapter 1.
32. I set aside here the “atheists”, the advocates of theories of acculturation (Schumann 1978), affective variables (Gardner 1985), variation (Ellis 1985; Tarone 1988), and discourse function (Hatch 1978a), who provide neither property nor transition theories, and therefore, cannot hope to offer us an explanatory language learning theory. See Gregg (1990, 1993, 1994b) for to-the-point discussion.
33. See Meisel, Clahsen and Pienemann (1981), Clahsen, Meisel and Pienemann (1983), Clahsen (1984, 1987, 1988), Clahsen and Muysken (1986, 1989), Meisel (1983, 1987, 1991, 1995), and Pienemann (1981). Note that O’Grady’s proposals, dubbed general nativism (O’Grady 1987, 1991, 1996) are explicitly not concerned with problem-solving, and hence cannot count as a “serious investigation” of the same. Since I will be examining evidence in Chapters 4 and 6 that language is not the only cognitive domain exhibiting autonomous properties, and hence discussing evidence against general nativism, I will not pursue it further.
34. To be fair to Clahsen, Meisel and Pienemann, it should be pointed out that their published proposals were based on research now 20 years old! All of them are aware of the limitations of these proposals in the light of subsequent research. Pienemann (1998a, b) has, moreover, recouched the findings of the earlier work in terms of his Processability Theory, a theory which deals exclusively with constraints on learner speech in production. Processability Theory is not incompatible with the hypothesis that mental grammars are encoded in representational systems specific to language and manifesting universal constraints.
35. While French is often claimed to have canonical SVO order, Charvillat and Kail (1991) discussed a number of studies which show that French speech, in particular the kind of speech used to and by young children, displays other orders due to the prevalence of left dislocations of the le petit chaton, il regarde la souris ‘the+ little kitten, he looks-at the+ mouse’ type. The heavy use of dislocations is, of course, a property of discourse and not to be explained purely in terms of a theory of French sentence grammar. Charvillat and Kail also provided on-line data from a word monitoring task which show that subjects (both children and adults) disregard linear order when processing sentences involving NPs and clitic pronouns in dislocated sentences, meaning that the psycholinguistic status of word order in sentences with clitics and sentences without is quite different.
36. Another option is the tactic that DuPlessis, Solan, Travis and White (1987) take, which is to consider a variety of restructurings whose effects are not directly visible in the production data.
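The definition of c-command quoted in note 22 is precise enough to be written out as a short procedure. What follows is a minimal sketch, not an implementation taken from Reinhart or any other work cited; the `Node` class, its method names and the toy clause are hypothetical choices made for illustration, and the decision that a node does not c-command itself is an added assumption.

```python
class Node:
    """A node in a constituent tree, with a pointer back to its parent."""

    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self

    def dominates(self, other):
        """True if this node is a proper ancestor of `other`."""
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False


def c_commands(x, y):
    """x c-commands y iff neither dominates the other and the first
    branching node dominating x also dominates y."""
    if x is y or x.dominates(y) or y.dominates(x):
        return False
    node = x.parent
    # Skip non-branching nodes to find the first branching ancestor of x.
    while node is not None and len(node.children) < 2:
        node = node.parent
    return node is not None and node.dominates(y)


# Toy clause: [S [NP subject] [VP [V verb] [NP object]]]
subject = Node("NP")
verb = Node("V")
obj = Node("NP")
vp = Node("VP", [verb, obj])
clause = Node("S", [subject, vp])
```

On this toy tree, `c_commands(subject, obj)` holds but `c_commands(obj, subject)` does not, which is the asymmetry between subjects and objects that the relation is usually invoked to capture.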
Chapter 3 The representational and developmental problems of language acquisition

1. Introduction
In Chapters 1 and 2, I made the claim that developing an explanatory theory of SLA will necessarily involve finding a way to link a theory of grammar to theories of perception, parsing and speech production and then to link the resulting product to a theory of language acquisition. I cited Kevin Gregg’s writings and talked about property and transition theories of language. A theory of grammar embedded in a theory of speech processing can provide us with a property theory of language. It defines the nature of linguistic distinctions, namely what the basic primitives of language are, and how those primitives are organised into more complex units and structures in individual psychogrammars. Just as importantly, a property theory spells out what representations cannot be. A property theory relevant for language acquisition will therefore specify the contents of representations of an I-language, by hypothesis what the speaker/hearer cognises of her language. A property theory of language is thus needed to help us explain what Felix (1986: 82) calls the representational problem of language acquisition, which turns on questions like: What is the nature of the representational systems in which linguistic knowledge is encoded in the mature speaker/hearer? What are the limits of variation amongst the representations needed to account for knowledge of different languages? How do the representational systems in which language is encoded arise in the species? How are they like the representational systems of other faculties? In what respects are they unique? Also included will be questions like: What is the functional architecture of the language faculty? Given a specific representational system, what processes lead to the encoding of stimuli? How does intake become input to a specific language processor? How do representations of one type interact during processing with representations of another type?
Notice that this is not your average characterisation of the representational problem of language. While discussion of it may begin with issues addressed exclusively by a theory of grammar, it should not end there. In other words, the property theory can and must be explored in at least two different descriptive modes — structural and process.1 As Jackendoff (1983, 1987) has emphasised, contemporary American psychology (and indeed this is also true of much psycholinguistic research) has investigated process at the expense of structure. I feel that the most recent research in SLA which takes its inspiration from the theory of grammar has emphasised structure over process, as if a theory of grammatical description could substitute for a theory of language processing.2 Structure and process are inseparable in language acquisition, but they are distinct ways of thinking about language and we need to find a coherent way of linking them while preserving their distinctiveness. One way to do this is to integrate mental grammars into a language processing framework in such a way that the processes are defined as operations which build mental representations, some of which are categories, some of which are structures. One of the advantages of classic symbolic approaches to processing is that they allow us to define the distinctions language users make in both categorial and structural terms and to define learning as the process of coming to encode the relevant distinctions (see Carroll 1989, for discussion). Thus, if a learner does not cognise either gender subclasses or embedded clauses, meaning that she cannot parse or spontaneously produce them in her own speech, we interpret this to mean that she cannot represent the relevant gender categories or the relevant structures for embedded clauses. 
Learning gender subclasses would mean then cognising a formal distinction among the subclasses of nouns and learning to identify the cues which signal inclusion in one subclass or another. Learning embedding would mean either learning the operations needed to build sentential complements (if we have reason to think that they are altogether absent from the interlanguage grammar), or learning to apply extant operations to encode representations in just the way demanded by the input.3 Consider another example. We noted in Chapter 2 that it has often been argued that early interlanguage grammars have paratactic organisation. One way to interpret such a claim is to assume that learners initially construct “flat” representations of utterances, as in (3.1a). Learning appropriate grammatical structures would mean that the learner constructs “deeper” constructions, as in (3.1b). The structure in (3.1b) represents a minimal expansion of the paratactic organisation of (3.1a); adopting a theory of syntax which motivates richer phrasal structure might justify imputing a more complex representation of the utterance Me be British.
(3.1) a. paratactic organisation4
         [E [NP [N me]] [NP [N British]]]
      b. syntactic organisation5
         [E [NP [N me]] [VP [V be] [NP [N British]]]]
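The contrast between the “flat” and the “deeper” representation in (3.1) can be stated as a difference between two data structures. The sketch below is illustrative only: encoding trees as nested tuples, and the helper names `depth`, `words` and `daughters`, are assumptions of the sketch rather than anything proposed in the text.

```python
# Trees are modelled (an assumption of this sketch) as nested tuples of the
# form (label, child, child, ...), with words as plain strings.
PARATACTIC = ("E", ("NP", ("N", "me")), ("NP", ("N", "British")))
SYNTACTIC = ("E",
             ("NP", ("N", "me")),
             ("VP", ("V", "be"), ("NP", ("N", "British"))))


def depth(tree):
    """Number of labelled nodes on the longest path from the root to a word."""
    if isinstance(tree, str):
        return 0
    return 1 + max(depth(child) for child in tree[1:])


def words(tree):
    """Left-to-right terminal string of the tree."""
    if isinstance(tree, str):
        return [tree]
    return [w for child in tree[1:] for w in words(child)]


def daughters(tree):
    """Category labels of the immediate constituents of the root."""
    return [child[0] for child in tree[1:]]
```

The paratactic structure has two sister NPs directly under E, whereas the syntactic structure introduces a VP that contains the second NP, adding one level of embedding. On this encoding, restructuring from (3.1a) to (3.1b) amounts to building a new kind of node, not merely adding a word.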
In this book, talk of process should be construed as talk about what happens when language parsing and production mechanisms construct mental representations which are, in effect, categorial and structural objects. Once this step is made, it becomes possible to ask how language acquisition mechanisms could come to encode the specific processes which build structures and create their constituent parts. Given this perspective, it becomes possible to explain why an adequate theory of SLA must be grounded in a property theory of language adopting structural constructs provided by UG, and including the basic notion of autonomous levels of representation. Language acquisition simply is the encoding of the relevant categorial and structural distinctions in the appropriate representational “language.” Nevertheless, and despite my interest in focusing on language processing and its connection to a theory of acquisition, I feel obliged to begin with a more thorough discussion of representational matters. In Chapters 3 and 4, I therefore intend to examine in some detail how we might conceptualise acquisition in representational terms. This is necessary in order to tease apart, first of all, issues related to innateness and maturation (and hence to Universal Grammar) from those related to induction and instance-based learning and, ultimately, to positive evidence, correction, feedback and other environmentally determined influences
on learning. I shall argue in this chapter that if UG is construed as a set of potentialities available to children in the absence of exposure to linguistic stimuli, then UG cannot be “accessed” by older learners (child or adult). As noted in Chapter 2, UG is simply not encoded in longterm memory and therefore cannot be “accessed” by any group of learners. The “accessing” metaphor misrepresents in a fundamental way the hypothesis that UG is innate. A focus on representation is also necessary to broach the topic of what initiates learning. In Chapter 1, I stated that language acquisition is “input driven” in the sense that learning mechanisms are triggered when parsing of an input fails. Parsing will fail when the parser does not include a parsing procedure capable of analysing an input datum. Correcting a parse fail will mean constructing a new procedure capable of encoding a novel representation. But this hypothesis requires that we distinguish carefully between processes of encoding on the parsing and production sides of performance. In addition, we will have to distinguish between on-line restructurings and off-line reorganisations or consolidations of representations in longterm memory. On-line restructurings can be assumed to be input driven while the reorganisations described in Chapter 1 as representational redescription appear to have some other source whose operations are little understood. Finally, we must distinguish between grammar restructuring and the strengthening of encodings which makes accessing representations faster and more reliable. I might hear and represent both Vórtrag (with stress on the first syllable) and Vertrág (with stress on the second) and understand from the context, or because someone has told me, that the former expresses the idea of a talk or a colloquium or a public presentation while the latter denotes a contract. 
I might also be able to systematically discriminate between them when someone else uses them, but still activate both and hesitate between them whenever I myself want to ask my secretary to prepare a contract for a student: Is the right form Vórtrag or Vertrág, Vortrág or Vértrag? Which form should I say? Activation-based models of memory permit us to characterise such phenomena. Finally, we need clarification in order to make sense of current debates about the role of UG in SLA, as opposed to association-based models like the Competition Model or problem-solving type approaches like those suggested by Clahsen and Muysken (1986) or Schachter (1992). At this point in the discussion, it is appropriate to return to the contributions of various acquisition theories to the issues at hand. Recall from the discussion in Chapter 2 that Gregg makes the point that many of the past or current proposals in the SLA literature, which have been drawn largely from developmental and social psychology, fail to constitute even an approximation to an explanatory theory of second language acquisition. This judgement, while a harsh
one, strikes me as being correct. McLaughlin (1987), Ellis (1990), Larsen-Freeman and Long (1991), Towell and Hawkins (1994), and Meisel (in preparation) all provide detailed reviews of the SLA literature. These reviews reveal that few of the various proposals provide answers to the representational problem of how learners could come to have encoding systems with the properties needed to instantiate grammatical knowledge. Indeed, few of them recognise it as a central issue (hence Gregg’s critique). Past and current proposals fall far short of explanation of how learners learn because most of them fail to specify with any precision what learners learn. In the absence of a treatment of the contents of knowledge, it is hardly surprising that the literature also fails to address the problem of how learners learn on the basis of the actual stimuli that they hear or read. In contrast, P&P theory, insofar as it offers a complete and explicit theory of grammar, meets the requirement of defining what language structures are.6 A theory of what it means to be a human language can be seen as setting the outer boundaries on a theory of language learning. A consideration of the representational problem of language acquisition alone forces us to turn our back on most of what passes in SLA as acquisition theory. But there are other reasons to be interested in P&P theory. Explicitness as to what is being represented permits explicitness about language processing. It is possible in principle to integrate some version of P&P theory into models of processing (both parsing and production). Principle-based parsing theories exist (Pritchett 1992; Gorrell 1995) and are beginning to be explored in SLA by Juffs (1998a, b) who has examined such issues as the divergence of grammatical competence and parsing abilities in very advanced L2 speaker/hearers. Taken together, P&P theory and principle-based parsing theories give us a rich picture of the human language representational capacity.
Now let us look at acquisition in developmental terms. We have said we need a transition theory, which will explain how and why grammars restructure. Transition theories are designed to explain the developmental problem of language acquisition (Felix 1986: 82). Research questions related to the developmental problem are: How do representations change over time from some initial stage to the final steady state? What initiates the restructuring of a representation? Are developmental changes local or global? Why does restructuring stop? Is the representational system subject to stimulus-independent processes of integration and reorganisation, or is developmental change always dependent upon the processing of stimuli and/or input? Related to this last question are the additional questions: Must learning be driven by errors? Does it respond to negative evidence? P&P theory offers interesting ideas to some of these questions, in particular, the hypothesis that a small change in one part of the grammar can have broad-ranging consequences in other, seemingly unrelated parts. P&P theory
also offers a principled basis for restructuring insofar as it requires a theory of triggers, the latter being the necessary input to the learning mechanisms which will inform the learner how a given parameter is to be set. But here we begin to see the gap between the promise of the P&P theory and the focus of SLA research. Triggering in SLA has been broached (although not solved) in the context of accent systems in SLA research adopting generative phonology (Archibald 1991, 1993, 1994, 1999). Archibald follows Dresher and Kaye (1990) and Dresher (1999) in assuming that a theory of parameters must include a theory of triggers. UG, in other words, must provide the learner with information as to what a possible trigger can be. Whether it is conceptually plausible or even possible to stipulate triggers for morphosyntactic parameters like the null subject parameter (or pro-drop parameter) or the verb raising parameter across a range of languages and language types is far from obvious. I will provide an argument below that it cannot be the case that UG provides an independent list of triggers for verb raising. In general, the subject of triggers has been largely ignored in the SLA literature, meaning that a major research effort is required if the P&P theory, as a theory of development, is ever to be more than a “promissory note.” Nevertheless, it is possible to map out a research strategy for an SLA theory which takes P&P as its core. Seen in this light, the SLA P&P research of the last 15 years might be seen as laying the foundation for the creation of a transition theory of language development. If so, the appropriate question would then be: Where do we go from here? 
We have seen that one of the primary arguments against P&P theory in SLA arises from the fact that at least some researchers appear to have been misled by the dynamic metaphor of “switch-setting” and treat UG as if it were an acquisition mechanism directly involved in the construction of mental representations. While I want to argue that categories, structures and mental representations are psychologically real in the relevant sense, I want to deny that parameters are. Parameters cannot be representations in a psychogrammar which are “accessed” by learning mechanisms. But supposing that we correct this fundamental misinterpretation of P&P theory, one might then ask: If P&P theory offers so much promise, surely the sensible research strategy would be to adopt it and fix the problems, would it not? We gain nothing by adopting the vague views of language prevalent in so much of the SLA literature. While the logic expressed is impeccable, my answer is “No”, for the simple reason that I do not believe that P&P theory can be fixed. My reasons for this were partly stated in Chapter 2 and will become clearer as we proceed through Chapter 3. But first we need to see what the central characteristics of the P&P theory are. In the next section, I sketch out what I take them to be.
2. Principles and parameters theory (P&P)
2.1 P&P theory: the core ideas

Generative theories come in many forms — the Principles and Parameters theory (Chomsky 1981b, 1986, 1988, 1991a, b; Chomsky and Lasnik 1995) and its most recent offshoot Minimalism (Chomsky 1991c, 1994, 1995), Lexical Functional Grammar (Bresnan 1978, 1982; Kaplan and Bresnan 1982), Generalised Phrase Structure Grammar (Gazdar, Klein, Pullum, and Sag 1985) and its avatar Head Driven Phrase Structure Grammar, or Optimality Theory (Prince and Smolensky 1993; McCarthy and Prince 1995; Grimshaw 1997). They are all part of the “generative enterprise” insofar as they attempt to model linguistic cognition against a backdrop of presumed linguistic universals. All of them meet the requirement of providing formal definitions of language, and, therefore, could underpin acquisition studies. To date, however, with the exception of work by Manfred Pienemann (1998a, b), almost all generative SLA research has been couched in the P&P theory.7 Its impact in the area of SLA morphosyntax in particular has been decisive. This is so even though the research agenda of P&P theory raises some rather odd problems for acquisition theory.8 Why should P&P theory have been adopted so readily in SLA? What is the nature of its appeal?

The P&P framework consists of a number of core ideas. The most important is that linguistic cognition arises out of the interplay of linguistic stimuli and a species-specific cognitive faculty whose essential properties are biologically given. The language faculty consists of an innate knowledge system, called Universal Grammar, and an innate language acquisition device, or LAD, whose operations are somehow constrained in such a way as to respect UG.9 UG is that knowledge of language which humans possess in the absence of exposure to speech. It constitutes therefore the initial state of linguistic knowledge which the infant brings to the task of learning the first language.
Knowledge of the L1, described as constituting the final state of the speaker/hearer, can be characterised as arising as a by-product of UG in interaction with the environment. The second central idea is that UG makes available for the construction of individual grammars neither rules, as they are normally understood, nor constructions like questions or the passive. Rule-learning has been rejected since it seems to rely on a mechanism of hypothesis-formation and hypothesis-testing. The former is viewed as too unconstrained and the latter as computationally too costly. Constructions are viewed as an epiphenomenon and this claim provides the essential insight into the P&P approach to language-specific generalisations — they arise from the operation of sets of abstract universal principles and
parameters, the latter being a set of options as to how the universals should apply (Chomsky 1995: 6). A third central idea is that problems of language variation and language typology can be characterised as the differential setting of the parameters provided by UG. If one assumes that the variation among the grammars described by typologists arises as a result of first language acquisition processes, those processes can then be construed as the selection of specific parameter settings in conjunction with learned properties of specific lexical items (Chomsky 1995: 6–7). A fourth idea is that the principles of UG exhibit certain sorts of commonalities. On the one hand, there are principles which are relevant to the characterisation of sentence derivations, namely transformations and the conditions on their operation, as well as the licensing conditions on representations. Licensing conditions include such properties as that nouns must be assigned abstract case, or that a verb must be assigned a tense feature. On the other hand, principles are organised in terms of distinct subsystems, called modules, dealing with subparts of the grammar, e.g. binding theory, case theory, θ-theory etc. (Chomsky and Lasnik 1995: 27). This distinction is necessitated by the fact that these modules or subparts of the grammar may or may not operate within a single level of representation. In P&P theory, in addition to a level of sound representation called PF (for phonetic form), a grammar also contains a level of morphosyntactic representation, with three distinct sublevels — D-structure (which reflects directly the argument structure of expressions drawn from the lexicon), S-structure (which feeds PF and organises the linear order of expressions), and LF (for logical form) which presents the conceptual system with a disambiguated syntactic structure for the purposes of characterising scope phenomena and other syntactically determined types of ambiguity. 
See Chomsky and Lasnik (1995).10 What exactly is the form of UG? This is the sixty thousand dollar question and we all wish we knew the answer. Stating with any confidence any particular claim from the perspective of the theory of grammar is difficult, especially at the moment, insofar as the various ideas which were commonplace in the 1980s have been modified dramatically in the early 1990s and then even rejected completely as P&P theory has merged into Minimalism, on the one hand, and Optimality Theory, on the other. I will limit myself to those ideas which have had an influence on SLA thinking, which means basically focusing on ideas prevalent in the 1980s and early 1990s. I will divide the discussion into two parts, looking first at substantive universals and then at formal universals.
2.2 Universal Grammar: substantive universals
Substantive universals are those features from which the basic constituents and classes of a language must be drawn. This featural vocabulary instantiates speech sounds and the atomic units and phrases of the morphosyntax. In Chapter 1, in a discussion of how transducers feed input to phonological processors, I spoke of features, segments (consonants and vowels) and CV syllables. These are essential units or forms of the sound system (the phonology of a language). A theory of grammar must provide a basic vocabulary from which the sound system of any language is constructed. It is therefore assumed that UG provides a set of primitive phonetic features which capture both acoustic and articulatory aspects of speech perception and speech production. The language-specific repertoire of segments and prosodic units of a given language (morae, syllables, feet, prosodic words, etc.) will be constructed from this set of primitives.11 The basic typology of prosodic units is also provided by UG. As for morphosyntactic representations, they consist of lexical categories (like noun, verb, adjective, or particle — covering both prepositions and postpositions) and functional categories (like tense, aspect, number, case, etc.). By hypothesis, it is assumed that lexical and functional categories are derived from primitive morphosyntactic features. The basic categorial features [±N, ±V] are cited in the relevant literature but a richer typology must be available in order to formally distinguish the numerous word classes of most languages.12 The basic atomic units of the sentence combine in various ways to form phrasal categories. It is assumed that the combinations instantiate function-argument pairs with the features of the functors either selecting or licensing lexical categories and/or their projections. 
Selection is usually discussed in terms of idiosyncratic relations between pairs of units, as when the transitive verb paint semantically and categorially selects an NP and an AdjP in sentences like Jessica painted her face green. Licensing is hypothesised to be a general relationship regulating the occurrence of specific categories and features, for example, as when wh-words are said to be licensed in Spec of CP by a wh-feature in Comp under Spec-Head agreement. UG also requires that phrasal categories include a lexical head and that complements and specifiers be attached to heads in such a way that the scope of specifiers be structurally determined and complements and heads form “local domains.” Finally, UG provides for certain correspondences between the morphosyntax and conceptual structures. Derived phrasal categories are put into correspondence with referential categories. Thus noun phrases will typically express individuals or THINGS, while verb phrases will typically express ACTIONS and adjective phrases often express PROPERTIES (see Jackendoff 1983). Categorial selection
has been argued to derive from semantic selection (Pesetsky 1982). Not all features, categories or classes are, however, instantiated in the grammar of every language. The interpretation of the universality of UG moreover does not require this. Rather, the substantive universals are usually viewed as potential elements of a language — a language learner must choose from the set provided but need not choose everything. In this respect Universal Grammar is more like dim sum than a menu fixe. It constrains the set of offerings but does not impose a specific choice on every learner. It is usually hypothesised that UG provides a full set of basic features and that children somehow recognise which ones are needed in their particular L1, based on the stimuli available, using the stimuli to derive input which in turn are used to construct the relevant L1 categories. Thus, children learning Polynesian languages will construct V and CV syllables based on the shape of the words they hear, while children learning French will construct V, CV, VC, CVC, CCV, VCC, and CVCC syllables, again based on the shapes of the phonological words constituting input to the learning mechanisms. Similarly, children learning English will learn to construct sentences containing V NP NP frames, where the NPs are morphologically indistinguishable, presumably on the basis of stimuli like Give me that glass. Children learning Finnish do not since NPs in Finnish must be morphologically case-marked. Now at precisely this point, the discussion in the theory of grammar literature tends to end, since the specification of the basic repertoire of primitives and a categorial typology appears to exhaust the representational problem. Discussion within the theory of grammar thus leaves entirely unanswered the developmental question: How does the child construct the appropriate categories of the L1? This is a complex issue as yet little understood. 
Moreover, it is a topic about which P&P theorists have said very little. The logic of P&P theory as a theory of linguistic development would appear to require that children know how to construct the categories of their language in the absence of input or on the basis of minimal bits of evidence. Certainly there are good reasons to suppose that in specific cases children are indeed inevitably drawn to avoid particular types of category-building although the resulting categories might be logically possible. As is well-known, there are constraints on how phonetic features can combine to form segments, for example, a vowel cannot be simultaneously [+high, +low] because the tongue cannot simultaneously be below and above neutral position in the mouth. Such constraints are functional and arise from the nature of human anatomy. Others, such as the fact that highly resonant segments constitute the nucleus of a syllable while less resonant segments occupy slots on the edges appear to arise from general properties of perception
which requires maxima and minima of sonority to encode sequences of units. Such constraints do not require the postulation of parameters. Consequently, although languages vary according to the number and types of categories they make use of and there are important restrictions on how categories can combine, the typology of categories has not lent itself as yet to the development of an explicit theory of parameters with deductive consequences for phonetics and phonology.13 Similar remarks can be made about the typology of morphosyntactic categories. To my knowledge, no one has proposed a set of parameters to explain why English speakers distinguish auxiliaries, modals and verbs or transitive and intransitive prepositions alongside particles. Such categories exist simply because the facts of the language require that we postulate them. Now it is at precisely this point that we stub our toes on some of the more extreme statements to emerge from the P&P literature. The question of how the language-specific categories of a language are to be constructed within a theory which tolerates no talk of rule-learning, constructions, pattern learning, induction, association and so on, remains unaddressed let alone resolved. Category construction does get discussed in the L1 acquisition literature in relation to the various bootstrapping hypotheses which are designed to explain essentially how a learner might be led to construct morphosyntactic categories on the basis of prior knowledge of categories at other levels of representation. We find in this literature the semantic bootstrapping hypothesis (Grimshaw 1981; Gleitman and Wanner 1982; Macnamara 1982; Pinker 1984, 1987; Gleitman, Gleitman, Landau and Wanner 1988), the syntactic bootstrapping hypothesis (Gleitman 1990), and the prosodic bootstrapping hypothesis (Gleitman and Wanner 1982; Morgan 1986; Morgan, Meier and Newport 1987; Morgan and Demuth 1996). 
In all cases, the argument is that learners are driven by a priori tendencies or predispositions to map categories of one level of representation onto categories of another level of representation. The alternative to encoding categories in this fashion involves learning distributional cues to category location and inducing the categories from such distributional cues (Braine 1963, 1966, 1987). Bootstrapping is thus seen to be part of the solution to the developmental problem of language, a factor in a theory of language acquisition which is designed to eliminate a certain amount of induction. The logic of the bootstrapping literature is straightforward: If the theory of grammar does not constrain the nature of category acquisition, then the theory of acquisition must do so. There is a lively debate going on in the L1 literature as to the exact nature of initial correspondences in L1 acquisition. The semantic bootstrapping hypothesis amounts to the claim that children are guided to the discovery of lexical
categories by a correspondence operation which maps basic and universally available conceptual categories onto morphosyntactic features. Such a correspondence operation might be expressed explicitly as something like: “If a word expresses a THING concept, then it is a [+N, −V] category.” We might construe these initial correspondences as either learning procedures (that is to say, as something performed by the learning mechanisms) or as speech processing procedures (that is to say, as something performed by a parser and by a speech production mechanism). If we hypothesise that this is a learning procedure, then we will assume that the learning mechanisms are accessing pre-established semantic categories and a set of extant morphosyntactic features to construct an initial lexicon. Note that this interpretation of learning is inconsistent with a modular view of learning like that proposed by Schwartz and discussed in Chapter 1. It need not, however, be incompatible with a modular view of speech processing. If, in contrast, we interpret these initial correspondences as primitive parsing and production procedures then we might be forced to assume that sentence parsers are initially interactive, resorting to semantic categories expressed in Conceptual Structures in order to assign a formal classification to a phonological form. In any event, at the very least we can assert that the semantic category, the morphosyntactic features and the correspondence rule must all be in place (either innate or acquired) before the child can acquire any forms of the word class noun, if the semantic bootstrapping hypothesis is correct. Moreover, it is also assumed that children construct the repertoire of categories conservatively, meaning that they tend to map quite specific semantic categories onto lexical types and generalise across types slowly (see Pinker 1989; and Bowerman 1987). 
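To make the form of such a correspondence operation concrete, it can be pictured as a lookup from conceptual categories to categorial feature bundles. The following Python fragment is only an illustrative sketch under stated assumptions, not a model of what learners actually compute. The names `Features`, `CORRESPONDENCES` and `bootstrap_category` are hypothetical, and the entries for ACTION and PROPERTY are an assumption extrapolated from the standard [±N, ±V] system; only the THING entry is stated explicitly in the text.

```python
from typing import NamedTuple, Optional


class Features(NamedTuple):
    """A categorial feature bundle in the [±N, ±V] system."""
    N: bool
    V: bool


# Hypothetical a priori correspondences from concepts to feature bundles.
# THING -> [+N, -V] is the rule quoted in the text; the other two entries
# are assumptions added for illustration.
CORRESPONDENCES = {
    "THING": Features(N=True, V=False),     # noun
    "ACTION": Features(N=False, V=True),    # verb (assumed)
    "PROPERTY": Features(N=True, V=True),   # adjective (assumed)
}


def bootstrap_category(concept: str) -> Optional[Features]:
    """Return the feature bundle a concept maps onto, or None when no
    a priori correspondence is available for that concept."""
    return CORRESPONDENCES.get(concept)


if __name__ == "__main__":
    print(bootstrap_category("THING"))
    print(bootstrap_category("PLACE"))
```

A concept with no entry returns `None`, which corresponds to the case where no a priori correspondence is available and the category would have to be induced from distributional information instead.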
Gleitman (1990) has, however, countered that children acquire concepts at the same time that they are acquiring grammatical classes creating a logical problem with respect to the order of acquisition. How can children acquire lexical classes based on semantic classes if they have not yet acquired the semantic categories? She also suggests that specific semantic subclasses have to be acquired on the basis of a mapping from distributional properties of words in sentences. Rather than working from concepts to word classes, the child appears to be working from syntactic frames to semantic fields. Verbs belonging to specific semantic fields tend to occur in the same syntactic frame. Thus, verbs expressing MOTION in English tend to occur in frames of the NP V (PP) (PP) sort, e.g. The scent of matthiola blew from one end of the garden to the other while verbs expressing CAUSATION tend to occur in frames of the NP V NP VP sort, e.g. The insecticidal soap made the thrips disappear. Gleitman has argued that a child who can identify such frames has learned a vital cue to the semantic classification of new words. To learn concepts in this way, learners would have
to be sensitive to the location of specific forms in syntactic strings and to the semantic field of at least some verbs occurring in them. Hearing a novel form in a known frame would involve assigning the form to the relevant semantic field and then attempting to figure out the specific semantic properties of the word. In the case of a child hearing something like The toy soldier zipped from one end of the field to the other, we can hypothesise that the child would assign zipped to the class of verbs of motion and be driven to distinguish it from all others in terms of the feature MANNER OF MOTION since most verbs of motion are distinguished in terms of this feature. In the case of a child hearing a novel syntactic frame, we must assume that the child would be driven not to assign it to the class of verbs of MOTION but rather to project a novel semantic subclass, following a “Difference in syntactic frame means a difference in meaning” learning strategy. To the best of my knowledge, the developmental sequencing of semantic and syntactic bootstrapping has not been worked out and the specific contribution of each to early development remains unclear. As for prosodic bootstrapping, Morgan (1986) has argued that prosodic patterning provides the learner with an initial segmentation of the stimuli which can simplify the acquisition of morphosyntactic properties by grouping forms together prosodically which belong together syntactically. Kelly (1988a, b, 1989, 1992) and Cassidy and Kelly (1991) have gone far beyond a prosodic “packaging” argument to hypothesise that there are important differences in prominence patterns among the morphosyntactic classes of English which a child could use to learn those classes. 
Thus, the major lexical classes (noun, verb, adjective, particle) bear prominence while the exponents of the various functional categories on the whole do not.14 A child who had figured out that some functional categories are stressless might then be predisposed to assign novel stressless syllables to extant functional categories and not to the major lexical classes, while assigning novel stressed syllables to the extant lexical classes and not to a functional category. Other potentially useful prosodic cues include the fact that the typical prominence patterns of nouns and verbs differ, with two-syllable nouns typically showing prominence on the first syllable (in accordance with the stress patterns of the Germanic foot) while two-syllable verbs typically show prominence on the second syllable (in accordance with the stress patterns of the “Romance” vocabulary of English). Adults are also sensitive to the relevant patterns and can identify them in nonce forms (Kelly 1988a, b). In both cases, for learning to take place on the basis of prosodic bootstrapping, children would initially have to be capable of analysing the acoustic speech stream in terms of syllables and patterns of syllable prominence, and use variations in such patterns to mark off the edges of prosodic words. They would then have to be driven to
map these different prosodic patterns onto distinct morphosyntactic classes, following a basic learning strategy: “Difference in prosodic prominence equals difference in morphosyntactic class.” The representation of syllables and syllable prominence would obviously have to have been acquired at an earlier stage of acquisition, before word classes are encoded, to be used to guide the learner into the morphosyntactic representations. Spelt out in such detail, the prosodic bootstrapping hypothesis raises some obvious problems, namely, why children should be driven to map prominence differences onto different morphosyntactic classes but not differences in fundamental frequency, syllable structure, vowel lengthening or any other arbitrary prosodic distinction. Note that children learning Chinese and other tone languages must be willing to map different tones onto different word classes, so why don’t children learning English do the same? The only response possible would appear to be that there is massive counterevidence available in the stimuli provided to English-learning infants, but such a response would require empirical investigation. The same analyses of the speech to children would also have to demonstrate that all of the other possible prosodic distinctions which might be bootstrapped onto morphosyntactic classes are also contradicted by the input to learning mechanisms. At the moment, rather than constraining learning mechanisms, the prosodic bootstrapping hypothesis would appear to increase the number of options that they must compute. At most it can be argued that the computation takes place at one level of analysis “lower down”, that is to say, in the phonology, and that once learning takes place there, then what has been acquired can assist learning at the morphosyntactic level. 
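The prominence-based strategy can be spelt out as a toy decision procedure, which also makes plain how little it constrains on its own. This is a minimal sketch, assuming that syllabification and prominence have already been computed at an earlier stage; the function name `guess_class` and its output labels are hypothetical, and the noun/verb split encodes only the statistical tendency reported for English two-syllable words, not a rule.

```python
from typing import Sequence


def guess_class(stress: Sequence[bool]) -> str:
    """Guess a word class from a prominence pattern.

    `stress` holds one boolean per syllable (True = prominent).
    The labels returned are illustrative only.
    """
    if not any(stress):
        # Stressless forms are assigned to the functional categories.
        return "functional"
    if len(stress) == 2:
        # Tendency in English: initial prominence suggests a noun,
        # final prominence suggests a verb.
        return "noun" if stress[0] else "verb"
    # Any other stressed form: some major lexical class, undecided.
    return "lexical"


if __name__ == "__main__":
    for pattern in [(True, False), (False, True), (False,), (True,)]:
        print(pattern, guess_class(pattern))
```

Nothing in the procedure explains why prominence, rather than tone or vowel length, should be the property consulted; that choice is simply built in, which is the problem raised above.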
Gerken, Landau and Remez (1990) and Gerken (1996) have, however, countered that children do not need to wait on the emergence of their prosodic representations to begin to learn lexical categories. Their counteranalysis is similar to that of Gleitman’s critique of semantic bootstrapping in that they claim that children show a precocious ability to do morphosyntactic distributional analysis, thus demonstrating that children can encode morphosyntactic classes and patterns right from the very start of the syntactic stages of linguistic development. In short, morphosyntactic development proceeds at the same time as prosodic development and does not appear to be developmentally dependent on it. One can attempt to cut through the L1 developmental debates, which at the moment are inconclusive, by focusing on their logic. The various appeals to bootstrapping predispositions are an attempt to define a priori cues to morphosyntactic classes based on the hypothesis that the child encodes semantic or prosodic categories first. The a priori correspondences between prosodic categories and morphosyntactic ones or between semantic categories and morphosyntactic ones eliminate the need for the child to induce all morphosyntactic categories by computing morphosyntactic distributional cues. The counterarguments appear to be based in part on the empirical observation that the child has precocious knowledge of the morphosyntax at the time when she is attempting to produce speech. At the very least it would appear that young children during first language acquisition are simultaneously sensitive to the basic primitives of the various levels of analysis (minimally phonetic, phonological, morphosyntactic and semantic) and are capable of representing cues to categories and establishing correspondences across the levels very rapidly. This suggests that children are indeed driven to characterise the stimuli they get in terms of the types of categories made available by UG and by a universal semantics and that they may have established many of the necessary morphosyntactic distinctions before they begin to produce 2- or 3-word utterances. However the debates as to semantic and prosodic bootstrapping are resolved, we need only keep in mind that the various proposals rely on the existence of detectable cues for a particular morphosyntactic analysis, cues which are not properties of the speech stream. Rather, they are properties of analysed representations intermediate between transduced stimuli and conceptual representations. They emerge from the construction and analysis of prosodic categories like syllable and feet (or prosodic words) or from the construction and analysis of conceptual categories. They may operate from the bottom-up or from the top-down but in either case they are abstract mental constructs. 
Beyond that, given the considerable crosslinguistic variation we find in category types, it would appear to be necessary to assume that cue learning relevant for category construction is guided by the validity and conflict validity of cues in much the way the Competitors say it is, once it is granted that cues are part of mental representations constructed by parsing procedures.15 Now let us consider the problem of category learning in SLA. L2 research presents a striking contrast to the L1 literature in that category learning has not been treated as a central problem for the theory to explain. Rather the idea of bootstrapping has been largely ignored and the topic of a priori mapping correspondences tends to get subsumed by the transfer literature. In other words, if the learner were automatically and necessarily mapping categories of the L1 onto the phonetic forms of the L2, as, for example, the Full Transfer/Full Access Hypothesis of Schwartz and Sprouse (1994, 1996) claims, then there would be nothing else to say about bootstrapping. Whatever may be the value of the Full Transfer/Full Access Hypothesis, it does not constitute a theory of category formation. On the assumption that L2 learners are capable of learning novel syllable types, novel morphosyntactic frames or novel patterns of word structure
(an assumption which hardly seems farfetched), we require a theory which says how they do it. I know of no relevant research studies in SLA which address the issue of how categories are formed in the context of constraints imposed by UG, although there is research on category formation done within other approaches to which I return shortly. The view adopted here is that category formation is one area where grammar-learning exhibits properties shared with other cognitive domains. Linguistic categories are prototypes and exhibit family-resemblances, meaning that categories are formed from exemplars which resemble one another more than they resemble members of other categories. Categories are not constructed on the basis of necessary or sufficient conditions of membership. In every category there are core instances which are easy to classify as well as peripheral instances where we cannot decide (Jackendoff 1983; Taylor 1989, 1994). Just as we may feel that an apple is a “good” exemplar of the category FRUIT so we may feel that a word referring to an animate agentive countable individual is a better exemplar of the category noun than the expression there. Compare sentences like Jessica danced around the livingroom versus There is someone dancing around the livingroom. We may not hesitate to classify the expression Jessica as an instance of the category noun and resist classifying there as a member of the same category despite its position before the verb. Similarly, vowels make better syllable nuclei than nasals which make better syllable nuclei than fricative and stop consonants. Attempting to construct parameters for stating how languages select among the set of categories derivable from the basic set of formal primitives is probably not the right tactic since it presupposes that linguistic categories are unique in all respects. In addition, P&P theory does a terrible job of characterising the prototypical properties of language. 
Perhaps Optimality Theory is the proper way to formalise the above-mentioned observations since it allows characterisations drawn from the theory of formal description to conflict. My own approach, however, is to ask what we can learn from the research in the Competition Model and other research focusing on cue-based learning in our attempts to understand category learning. Such research asks us to focus on the information which tells a learner where a category is to be found. The Competition Model hypothesises that learners are actively scanning the stimuli for constant or recurrent properties. The learning mechanism detects the co-variation of properties in the stimulus (contingencies among stimulus events). The most reliable and available of detectable properties in the stimuli can, via association, become cues for classes (Bates and MacWhinney 1989; McDonald 1989). Suppose we were to adapt these ideas to a theory rooted in a universalist framework. In the UG-based model proposed here, the assumption is that the
learning mechanism scans stimuli and other forms of input, guided by the features made available by UG for encodings of information in specific representational systems. I specifically hypothesise that UG is expressed in the language acquisition device in such a way that the transducers must take in certain features. At each subsequent level of autonomous analysis, the learning mechanisms will have at their disposition only certain representational primitives for encoding linguistic constituents. The learning mechanisms will induce certain classes, using information from a lower level down, from a higher level up (including conceptual information) and from distributional patterns at the same level of analysis. Learning syntactic word classes must, under this view, begin in some a priori capacity to identify a phonological form or a concept as a likely exponent of a given morphosyntactic feature or feature bundle. Accordingly, I adopt the hypothesis that in first language acquisition, there are initial correspondence operations (the bootstrapping hypotheses discussed above), identifying categories across levels of analysis. Subsequently, the learning mechanisms must locate and label categories based on level-internal information.16 What are the implications of this for SLA? Given the prevalence of transfer, we cannot assume that learners bring a priori correspondence strategies of the sorts seen in first language acquisition. Despite much talk that “L1 = L2” in all its essentials (Dulay, Burt and Krashen 1982), we do not find adult L2 learners initially exhibiting N = ACTOR or V = ACTION correspondences. From the earliest stages, adults understand nouns which express other semantic roles, such as EXPERIENCER, and they can express them too (see Klein and Perdue 1992). Nor do we find learners of English restricted initially to using nouns with a monosyllabic or bisyllabic pattern with stress on the first syllable. 
Indeed, it is well known that the kinds of phonological simplifications exhibited in child speech (with words consisting of either one or two CV syllables) are not found in adult L2 interlanguage. Adults acquire words with various prosodic characteristics based either on the frequency of the forms in the stimuli or on their utility to the learner. After all, if the form you need is Würstchen (‘sausage’), then Würstchen it will have to be, even if L1 CV-syllable structures impose vowel epenthesis on the learner struggling to articulate it. Is the Full Transfer/Full Access Hypothesis (FT/FA) the right way to conceptualise transfer? I think not, given that the FT/FA Hypothesis is defined over the interlanguage grammar and appears to be largely concerned with interlanguage production. On its own the FT/FA may describe the fact of transfer but the theory in which it is embedded provides no account for the acquisition of new categories and indeed the hypothesis itself raises the question: How could learners acquire the L2? I think the Schwartz and Sprouse approach misanalyses
the problem of transfer. In the Autonomous Induction Theory, transfer is conceptualised in terms of the transfer of parsing and production procedures. In the case of speech analysis, the parsers are assumed to be tuned to cues defined for rapid processing of the L1. This means that they will be sensitive to L1-specific cues and especially sensitive to cues which are highly informative. They will also ignore or filter out cues uniquely relevant to the classification of the categories of the L2. In Chapter 1, I mentioned the work of James Emil Flege and his collaborators who have been exploring the issue of segment perception, recognition and the acquisition of novel categories in an L2 phonology (see, in addition to the works cited in Chapter 1, Flege 1987, 1993; Flege and Eefting 1986, 1987, 1988). They demonstrate quite clearly that “matching models” of perception are inadequate to explain speech perception because L2 learners are capable of assimilating distinct acoustic information in the signal to the segmental categories of the L1 phonology. Indeed, Flege has used this fact to argue that exemplars of the L2 drawn from phonetic categories similar to those of the L1 are harder to “hear” as distinct sounds than exemplars belonging to quite different categories. These, it seems, are easy to “hear” as distinct sounds of the L2. Kuhl and Iverson (1995) have developed a theory of perception called the Native Language Magnet Theory which can account for Flege’s findings. According to this theory, segmental perception is organised around sound categories-as-prototypes. Discriminating among sounds located near the central tendency is difficult but improves as stimuli move further away in the perceptual “space.” This research raises the difficult problem of specifying what counts for learners as “the same thing” and what counts as “different things”, a problem shared, by the way, with other domains of cognitive psychology. 
Indeed, the existence of equivalence classification in speech perception raises the problem of how L2 segmental learning could take place at all (Flege 1987). If it turns out to be a major phenomenon in SLA, operating, for example, at all levels of representation via the lexicon, then how do learners learn any aspect of the second language? As the literature on cognate recognition shows, this troubling question can indeed be raised with respect to lexical items (Carroll 1992; Pal 1999, 2000).17 Cue-based approaches to category formation offer a way out insofar as cues can conflict and differ in their validity. I think we have to assume that steady exposure to L2 stimuli will involve varying exposure to the various cues to a category. As the presence of different configurations of cues shifts, the possibility of novel resolutions to cue conflict will arise. Thus, a cue which will reliably predict in English the presence of a voiceless stop in syllable-initial position, e.g. long-lag VOT, will not be present in Spanish L2 stimuli. Other cues will be present and the learning mechanism must learn over time which are the most
reliable and valid. In other words, to the extent that category classification depends on the availability of stable and unchanging configurations of cues, then category membership can be viewed as stable and unchanging. But if the cue configuration can change, and in SLA it clearly does, then category membership can change. When category membership changes, by definition, so do the boundaries of the category. Another issue relates to the question of whether learners can be sensitive to cue types which are not relevant to the processing of the L1, presumably a sine qua non for L2 learning to begin. I have been exploring this issue in the context of the acquisition of gender attribution (part of the acquisition of the noun category in French, see Carroll 1989). I assume that correspondence operations, like all parsing and production procedures, transfer automatically at the initial stage of L2 learning. Lexical items in Jackendoff’s theory of language are defined as correspondences across the different autonomous levels of representation. Not surprisingly, then, we predict that L2 learners will initially transfer automatically the lexically-stipulated correspondences of their first language vocabulary. There is considerable support for this idea in the transfer literature, although Kellerman (1978, 1986, 1987) has shown that learners can quickly form ideas about what will not transfer from the L1. In the case of learners of an L2 with gender whose L1 also has gender, the theory predicts that learners will initially map in their speech production the gender features of the L1 word onto the form expressing the same concept in the L2. The theory also predicts, however, that learners should be forced to construct novel morphosyntactic representations wherever the input unambiguously signals a different gender feature. 
So a German learner of French who has no evidence as to the gender of the word beurre (‘butter’) ought to attempt to say la beurre in early stages of acquisition because German Butter is feminine. Given appropriate cues to the correct gender, however, German learners ought to restructure their lexical representations for beurre. Restructuring need not mean immediate flawless speech performance since the learner has to suppress the feature of the German items when speaking French, and this requires a lot of practice. Nonetheless, restructuring ought to be measurable somehow in the learner’s behaviour, including the ability to self-correct or to notice the discrepancy between what one has said and what one ought to have said. But what of learners whose L1 does not include nouns morphosyntactically specified for gender subclasses? What of English learners of French? The theory predicts that they will also initially transfer the contents of their lexical entries and they will also transfer any correspondence operations they have. In English, the only way to interpret gender is as a semantic distinction based on the reference of nouns in context
and relevant only for pronoun selection (Mathiot 1979). We might then ask if anglophones are especially sensitive to semantic distinctions in initially attempting to learn French gender. This was one of the questions investigated in Carroll (1999a). The latter very limited study suggests that initial-state learners are indeed highly sensitive to the potential role that referential categories can play as cues to gender classification. More interesting, however, was the finding that they are also somewhat sensitive to morphological structure as a gender classification cue. In the latter case, the distinctions cannot be transferred from the L1 so they must be part of the way in which the learning mechanisms encode linguistic distinctions. Either the distinctions are just part of the way in which the language faculty analyses gender distinctions, or else they are readily learnable. Either way, the research suggests that learners can learn novel categories given minimal amounts of relevant information. Let me now summarise the conclusions of this section. Universal Grammar is hypothesised to provide a range of representational features from which categories at various levels of analysis can be constructed. In addition, it is assumed that there are a variety of functional and formal constraints on how they can combine. Nonetheless, the major role in constraining the acquisition of categories would appear to fall on the theory of language acquisition. Some constraints may arise in first language acquisition from bootstrapping procedures if learners are predisposed to equate morphosyntactic categories initially with either prior semantic or prosodic categories. In SLA, the major “bootstrapping” procedure may well be lexical transfer. In both types of learning, however, categories appear to be acquired on the basis of properties of the stimuli learners hear. 
Consequently, we anticipate that cue-based learning will play a critical role in defining the ways in which categories are acquired.
2.3 Universal Grammar: formal universals
Let me now turn to a discussion of formal universals. Formal universals are those universals which constrain the ways in which the units of the language can combine. For example, a constraint on the formation of phrases is that every syntactic phrase must include a lexical category of the same type (Jackendoff 1977; Stowell 1981; Emonds 1985). Let us call this constraint the Head Constraint. It considerably reduces the types of connections between lexical items and phrasal categories since it allows only (3.2a) below and precludes (3.2b).
(3.2) a. [NP X N Y]
b. *[NP X V Y]
where “X” and “Y” stand for variables, strings which can include anything including no category.
The Head Constraint coupled with specific properties of lexical items, an appropriate theory of phrase structure, say X-bar theory (plus a projection mechanism and licensing conditions, etc.) is supposed to explain the facts of (3.3).18
(3.3) a. Exhaustion is not a good thing.
b. *Exhausted is not a good thing.
A noun phrase must contain a noun as the central syntactic element. Semantically related expressions belonging to other syntactic categories cannot replace the head. Thus, a verb cannot stand for a noun as the head of a noun phrase. In the P&P theory, properties of lexical items are presumed to be stored in the mental lexicon, including descriptions of the basic semantic types expressed by words. I have already noted that phrases are composed of functions and their arguments. Chomsky hypothesises that the lexicon includes a specification of the argument structure of lexical items such as verbs (corresponding to the traditional intransitive and transitive distinction).19 He also hypothesises that the lexicon lists the semantic or θ-roles which a function can assign to its arguments. See Jackendoff (1972: Ch. 2, 1985) and Grimshaw (1990). Properties of lexical items can project into phrasal categories, at each of the sub-levels of the morphosyntax. General properties of UG, which comprise X-bar theory, determine that phrases are organised into functional types, namely heads, complements, specifiers, and adjuncts. These functions are shown in graphic form in (3.4).
(3.4) a. Right-branching (or left-headed) phrase
[XP [XP ZP (specifier) [X1 X0 WP (complement)]] YP (adjunct)]
where X is the head and X1 and XP are projections of its features into intermediate and maximal phrases
b. Left-branching (or right-headed) phrase
[XP YP (adjunct) [XP [X1 WP X0] ZP (specifier)]]
where X is the head and X1 and XP are projections of its features into intermediate and maximal phrases
Languages vary as to the relative order of lexical heads to their complements. In earlier versions of generative grammar, it was assumed that children learned phrase structure rules to capture such effects. Order is now deemed to result from the setting of the head direction parameter, a label I take from Flynn (1987).20 This parameter locates the complement to the left or right of the lexical head. A second parameter regulates the order of heads to adjuncts and specifiers to head (Chomsky and Lasnik 1995: 53). (3.4) illustrates opposing specifications for all three parameters. Other orderings are possible (adjunct, specifier, complement, head; head, complement, specifier, adjunct; complement, head, specifier, adjunct; adjunct, specifier, head, complement, and so on). (3.5) expresses the head direction parameter formally (Lightfoot 1991: 6).
(3.5) The head direction parameter
a. XP → {Specifier, X′}
b. X′ → {X or X′, (YP)}
where the curly brackets indicate an unordered set
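The relation between the unordered sets of (3.5) and the surface orders listed above can be sketched as a small linearisation function. This is a minimal illustration only: the function name `linearise` and the decision to model the variation as three independent binary choices (head before complement, specifier before X′, adjunct before XP) are assumptions made for the sketch, not a formalisation taken from the P&P literature.

```python
def linearise(head_initial: bool, spec_initial: bool, adjunct_initial: bool) -> list:
    """Impose an order on the unordered sets of (3.5) from three binary settings."""
    # X' -> {X, complement}: the head direction setting fixes this order.
    x_bar = ["head", "complement"] if head_initial else ["complement", "head"]
    # XP -> {Specifier, X'}: the specifier precedes or follows X'.
    xp = ["specifier"] + x_bar if spec_initial else x_bar + ["specifier"]
    # XP -> {XP, adjunct}: the adjunct precedes or follows the inner XP.
    return ["adjunct"] + xp if adjunct_initial else xp + ["adjunct"]


# Print every combination of the three settings with the order it yields.
for settings in [(h, s, a) for h in (True, False) for s in (True, False) for a in (True, False)]:
    print(settings, linearise(*settings))
```

On these assumptions the four orders named in the text all fall out of particular combinations of settings; for example, head-final with specifier and adjunct both initial gives adjunct, specifier, complement, head.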
Now let us consider the matter of typological variation from the perspective of language acquisition and P&P theory. If language acquisition is just the setting of parameters (plus the learning of the idiosyncrasies of lexical items), then external influences on acquisition can be reduced to an absolute minimum. Chomsky observes:
… within the principles-and-parameters approach … language acquisition involves the fixing of parameters, yielding what we may call the “core language”, including the lexicon. (Chomsky 1991b: 42)
Phrasal order has, not surprisingly, been discussed extensively in the first language acquisition literature because it is one of the dimensions along which languages vary. Clahsen (1988, 1990/1991), various studies in Meisel (1992), including Meisel and Müller (1992), Meisel (1994b), and Neeleman and Weerman
(1997) propose that word order is acquired in child monolingual and bilingual first language acquisition by setting relevant parameters. Part of the argumentation for parameter setting in FLA relies on the fact that first language acquisition of the relevant phenomena takes place very rapidly and children seem to cognise right from the start the relevant relationships between position, finiteness and subject–verb agreement. While children learning German may start out using verbs in first, medial or final position, they statistically prefer final position even before they manifest finiteness or tense distinctions in their own speech (Clahsen 1982, 1988; Meisel and Müller 1992). But more importantly, their development exhibits a clustering of phenomena which have no logical dependency since once they acquire agreement they never put non-finite verbs in second position (Clahsen and Smolka 1986; Meisel 1994b). Finally, child learners are apparently never unsure of the position of verbs in embedded clauses; once they reliably express sentential complements, verbs appear in final position.21 It is precisely this clustering of unrelated phenomena and the speed with which children restructure their grammars which gives the P&P theory appeal and justifies the claim that parameters have deductive consequences in acquisition. I will return to these facts in the next section. Languages also vary as to the extent to which they require phonetically spelt out subjects. The Extended Projection Principle requires external arguments of the verb, in the sense of Williams (1981), to be instantiated in sentences. Thus, it follows that all languages have subjects in all sentence types. 
The pro-drop or null subject languages, like Italian or Greek, where the subject appears to be missing, permit the external argument to be expressed in the syntax by an empty category pro which behaves referentially like a simple personal pronoun and is assigned a referential value by pragmatic principles. In recent versions of P&P theory, referential categories are assumed to be licensed by features of the agreement system. “Strong” agreement licenses a pro subject and personal pronouns are only used in cases of contrast focus. Thus, we get the following contrast between Greek and English.
(3.6) a. piyen+i me to aftokinito.
go+3 with car
‘He’s going in the car.’
b. aft+i piyen+i me to aftokinito.
3+pronoun go+3 with car
‘He’s the one going in the car.’
c. He’s going in the car.
d. *’s going in the car.
This type of variation has been studied in both first and second language acquisition. Perhaps the best known L1 study is that of Hyams (1986). She argued that the tendency of early English child language to lack subjects could be attributed to the need to set the relevant parameter. Children would start out with the same parameter setting as children learning Italian (that is to say, set to null subjects) and would then reset the parameter when they discovered expletive subjects like there. Meisel (1990a), using data from learners of French and German, argued against this analysis, noting that it fails to account for the fact that children differentially omit subjects according to person, or that they leave out other categories as well. Bloom (1990) has also noted that children omit various categories and that the fact of omission alone does not motivate a parameter-setting analysis. In SLA, the acquisition of obligatory subjects has also been investigated, see White (1984, 1985b), Hilles (1986) and Phinney (1987). It is argued that early stages of the acquisition of English where, say, Spanish L2 learners omit surface subjects can be explained in terms of the transfer of the L1 parameter-setting. This would mean that Spanish learners assume that English allows pro to appear as the subject of a tensed sentence. When the learners start using phonetically overt expressions exclusively to fill subject position, they are said to have re-set the parameter. For the moment, let us lift our heads from the technical details. We see on examining the literature that it is in the area of formal universals that P&P has had its impact on SLA theorising. The grammars of the L1 and L2 are conceptualised in terms of different parameter settings in those cases where phenomena belonging to the same sub-system of the grammar are organised differently. 
It is hypothesised that parameter-settings transfer and thus are utilised when L2 learners attempt to speak the L2, or make acceptability judgements or perform other tasks in which they must deploy their linguistic competence. Parameter-settings must be reset (in those cases where the L1 and L2 differ) before the learner can exhibit behaviour more in conformity with that of native speakers. When learner grammars change, they are said to exhibit parameter-resetting and this is said to be “explained” by P&P theory.
3. Well, that looks good! So what’s the problem?
In this section, I would like to examine more critically the claim to “explanation” of P&P theory in SLA. I take it that explanation involves more than that the data can be made to be consistent with the theory. In other words, we will grant that
the parameter-resetting account of development explains how second languages are acquired if it excludes grammar types which are not exhibited, if it attributes the right properties to the interlanguage grammars (e.g. predicts and describes the error types which show up in the interlanguage), and if it predicts the right stages and sequences of acquisition, including the prior acquisition of phenomena serving as necessary triggers to the resetting of the parameter.
3.1 UG and the problem of representational realism
Part of the problem of knowing what to make of parameter-(re)setting as an explanation of SLA is in figuring out if the explanations are put forth as an account of the nature of representation in interlanguage grammars, or as an account of development. My reading of the literature is that parameter-setting is offered as a solution to the developmental problem of language acquisition; it is supposed to be part of a transition theory. In other words, learners learn what they learn because they have reset a parameter, and they could only learn what they learn in this way. I suspect that some researchers believe that UG is literally constraining dynamic acquisition processes, or even that parameter-resetting itself is such a process. To make sense of such claims requires that we figure out the place of UG in the functional architecture of the language faculty. Chomsky’s own views on the psycholinguistic status of UG and grammars are easily misconstrued. Sometimes when he discusses them, he is referring only to the cognitive distinctions speaker/hearers make. Thus, occasionally, when he speaks of knowledge of a generative grammar being “psychologically real”, he means that the distinctions modelled in his theory in a propositionally explicit way accurately characterise distinctions observable in human behaviour, e.g. via acceptability judgements or in assessments of synonymy, and so on (see, e.g. Chomsky 1982: 27–8).
In other cases, Chomsky seems to be suggesting that the ways in which grammarians encode the distinctions are relevant for psychogrammars, that is to say the mechanisms which parse and produce utterances. It is important to understand that these are quite different characterisations and that the distinction matters for the functions we attribute to the rules and representations we propose as part of the learner’s knowledge. Matthews (1991) makes some interesting comments on this point. While on their intended construal, grammars true of a speaker/hearer are psychologically real, it would seem an open question whether the linguistic constructs to which grammars advert, including rules, representations, and the computations that figure in syntactic derivations, are psychologically real. Chomsky (1980: 197) argues that we are justified in attributing psychological
reality to the constructs postulated by a grammar true of [the] speaker/hearer In effect, the psychological reality of these constructs is assumed to be inherited from that of the grammar. But this assumption seems arguable. It might be objected that in the absence of specific evidence for their existence, these constructs must be assumed to be artifacts of the particular way in which the function computed by the speaker/hearer is specified, and as such cannot inherit the psychological reality of the grammar in which they figure. (Matthews 1991: 195).
Matthews is making the point that just because grammars encode psychologically relevant distinctions which are observable in speaker/hearer behaviour (and hence are psychologically real) does not mean that the rules or representations attributed to the grammars must be psychologically operational. In other words, we can construct formal grammars which include derivations without assuming that speaker/hearers construct derivations in real time. Similarly, we can construct grammars to cover exactly the same behaviours, this time allowing them to include formal rules. It does not follow that speaker/hearers are computing the same rules in real time as they listen to and produce speech. The same holds of parameters. We may legitimately characterise differences across the world’s languages in terms of parameters and parameter-setting without thereby committing ourselves to the claim that learners compute parameter-settings as part of the acquisition process. We might assume that parameters are merely convenient devices which are part of our specification of the function-computing grammars and could be expressed in a variety of ways in psychogrammars. Talk of parameter-setting would thus be a very loose metaphor and we would not want to build a theory of acquisition around the claim that learners are literally setting parameters as one of the operations of encoding mental representations. There is, however, a second interpretation of psychological reality, what Pylyshyn (1991) has called representational realism. Representational realism commits one to the view that the cognitive mechanisms one’s theory invokes (rules, representations, principles, parameters, etc.) are explicitly part of the cognitive computations. Let us consider another example.
Early work in the L1 acquisition of English claimed that children acquire a rule of plural formation since children could be shown at a given stage to behave differentially towards a nonce form like bick and a second form wug, pluralising the former with a voiceless sibilant /s/ and the latter with a voiced one /z/, entirely in conformity with generalisations statable over the adult language (Berko 1958). Such a rule might look like (3.7).
(3.7) The plural rule
Noun + Plural morpheme = Noun + /z/
/z/ → [−Voice] / [−Voice, −Cont.] _____
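Read as an explicit algorithm, (3.7) can be sketched in a few lines. The sketch is illustrative only and rests on assumptions that are not in the text: stems are given as rough ASCII phonemic strings (`bIk` for bick, `wVg` for wug), the class [−Voice, −Cont.] is approximated by the three voiceless stops /p t k/, and the names `VOICELESS_STOPS` and `pluralise` are hypothetical.

```python
# Segments standing in for the class [-Voice, -Cont.] (an approximation).
VOICELESS_STOPS = {"p", "t", "k"}


def pluralise(stem: str) -> str:
    """Apply (3.7): attach underlying /z/, devoiced to /s/ after a voiceless stop."""
    suffix = "z"                                  # Noun + Plural morpheme = Noun + /z/
    if stem and stem[-1] in VOICELESS_STOPS:      # environment: [-Voice, -Cont.] ___
        suffix = "s"                              # /z/ -> [-Voice]
    return stem + suffix


print(pluralise("bIk"))   # nonce form bick
print(pluralise("wVg"))   # nonce form wug
```

Like (3.7) itself, the sketch covers only the voicing alternation; it says nothing about stems ending in sibilants, which a fuller statement of the English plural would have to handle.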
Since the children could not have encountered the nonce forms in question in the stimuli they got from parents and other caretakers, they could not have learned the correct pronunciation of bicks or wugs by prior exposure. Since the children’s responses were systematic, there had to be some more abstract source of knowledge they were drawing on. Berko claimed that English children learned a linguistic rule. Understanding talk of rule-learning in a way consistent with representational realism would mean that the learner learns certain forms individually at the beginning, and then formulates an algorithm based on the regular case (cat/cats, dog/dogs, bunny/bunnies). This algorithm, perhaps something much like (3.7), would then be used to generate all plural forms in production. It would follow that regular plural forms of nouns would never be listed in longterm memory. On an interpretation of the data which is not committed to representational realism, talk of rules would not commit one to the idea that children have an algorithm in their mind/brains which they use to create plural forms in real time each time they produce one. The plural “rule” might be construed as a computational process used only to create plural forms for words heard only in the singular. Once created, such forms could be listed in longterm memory alongside plural forms extracted from the speech stimuli. Or, we might understand the rule as something which basically treats wug as a variant of bug without recourse to any internalised morphophonological algorithm. This process might be called analogy in a structural representational system. It is an implicit process in the view of classical connectionism. As Rumelhart and McClelland (1986, 1987) have shown, one can train a learning mechanism to output regular morphological forms without recourse to explicit structural algorithms. The connectionist network can also handle nonce forms. 
One can thus get the appearance of rule-governed behaviour without postulating rules as part of the mental apparatus. Rule-governed behaviour can also be accounted for by a generalisation over all of the plural forms the child has memorised one-by-one. By this I mean that the learner might have learned each word that she knows individually based on exposure to specific input, represented each plural form heard separately, and then have analysed the plural forms and abstracted a generalisation about the regular plural which is itself stored as information in longterm memory. Such generalisations are called redundancy rules and are not assumed to be on-line operations involved in either normal parsing or speech production. Usage involving learned forms would normally involve “looking up”
stored representations and not computing a representation of the plural. In the case of novel expressions previously not encountered, the redundancy rule could be invoked to create the plural forms. In short, there are a variety of different mechanisms which could be invoked in a psychogrammar which give the appearance of rule-governed behaviour without necessitating the postulation of structural algorithms as real-time computations. This example should make clear the implications of representational realism. For the representational realist, rules are part of the psychogrammar used to parse and produce speech. The representational realist therefore invokes an explicit connection between grammatical representation and theories of processing. Is Chomsky a realist when it comes to UG? Yes, definitely, since he states that UG, the initial state of knowledge of the human child prior to experience, is coded in the genotype (Chomsky 1980: 83). Is Chomsky also a representational realist? I have found no clear evidence for this and a good deal of evidence to suggest the contrary.23 Chomsky does state that the steady state knowledge system, what I am calling here the I-language, is represented in the mind/brain (Chomsky 1980: 83) and hence can be assumed to be involved somehow in mental computation. But Chomsky does not state that principles and parameters are algorithms or functions mapping one representation onto another. On the contrary, he states only, in a variety of works greatly idealising the nature of acquisition, that UG can be considered to be a function mapping experience onto the steady state. This, it seems to me, does not commit him to saying that UG is part of the computations of the psychogrammar, directly consulted or “accessed” every time some stimulus is analysed, or even every time some rule is formed. Such a position leaves entirely open the ways in which UG comes to constrain the language acquisition process. 
Universal constraints on linguistic cognition might be realised in the structure-building processes of the parser. Berwick and Weinberg (1984) characterise subjacency in just this way. Or, they might be realised in the operations of the LAD. Berwick (1985) has made some proposals along this line. Dresher and Kaye (1990), and Dresher (1999) go even further and argue that the operations of any learning algorithm cannot be independent of the contents of the parameter theory. Other possibilities are imaginable. Chomsky himself has said virtually nothing on this issue. What about SLA P&P researchers? Do they view UG as an abstract and idealised characterisation of the initial state of knowledge prior to any experience with language? Do they view it as part of the genotype, being instantiated in the mature organism in the properties of the L1 grammar? Or, are they representational realists who regard UG as a store of information which can be “accessed” during the acquisition process? I have found no explicit discussion of this critical issue.
Some of the core SLA writings prevaricate. Sometimes they describe UG as merely a set of abstract constraints on linguistic cognition.24 Sometimes the literature, however, appears to regard UG as a dynamic acquisition mechanism, “accessed” when the learner constructs mental representations. Observe that the second position necessitates representational realism. My interpretation of the extensive debates about whether adult L2 learners have “direct access” to UG or merely “mediated access” via their knowledge of the L1 (Clahsen and Muysken 1986, 1989; Bley-Vroman 1990; Duplessis et al. 1987; White 1991b; Epstein et al. 1996) is quite simply that they make no sense unless at least some L2 generativists are adopting a representational realist stance. This debate makes no sense under an interpretation of UG whereby it is merely a characterisation of the initial state of knowledge, in the genes, prior to exposure to linguistic stimuli. It makes no sense since (i) there is no way to assess in adult L2 learners what their knowledge state was prior to exposure to linguistic stimuli, and (ii) there is no difference under this view between “direct access” to UG and “indirect access.”25 See Meisel (1991) and Schachter (1991) for additional discussion. In what follows, I shall therefore assume that the P&P-in-L2 position is not claiming that language cognition, prior to any exposure to linguistic stimuli, conforms to the constraints and restrictions described by the P&P theory, since this claim has no special consequences for SLA development. I shall instead assume that it refers to the claim that principles and parameters are mentally represented in such a way that they can and do constrain and restrict language learning, can be directly “accessed”, and are causally related to the linguistic structures which learners encode.
3.2 Can parameters be biological constructs?
One major assumption of P&P theory is that linguistic development is under strong biological control. 
I stated that UG was assumed to be universal and innate. It is, by hypothesis, a characterisation of what the infant knows in advance of experience. Postulating a biological basis for language still leaves open a wide variety of possible interactions with environmental stimuli. It makes sense at this point, therefore, to discuss parameter-setting in terms of what Saleemi (1992: 32) has called three modes of learning, which is misleading in that the first two modes involve no learning at all.
(3.8) a. maturation
b. selective learning
c. observational learning
Observational learning corresponds to induction. It will be taken up in Chapter 4, so I defer description until then. Maturation involves biological change following a genetically determined timetable. Since all cognition takes place in the brain, cognition is regulated by biological processes, indeed cognition is carried out by biological processes. It is plausible, therefore, to assume that cognitive development might be regulated by biological growth and maturation. Felix (1984, 1986: 114), and Borer and Wexler (1987, 1992) have proposed that certain linguistic principles mature.26 Bertolo (1995) has extended this idea and hypothesises that parameters could follow a maturational schedule. It must be the case, however, that if maturation is involved in the behavioural emergence of principles and parameters, then that emergence must coincide with or follow on some specific change in neurological structure.27 Changes in brain function would then cause changes in cognitive function or cognitive capacity, in this case, the appearance of certain syntactic constraints in the child’s grammar. What is important for my purposes is that “learning” in this instance does not involve specific sorts of stimuli determining the contents of linguistic representations. It would be incorrect to say that maturation does not involve environmental effects because the environment can always have an effect on the ways in which biological growth unfolds (Geschwind and Galaburda 1987). What must be true, however, is that if the environment is having an effect on linguistic development, then it is entirely via its effects on the biological system. I shall have little more to say about maturation in this book. Whatever might be the accuracy of the claims about the maturation of linguistic principles in primary language acquisition, they will be of no relevance to adult SLA since adult learners must be presumed to possess mature brains and central nervous systems. 
Indeed, were it to be demonstrable that a given grammatical phenomenon was acquired by both groups of learners with the same pacing and order, I would take this to be prima facie evidence that the phenomenon in question could not result in primary language acquisition from maturation.28,29 The second mode of learning, selective learning, involves changes in the mental states of the learner which are largely initiated by environmental effects but which are also constrained genetically and which take place within biologically prescribed limits. Lightfoot (1989, 1991) claims that parameter-setting is an instance of selective learning. Selective learning involves the implementation of domain-specific mechanisms. They might be perceptual, that is to say, mechanisms attuned to only a specific type of environmental stimuli, or they might be representational, that is encoding the stimuli in highly specific ways. Research on animal cognition has provided the best evidence for selective learning. Wiesel and Hubel (1963, 1965)
have shown that visual perception in cats will not develop normally unless the felines get highly specific sorts of stimuli, and this within a short period following birth. The normal adult feline visual system is supported by neural connections which must be stimulated during a critical period. In the absence of the appropriate stimulation, the connections wither. Studies of the development of bird calls have provided similar results for certain species: deprived of relevant stimuli, some birds develop no calls, while other species develop highly distorted calls, and still others develop normally (Marler 1987; Hauser and Marler 1992). Lightfoot cites some of the same literature, so he clearly intends parameter-setting to be construed as a phenomenon of the same type. I return to the proper interpretation of the ethology literature directly. Because Lightfoot takes parameter-resetting to be selective learning, he is committed to the claim that parameters are neurologically expressed. This means that once a parameter is set, it corresponds to a specific brain state or specific neurological structures. This has obvious implications for the claim that SLA involves parameter-resetting. First of all, we must ask: Are we talking about the same parameters? Judging from the generative L2 literature, the answer is clearly “Yes”: there is only one head direction parameter, for example, or only one null subject parameter and a learner with a given setting might have to acquire the other. Now, it follows, under the selective learning view of parameter-setting, that resetting a parameter might entail L1 aphasia, as a consequence of altering the neurological structures supporting the L1 parameter. The prediction is patently wrong. 
There is no unlearning of the L1 as a consequence of acquiring an L2 (although the two knowledge systems clearly interact, and can be simultaneously activated in the bilingual brain).30 Alternatively, one gives up the claim that the same parameters are involved in L1 and L2 acquisition, again keeping in mind the hypothesis that a parameter corresponds to a particular neurological structure. Selective learning would then lead one to expect that L2 cognition would have to be instantiated in different brain structures or neural pathways. Unfortunately, there is also no compelling evidence to suggest that the L2 is stored or processed in quite different parts of the brain from the L1 (see Galloway 1981; Jacobs 1988; Obler 1993), or supported by distinct neurological structures, although I know of no studies specifically examining the neurology of parameter-setting in SLA. From this it follows either that there is no parameter-resetting in SLA, or else that parameter-setting is not selective learning. P&P theory is now up the proverbial creek without the proverbial paddle. If parameters in SLA are not instances of selective learning, and parameters in L1 acquisition are, then it is simply false to claim that the parameters of L1 and L2 acquisition are the same parameters. If they are not the same, their status in SLA
is entirely unclear and it might turn out to be true that the linguistics literature would have nothing at all to say about the parameters needed for SLA. If we insist that the constructs must be the same, then there simply is no parameter-resetting in SLA. Finally, if we reject Lightfoot’s claim, probably the most sensible route to take given current research which shows that language processing involves the activation of large areas of the brain in distributed patterns, then we dislodge P&P theory from its biological pedestal, and we’re left with a metaphor which simply asserts that language development can be described as: “something happens in the grammar of the learner.”

3.3 The SLA P&P theory has no model of triggers

Since language variation in the P&P theory is explained largely in terms of parameters, along with the theory of parameters must come a theory of how parameters are set. This involves establishing sets of triggers for each parameter and for each setting of a parameter. This means that for every type of variation identified within the theory of grammar and linked to parameter-setting, there will be some trigger at the same level of representation, a lower level of representation or at a higher level of representation which will signal to the learning mechanism that the language instantiates a given setting. Part of the language acquisition process, therefore, must be identifying the relevant triggers for, e.g. head direction, and then spelling out under what circumstances the parameter will be set. This means that the learning mechanism must be able to distinguish expressions which instantiate potential heads of a given syntactic phrase from expressions which cannot, and must be able to distinguish relevant cues for the head direction parameter from irrelevant cues. In the example in question, the learning mechanism must recognise that read a book is a suitable cue for fixing the head direction parameter in English but that a book reader is not.
In short, there must be an interface theory which connects grammatical representations to particular sorts of stimuli. That interface theory must clearly be part of a theory of language processing.31 Analysis of triggering links P&P theory directly to characterisations of stimuli and input and has the potential to provide much needed clarification of these constructs. It is obvious from Lightfoot’s (1991) discussion of triggers, for example, that they can only be abstract mental constructs and therefore cannot be part of the speech signal. Since the triggers have to be derived from the processing of the speech signal, it follows that the theory of acquisition has to be embedded in a theory of speech processing which permits us to explain how the speech signal can be transformed into abstract mental representations in the first place.
In addition, Lightfoot makes some very specific proposals regarding the nature of triggers, namely that they have to be locatable within root sentences “plus a bit” (essentially a binding domain). This hypothesis precludes learners looking for triggers in embedded sentences, greatly restricting the nature of the input to the learning mechanisms. His discussion of the acquisition of verb position is provocative and important, for one of the claims he makes is that the relevant trigger for verb position in embedded clauses is not the position of the verb itself, thus proving that the most obvious logical evidence may not be the necessary trigger (Lightfoot 1991: 44–56). Instead, Lightfoot argues that particles, their proximity to their verbal bases, and the fact that at least some of them occur uniformly at the end of main clauses may be the critical triggers for object–verb position in embedded clauses and the raising of tensed verbs to V2 in matrix clauses. Thus, Lightfoot cites the examples in (3.9) as potential triggers for verb position in Dutch (3.9a) and German (3.9b):

(3.9) a. Jan belt de hoogleraar op
         Jan calls the professor up
      b. Franz steht sehr früh auf
         Franz gets very early up
      (Lightfoot 1991: 53, his example (9))
He also cites (3.10) as an example of Dutch “motherese” which would provide a child with main clause evidence for embedded clause verb position.

(3.10) Jantje koekje hebben
       Jantje cookie have
       ‘Jantje has a cookie’
He might have cited pairs of utterances like those in (3.11) which typically occur in sequence in German parental speech, providing the child with very clear evidence of the variable position of verb stems and their particles and the relationship of finiteness to verb position. This example also illustrates a second trigger Lightfoot discusses, namely the variable position of the sentential negation marker with respect to the verb.

(3.11) Fass das nicht an!    Nicht anfassen!
       touch that not        not -touch-
       ‘Don’t touch! Don’t touch!’
If we hypothesise that the child comes to the task of cognising that negation must have scope over the verb in the sentence, then this knowledge, in addition to
helpful cues like (3.11) would lead the child to assume verb raising and a trace after the particle in simple imperatives: [[[Fass]Verb]Comp … [das nicht an t]VP]. A third trigger involves the position of modals or auxiliaries with infinitives.

(3.12) a. Jan moet de hoogleraar opbellen
          Jan must the professor up-call-INF
          ‘Jan must call the professor up’
       b. Franz muß sehr früh aufstehen
          Franz must very early up-get-INF
          ‘Franz must get up very early’
Lightfoot’s discussion provides an explanation of why children can encode verbs in final position even in the absence of encodings of embedded structures. Moreover, the theory of triggers plus the theory of parameters provides an account of the rapidity with which children discover the relationship between finiteness (or tense) and V2. They are not, it would seem, constructing hypotheses about verb position and testing them carefully for approval or correction by their parents. It is therefore somewhat surprising that Lightfoot’s analysis in particular, and the issue of triggers and how they are identified more generally, has been virtually ignored in the SLA literature. L2 learners apparently attend to the ambiguous nature of the Dutch and German input based on assumptions rooted in the L1 word orders (Clahsen and Muysken 1986; Meisel 1991, 1997, in preparation; Haberzettl 1999, in preparation). In other words, speakers of SVO languages (like Spanish, Portuguese or Italian) assume that German and Dutch are SVO languages too, presumably by attending to the evidence of main clauses. But we have seen that this evidence is in fact inconsistent: subjects can appear to the left of the verb but so can many other types of constituents. Moreover, if such learners do indeed analyse German and Dutch as SVO on the basis of the position of the tensed verb, it must mean that they are not drawing the proper information from sentences with sentence-final particles, modal + infinitive sequences, or post-verbal negation. The obvious question is: Why not? If verb-raising is linked to such triggers by UG, then L2 learners cannot be identifying the relevant types of sentences as triggers. If they cannot identify the triggers, it follows that they cannot be setting the relevant parameter regardless of the actual position of verbs in their own speech. Turkish learners of Dutch and German, in contrast, appear to attend to the position of verbs in final position.
But recall that Lightfoot specifically argues that the necessary triggers must be found in a main clause. Consequently, if Turkish learners treat the position of the verb in embedded sentences as the relevant evidence for underlying object–verb order, they cannot be relying on the same trigger as infants. Once again, if
triggers are specified by UG, it would follow that Turkish L2 learners of Dutch and German are not identifying triggers and not resetting a parameter. There are numerous difficulties involved in defining triggers for parameters (Lightfoot 1991: Ch. 1; Clahsen 1990/1991; Valian 1990a, b, 1993). First of all, it must be decided on a principled basis if triggers are to be defined in terms of stimuli, or in terms of input, as defined in this book. In other words, are triggers objective properties of the environment, like the Competition Model’s cues, or properties of mental representations? The logic of UG suggests that it must be the latter, although this position has never been explicitly argued and is inconsistent with the way in which many acquisition researchers talk about input. Quite to the contrary, to the limited extent that there has been discussion of such matters, it has sometimes been claimed that triggers must be frequent features of the environment. Such claims make it seem as if triggers are part of the speech the learner is exposed to, not the consequence of an analysis of the stimulus. Even Lightfoot’s in-depth discussion of triggers works from unanalysed sentences to draw conclusions about parameter-setting rather than from the analysis of the strings provided above in examples (3.9), (3.10) and (3.12). A more accurate treatment would define the triggers with respect to representations like (3.13) and a novel string fass das nicht an which cannot be analysed by the parsing procedures which would derive (3.13).

(3.13) [Tree diagram: VP, with daughters Spec of VP and V′; lower nodes nicht and V an+fassen]
If the child is constrained to analyse the verb as in the scope of the negation and indeed cognises the position of the sentence final particle an then (3.13) together with the string fass das nicht an could provide the evidence for verb raising. Other possible analyses plus novel unanalysable strings are imaginable as triggers forcing grammatical restructuring. Whatever position is adopted, there will be important consequences for the use of P&P theory in SLA. If triggers are environmental contingencies, one can imagine defining a given parameter uniquely regardless of the nature of the learner. If, however, triggers are input (analysed representations) then what might be a trigger for an infant acquiring the L1 could simply cease to be relevant for an adult acquiring an L2 since the representational systems may be quite different.
If triggers are defined over mental representations and properties of parameters, then environmentally determined properties might turn out to be irrelevant or even undefinable.32 Do we assume that triggers must be readily detectable, as cues are assumed to be (Bates and MacWhinney 1989: 57; McDonald 1989: 376)? Must triggers also have the properties that valid cues are assumed to have? Or can triggers be associated with the occasional but cognitively significant event? Can parameters be set only by triggers defined on input derived bottom-up from stimuli, as we might infer from Schwartz’ writings? Or can triggers be defined as well from the conceptual system? To date, there has been no serious discussion of such matters in the SLA P&P literature. Consequently, there can be no claim that we understand how parameters are, or could be, re-set.

3.4 How does one set a parameter in the face of ambiguous stimuli from two different languages?

It is not clear how a parameter could be set in the face of ambiguous or contradictory stimuli and/or input (Valian 1990b). Thus, both L1 and L2 learners of English will hear sentences with subjects and sentences without, but they must not conclude from this that English has the null subject parameter set to a plus value. This observation raises difficult questions about the timing of parameter-setting. In the abstract learnability proposals of Chomsky, this issue need not be addressed, since Chomsky factors out the temporal dimension of language acquisition and assumes simultaneous acquisition of linguistic competence. The temporal dimension is, however, one of the primary aspects of a transition theory. I noted above that triggers might correspond to robust and possibly frequent stimuli. One might hypothesise therefore that parameter-setting or parameter-resetting can only occur after some minimal exposure to relevant triggers has occurred.
There is nothing implausible about such a proposal; however, it is nonetheless true that robustness and frequency remain undefined terms. Similarly, it is not clear why parameter-setting should occur only once (Randall 1990, 1992). This deficiency has a particular manifestation in SLA. SLA theorists often talk as if the learner’s two languages were in two distinct and non-interacting “containers” (Romaine 1989: 235), meaning that one can define linguistic knowledge for each language quite separately from the other (except, of course, when it is convenient to assume interaction, as when talking about transfer or code-switching). The presumption is erroneous. Grosjean (1985, 1989, 1995), in particular, has urged us to abandon what he calls “monolingual” or “fractional” views of linguistic competence in bilinguals and to adopt instead the idea that the bilingual’s knowledge systems are integrated in such a way that
they cannot easily be decomposed into two separate components (see also Green 1986). This he calls the “wholistic” view of bilingualism. From a wholistic perspective, it is not clear why all of the stimuli the learner is exposed to, regardless of how the external observer might label them, shouldn’t count as triggering data. In other words, it isn’t obvious why data the linguist labels as “English” shouldn’t combine with data the linguist labels as “Italian” to serve as triggers for a single parameter-setting. Notice that it will not do to invoke the adult learner’s metalinguistic awareness that she is learning distinct languages. In particular, it is not clear in a modular system why the learner’s beliefs about which language she is hearing should affect the status of triggers defined for UG. Therefore, we can ask: Why shouldn’t hearing two sets of linguistic stimuli with conflicting values for a parameter induce Randall’s pendulum effect, where the learner sets and resets the same parameter with every new set of input? Why should the learner assume L1 stimuli are permanently characterised and only the L2 stimuli can be subject to parameter-resetting? Obviously, the same problems arise more acutely when dealing with different parameter settings for different registers or varieties of the same language, as the discussion of parallel systems in Chapter 2 made clear. Finally, note that parameter-(re)setting alone does not entail gradual, stage-like acquisition of language (Crain 1991: 601). If it doesn’t, what does? Surely the answer to this question is at the heart of a transition theory of SLA. There are no answers to these questions to be found in the SLA P&P literature. Until there are, it can make no pretences to explanation of the stage-like nature of SLA. P&P theory thus not only needs to be linked to models of processing, it needs a sophisticated theory of adult bilingual cognition.
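The pendulum worry can be made concrete with a toy simulation. The following is only an illustrative sketch, not a model proposed in the literature: it assumes a single binary parameter, assumes that every stimulus has already been analysed into a trigger for one value or the other, and assumes a learner that adopts whatever value the latest trigger signals. The function name and the sample streams are hypothetical.

```python
def pendulum_learner(triggers, initial=None):
    """Adopt the value signalled by each incoming trigger.

    Returns the final parameter value and the number of times an
    already-set value had to be overturned (a "reset").
    """
    value, resets = initial, 0
    for signalled in triggers:
        if value is not None and signalled != value:
            resets += 1          # the pendulum swings
        value = signalled
    return value, resets


# Hypothetical coding: True = trigger for [+null subject] (labelled "Italian"
# by the linguist), False = trigger for [-null subject] (labelled "English").
mixed_stream = [True, False, True, False, True, False]
single_language_stream = [False] * 6

print(pendulum_learner(mixed_stream))            # value flips on every conflict
print(pendulum_learner(single_language_stream))  # no resets at all
```

On these assumptions the learner exposed to the mixed stream never settles, which is exactly the behaviour that bilinguals do not show; something beyond the triggers themselves would have to keep the two data sets apart.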
Given the radically different cognitive systems of young infants and mature adults, it seems somewhat naive to assume “once a trigger, always a trigger.” And yet, this might be one consequence of adopting both representational realism and the hypothesis that second language acquisition is “just like” primary language acquisition in that it is “accessing” the same information.

3.5 The deductive value of parameters is now questionable

It is important to keep in mind that the original idea behind parameters was not merely to explain formal variation across languages; that could be explained by postulating any number of cognitive objects and mechanisms, including the phrase structure and transformational rules of earlier versions of generative grammar. Rather, the point was to explain variation without resorting to induction. Parameter-setting, it was originally claimed (Chomsky 1981b: 37–45,
1991b: 41; Hyams 1986: 5; Williams 1987: viii; Valian 1990b: 120) has deductive consequences; by learning phenomenon W, phenomena X, Y, and Z are also automatically specified and organised in the learner’s grammar. Parameter-setting thus offers itself as an alternate account of contingencies in the structure of representations, or ultimately of contingencies in the structure of events. In the case of the earliest formulations of the head direction parameter, fixing the linear order of the head with respect to the complement was said to determine the order of specifiers as well. They appear on the opposite side. The head direction parameter therefore was to have consequences for the location of quantifiers, adverbs, negation markers, and a host of other elements. It was also to determine the branching direction of the language as a whole since setting the head direction parameter for one category type had implications for the head direction of other categories. English is a head-first language, which makes it a right-branching language. Japanese is a head-final language, which makes it a left-branching language. Flynn (1983, 1987, 1989a, b) and Flynn and Espinal (1985) tried to exploit this particular difference across languages and connect it to certain deductive consequences for SLA. In particular, Flynn tried to make the case that branching direction has consequences for the interpretation of pronouns and other anaphoric elements appearing in adjunct clauses since these phenomena crucially involve the construct c-command.33 C-command is the basic component in the definition of scope.34 I can most easily illustrate by discussing specifier scope, an intuitively obvious notion. The determination of specifier scope depends upon c-command guaranteeing that the string the big men and women is ambiguous between a reading where only the men referred to are big and one where both the men and the women referred to are big.
This ambiguity follows from the fact that linear order and c-command can interact in interesting ways with conjunction. Since the conjunction of NPs can include an adjective phrase either inside or outside of the NP, the adjective phrase can modify either the NP it is next to, or both conjuncts. See (3.14). Structurally, the adjective phrase can c-command both men and women in (3.14b) but not in (3.14c). In contrast, the string the men and big women is unambiguous and has only the reading that the women referred to are big. See (3.14d). It can be seen there that the adjective can only c-command the NP women.
(3.14) a. the big men and women
       b. [DetP [Det the] [NP [AdjP big] [NP [NP men] [conj and] [NP women]]]]
       c. [DetP [Det the] [NP [NP [AdjP big] [N′ [N men]]] [conj and] [NP women]]]
       d. [DetP [Det the] [NP [NP men] [conj and] [NP [AdjP big] [N′ [N women]]]]]
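The scope facts for the structures in (3.14b–d) can be checked mechanically. The sketch below is an illustration only. It assumes the common "first branching node" formulation of c-command (A c-commands B if neither dominates the other and the first branching node dominating A also dominates B), which may differ in detail from the definition adopted in this book, and it encodes the three trees as nested tuples. All function names are hypothetical.

```python
# Trees are nested tuples: (label, child, child, ...); leaves are plain strings.
TREE_B = ("DetP", ("Det", "the"),
          ("NP", ("AdjP", "big"),
                 ("NP", ("NP", "men"), ("conj", "and"), ("NP", "women"))))
TREE_C = ("DetP", ("Det", "the"),
          ("NP", ("NP", ("AdjP", "big"), ("N'", ("N", "men"))),
                 ("conj", "and"),
                 ("NP", "women")))
TREE_D = ("DetP", ("Det", "the"),
          ("NP", ("NP", "men"),
                 ("conj", "and"),
                 ("NP", ("AdjP", "big"), ("N'", ("N", "women")))))


def subtree(tree, path):
    """Follow a path of child indices down from the root."""
    for i in path:
        tree = tree[i + 1]          # position 0 holds the node label
    return tree


def paths(tree, prefix=()):
    """Yield the path of every node in the tree, leaves included."""
    yield prefix
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:]):
            yield from paths(child, prefix + (i,))


def path_to(tree, target):
    """Path of the first node (in pre-order) equal to `target`."""
    for p in paths(tree):
        if subtree(tree, p) == target:
            return p
    raise ValueError(f"{target!r} not found")


def dominates(a, b):
    """Node at path a properly dominates node at path b."""
    return len(a) < len(b) and b[:len(a)] == a


def c_commands(tree, a, b):
    """First-branching-node c-command between the nodes at paths a and b."""
    if a == b or dominates(a, b) or dominates(b, a):
        return False
    ancestor = a[:-1]
    # Climb past non-branching ancestors (stops at the root at the latest).
    while ancestor and len(subtree(tree, ancestor)) - 1 < 2:
        ancestor = ancestor[:-1]
    return dominates(ancestor, b)


def nouns_in_scope_of_big(tree):
    """Which of the two nouns does the AdjP 'big' c-command?"""
    adj = path_to(tree, ("AdjP", "big"))
    return [n for n in ("men", "women") if c_commands(tree, adj, path_to(tree, n))]


for name, tree in (("3.14b", TREE_B), ("3.14c", TREE_C), ("3.14d", TREE_D)):
    print(name, nouns_in_scope_of_big(tree))
```

Under these assumptions the adjective phrase c-commands both nouns only in (3.14b), only men in (3.14c), and only women in (3.14d), matching the readings described in the text.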
In a left-branching language, the prediction is that the string equivalent to men big and women the will be unambiguous while the string equivalent to men and women big the will not be. To return to the empirical issues investigated by Flynn, we should note that c-command also regulates other relations such as relations between a pronoun and a possible antecedent. Thus, to count as an antecedent for a referentially
dependent pronoun, a DetP must c-command it. What Flynn tried to show was that the ability of Spanish and Japanese learners to correctly produce or comprehend complex sentences involving adjunct clauses and forwards or backwards anaphora, e.g., When he_i entered the office, the janitor_i questioned the man, or The man_i answered the boss, when he_i installed the television, depends upon the branching directionality of the L1 of the learner (because of the c-command constraint on anaphora), and ultimately on resetting the head direction parameter. The first conclusion is probably correct, the second probably not.35 In other words, it probably is true that the correct construal of anaphora depends on the branching direction of the morphosyntactic representations of sentences. It is not at all obvious that correct construal depends on the resetting of the head direction parameter, probably because no such thing exists. In the last chapter, it was noted that grammarians have retreated to the hypothesis that parameters are “lexical”, in the sense that they are to be restricted to specifications made in the lexicon involving functional categories (Borer 1984; Chomsky 1986, 1988, 1991b: 42; Ouhalla 1991; Tsimpli and Roussou 1991a). The lexicalisation of parameters has made Flynn’s specific empirical claims uninterpretable since the specification of complement position with respect to a verb, say, or the stipulation of specifier position with respect to an X′ category is unrelated to the position of adjunct clauses. White (1990/1991) investigated a different version of the theory, namely Pollock’s (1989) parameterised analysis of verb-raising, but came up with mixed results and no real conclusion as to its utility in characterising what her learners were doing.
More problematic, however, is that she made no attempt to demonstrate that changes in interlanguage grammars related to phrase orders are triggered by prior changes in the learners’ representations of the licensing factors (Case, Agreement, θ-marking, etc.). Since the entire theory of parameters has been reduced to differences in the characteristics of the functional categories, it follows that if parameter-resetting is to occur, it has to be triggered by the acquisition of the relevant functional categories and their properties. Criticism of the parameter-resetting agenda as a solution to the developmental problem of language acquisition has focused on precisely this issue. Above I mentioned the literature studying the first language acquisition of German which shows a clustering of effects related to the mastery of verb position. I noted that many studies with both monolingual and bilingual children reveal that children cognise a link between V2 position and agreement (or finiteness) and show no confusion about the differing position of verbs in matrix and embedded sentences despite the ambiguous nature of the stimuli that they are exposed to. In Chapter 2, I mentioned Meisel’s careful comparisons of the
acquisition of word order phenomena among bilingual infants and adult L2 learners (Meisel 1991, 1997, 1998, in preparation). His research of the developmental sequences followed by both groups demonstrates that they follow different developmental paths. This is awkward for anyone claiming that they are setting or resetting the same parameter, but perhaps not fatal. Different developmental paths might be explained by other intervening (if as yet unstudied) factors. The “cruncher”, however, comes from the fact that children develop correct word order patterns, as predicted by the theory, following on the acquisition of finiteness and tense distinctions, while adults alter word order patterns quite independently of morphological distinctions. Indeed, while many of the learners studied ultimately acquire verb-final position in German embedded clauses and V2 position in main clauses, almost none of the learners studied acquire the finiteness and tense morphology of German verbs. While there have been some attempts to apply the parameter-resetting analysis of word order to the same or similar data sets (Schwartz and Tomaselli 1990; Schwartz 1991; Eubank 1992), there has not been any satisfactory explanation of the disparities in the developmental sequences in L1 and L2 acquisition, or of a specific and uniform temporal ordering in the acquisition of these unrelated phenomena in L1 acquisition (both monolingual and bilingual) and the absence of a temporal relationship in adult SLA. If the reason for invoking parameter-setting in L1 acquisition is precisely to explain why finiteness or tense should be related to the acquisition of verb order, then claiming that adults are setting the same parameter on the basis of the same triggering events ought to lead to the prediction that we find the same clustering of phenomena. The absence of the same clustering effects in L1 and L2 acquisition is a serious problem for the P&P theory in SLA. 
It is appropriate to insist on the importance of this problem for a developmental theory. In Chapter 2, it was noted that parameters are interesting for acquisition precisely to the extent that phenomena cluster in languages and in development. Grimshaw (1985), Travis (1989: 264), and Meisel (1991) have all pointed out that a theory of parameters where there are almost as many parameters as there are dimensions of variation is of little interest for a theory of acquisition. While in principle UG could contain as many stipulations as we might like to invent, there are important computational costs involved even in setting small numbers of parameters (Gibson and Wexler 1994). A theory of parameter-setting with hundreds of independent parameters to set is computationally intractable and likely to be difficult to define in terms of the necessary and empirically plausible triggers.
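A rough sense of the scale involved can be had from a simple count. Assuming, purely for illustration, that each parameter is binary and can be set independently of the others (an idealisation, not a claim made by any of the authors cited), the number of candidate grammars $|G|$ grows exponentially with the number of parameters $n$:

```latex
\[
  |G| = 2^{n}, \qquad
  n = 3 \;\Rightarrow\; |G| = 8, \qquad
  n = 100 \;\Rightarrow\; |G| = 2^{100} \approx 1.27 \times 10^{30}.
\]
```

A learner that had to discriminate among candidate grammars on anything like this scale would need correspondingly fine-grained and reliable triggers, which is the difficulty noted above.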
3.6 What if there were no parameters in a theory of UG?

Crain (1991) offers a radically different perspective on the fact of primary language development. On the one hand, he rejects parameter-setting as an explanation of the developmental problem of language acquisition. He also argues against the idea that principles of the grammar mature. And, following Chomsky, he is assuming that there is no need for a theory of induction. Instead, he argues that the formal universals are present basically right from the start so that one can speak in first language acquisition of “virtual simultaneous acquisition.” Crain argues that if the formal universals are not manifest in children’s performance on various tasks, this is rather a fact about performance. As a fact of performance, it requires that one make explicit the relationship between the psychogrammar and attention, memory, parsing, and production. We do not have any such theories at the present time, so testing children’s knowledge in the face of performance constraints is difficult. Nonetheless, Crain and other L1 researchers are pushing back the moment in development where one can say with some certainty that children cognise certain linguistic principles, making children’s grammars more like adult ones (Chien and Wexler 1990; Grimshaw and Rosen 1990; McDaniel, Cairns and Hsu 1990; McDaniel and Maxfield 1992; Grodzinsky and Kave 1993/1994; Crain, Thornton, Boster, Conway, Lillo-Martin and Woodams 1996). Crain’s position with respect to UG has the virtue of being extremely parsimonious because it eliminates parameter-setting as a mechanism of selective learning, and it relieves one of the necessity for looking for independent biological evidence for the emergence of such late “maturing” phenomena as knowledge of backward anaphora. While this does not obviate the need for induction in development, in particular to account for category-learning, it does greatly restrict its domain of application.
See Goldin-Meadow (1991), Slobin (1991) and Smith Cairns (1991) for commentary and criticism of Crain’s proposals. Crain’s position is of obvious interest for SLA since the failure to observe UG constraints in child linguistic behaviour is attributed to the operation of immature faculties and mechanisms not fully controlled by young children. The mechanisms include the “horizontal faculties” which interact with language, namely attention and memory. If these are the sources of failure to observe UG constraints in L1 acquisition, such failure should not be observed among adult learners. If any group of learners should manifest something close to instantaneous acquisition, it ought to be adults, everything else being equal. Clearly adult learners do not manifest anything like instantaneous acquisition of an L2 in their production, as every major study has documented (Cancino, Rosansky, and Schumann 1974; Cazden, Cancino, Rosansky, and Schumann
1975; Klein and Dittmar 1979; Clahsen et al. 1983; Huebner 1983; Klein and Perdue 1992; Perdue 1993; see Meisel, in preparation). These studies are, however, mostly not directly concerned with the display of UG principles. As noted above, many L2 studies which do investigate UG principles give mixed results, indicating that older children and adults are doing different things from what is required by the target language, and different from what children are doing (Thomas 1989, 1991, 1993; Finer 1991; White 1990/1991). Whether adult L2 learners manifest instantaneous acquisition of UG principles via perceptual and recognition data has yet to be empirically established; the arguments advanced here in favour of knowledge of c-command, structural dependency and other abstract principles are all indirect. Nonetheless, it turns out to be the case that the question of when learners manifest knowledge of UG is troubling mostly for parameter-setting theory. I think all of the SLA literature is compatible with the claim that the autonomous representational systems which encode language and which embody UG principles are instantaneously available for adult second language learners. This does not mean that learners do not need time to induce L2 specific categories to achieve native-like knowledge of the L2, but it does mean that encodings must take place in the “right” representational systems. That just leaves the parameters part. Unfortunately, the little research available which speaks to this issue suggests that parameters are not instantaneously available.

3.7 What might it mean now to say that UG is “innate”?

Making sense of the P&P literature, and interpreting evidence for or against immediate “access” to UG involves understanding the claim that UG is “innate.” What this actually means, however, is not clear.
Smith Cairns (1991: 614) points out that ethologists define innate knowledge in the same way as Chomsky has defined UG, namely, as what is present in the species in the absence of experience. How they identify it, however, is different from the way linguists identify principles and parameters. For example, innate birdsong is identified as that song which emerges when an individual is kept isolated from other members of the species. We know from studies of human isolates (Curtiss 1977, 1982, 1988) that their speech does not manifest the properties attributed to UG, a fact which has not been taken to show that the characterisation of UG is wrong but rather that there is a sensitive period during which UG must be stimulated by relevant experience in order to emerge. (This sensitive period being explainable in terms of selective learning.) A second approach to innate animal cognition has been to identify behaviours which have no known environmental input. This also
excludes human language. Smith Cairns rejects the label “innate” and instead characterises human linguistic cognition in terms of a developmentally fixed behaviour. She then goes on to reject a blueprint metaphor for the relevant developmentally fixed behaviour we see in first language acquisition in favour of a recipe one. She writes:

    There is no one-to-one mapping between genes and bits of the body, but the genes constitute a set of instructions for carrying out a process. One cannot locate baking powder in a cake, but its action affects the whole cake. We would be well advised to adopt the recipe metaphor with respect to universal grammar as well. It is not a preformed blueprint of human language, devoid of content. Instead, it is a recipe for the construction of a grammar of a human language as such grammar develops in response to a child’s linguistic environment… On this view universal grammar is expressed as the result of the developmental process, not something that exists prior to it. (Smith Cairns 1991: 614–5)
If UG is part of a recipe for the development of a grammar, then the question to ask is: What are the procedures which are followed in “baking the cake” such that the effects of UG become apparent in the resulting psychogrammar? This places the onus for explanation on the interaction between UG and the processes building linguistic representations. Note too that this view of UG is fundamentally incompatible with the hypothesis that adult L2 learners “access” UG when constructing their interlanguage grammars. Goldin-Meadow also takes issue with the standard view of UG being innate. She points out (Goldin-Meadow 1991: 619) that a constraint observed in a child’s knowledge system can be internal to the child at some point in time without being internally regulated on the ontogenetic time scale. She adds that nothing is gained by linking UG to a genetic program; the genetic code is the wrong level of description. Rather, if UG guides the learner’s search of the environment for relevant data it is via a process she calls “canalisation.” Citing work by Gottlieb (1991), she points out (Goldin-Meadow 1991: 620) that canalisation can be environmentally caused. Exposure to a given stimulus can make an organism more sensitive subsequently to that stimulus type and less sensitive to other stimulus types. Canalisation gives a very nice account of the development of early phonetic categories. I mentioned earlier that it would appear as if all possible linguistically relevant phonetic distinctions are either available at birth or very shortly thereafter, including those which do not appear in the neonate’s input (Eimas 1974, 1975; Eimas et al. 1971; Jusczyk 1981, 1985, 1992). Moreover, despite the fact that stimuli are highly variable, infants can form equivalence classes based on the segmental units of speech, and indeed manifest a range of categorisation procedures for the perception of speech sounds (Eimas 1985).
This is a classic case of the infant cognising information for which there is no direct evidence in the input. There is still much debate about the species-specificity of speech processing capacities. Eimas is clearly leaning to something like a language-specific capacity. Jusczyk (1985) argues the contrary position that the data can be explained on the basis of general auditory mechanisms and processes. Jusczyk (1992) makes a different case, namely, that the non-linguistic auditory system must be interacting with a language-specific faculty in order to account for the rapid specialisation and reorganisation of the input into linguistic categories. In other words, the infant’s representational system is not merely projecting categories onto the input (that is, after all, what it means to say that “all possible linguistically relevant phonetic distinctions are available at birth”), but projecting in a differentiated way in response to the stimuli.36 What happens subsequently in the second half of the first year? As we saw from the research by Eimas, Jusczyk, and Werker, the infant quickly represents and becomes sensitive to just those features which are in the stimuli and becomes desensitised to those which are not. This is where canalisation becomes relevant, for it can explain how and why the infant rapidly attunes to just those distinctions which are in the environment. Goldin-Meadow also discusses a definition of innateness as meaning that an individual is “developmentally resilient” (Goldin-Meadow 1991: 620) and learns a language despite the limitations of the stimuli they hear. She writes:
… children routinely go beyond the sentences they hear, although not beyond the linguistic system to which they are exposed; that is, children do not invent the system de novo, they just induce it from data that do not appear to be sufficient to justify (let alone compel) that induction. (Goldin-Meadow 1991: 620)
She then cites evidence from studies of hearing-impaired children whose hearing parents have not exposed them to any conventional sign language. Such children not only develop gestural communication systems; those systems also have some of the structural properties of other conventional languages (Goldin-Meadow and Mylander 1984, 1990a, b). To summarise, the view of parameter-setting as biologically based, and in particular the view that it is innate, needs refinement. It clearly is not knowledge which is manifest in the absence of experience, encoded in the genes, as many generativists would have it, since environmental deprivation leads to tragic consequences: the absence of basic structural properties of grammar (Curtiss 1977). Primary language acquisition appears instead to manifest canalisation, meaning that there is a constant interaction between the child’s initial representational
capacities and the environment, resulting in a rapid specialisation for certain properties of the speech the child hears. I conclude from this that Crain may indeed be correct in that the representational system is at some point instantaneously available. As Bickerton (1990) puts it: “The brain just does it”, meaning that the brain just makes the representational capacity available and children are driven to encode essential properties of languages in their grammars in ways consistent with UG. But if the brain is “just doing it”, then this leaves little room for a selective learning account of linguistic variation. The implications of this for SLA seem fairly straightforward. We must reject a selective learning interpretation of parameter-setting for L1 acquisition, which would appear to be a good thing since parameters in SLA cannot involve selective learning. However, this leaves us uninformed as to what parameters really are. Canalisation entails that the learner is initially sensitive to all relevant distinctions and involves the increasing insensitivity of the learner to universal properties of language not manifested in the stimuli. One major difference between L1 and L2 acquisition therefore would be that only the former manifests canalisation. If parameter-setting is recast as canalisation then, once again we must conclude SLA does not involve parameter-resetting. There is no evidence that adult L2 learners have available a full range of universal distinctions when they begin learning the L2, and gradually become sensitised to just those in the L2 input. Rather, they impose the features and categories of the L1 phonological system on the stimuli. It seems reasonable to regard L2 phonetic, phonological and morphosyntactic acquisition as a case of recategorisation. As to whether adults are developmentally resilient, the jury is still out. 
Much of the literature on the so-called ultimate attainment issue (Oyama 1976, 1978; Patkowski 1980; Newport 1990; Johnson and Newport 1989, 1991) takes it for granted that adults do not regularly go beyond the input they get to acquire an L2. Indeed, much of this literature takes it as self-evident that adults routinely fail to acquire a second language despite the input.37
4. Summary
In this chapter, I have refined the idea, introduced in Chapter 2, that an explanatory theory of language acquisition must consist of both a property theory and a transition theory. I have done this by arguing that language acquisition theories must explain two distinct problems. The first is the representational problem — how humans come to be able to represent or encode their languages in the way that they do. Most approaches to SLA have simply ignored this problem.
The generativist approach, including the P&P approach to SLA, has faced it head on, arguing that the human mind/brain comes equipped innately with the appropriate representational systems. What innateness might mean for language remains, as we have seen, a controversial issue. But even if we set the controversies aside and take at face value the claim that UG solves the representational problem, this still leaves unsolved the developmental problem of why one mental grammar changes into another. We have seen that the notion of trigger is crucial to an account of parameter-setting but remains largely at a pre-theoretical level. Most acquisition researchers have been content to ignore the questions: What is a parameter? What are the triggers for parameters? How are parameters set? An answer to the first question, due to Lightfoot, is that parameter setting is a form of selective learning. This hypothesis would provide the P&P theory with more content but unfortunately, it makes the prediction that parameter-resetting is irrelevant to SLA. If parameter-setting is selective learning, then L2 learners cannot be “resetting” the same parameter. What the term “parameter-resetting” might be referring to is then anybody’s guess. We also examined the concept of canalisation. Canalisation entails that learners have available distinctions not present in the input and quickly become desensitised to all others. The work on speech perception in L1 acquisition suggests that it may plausibly be described as canalisation; some phonetic features which are available to the neonate are either lost absolutely during the first year of life or become unavailable in speech processing mode as speech processing becomes finely attuned to the cues relevant to the recognition of L1 words. 
Additional important research on SLA suggests that adults “hear with a foreign accent”, to use an apt metaphor of Jay Keyser’s (1985), lending support to the claim that canalisation defines low-level aspects of speech perception. SLA studies on the development of an L2 phonology show nonetheless that canalisation will not explain all aspects of L2 development. Speakers of Japanese, whose L1 has no complex onsets, can acquire languages like English, which does. There is no research to suggest that such learners cannot process these structures, whatever difficulties they might continue to have with the perception and production of particular phones like [r] and [l]. Similarly, there is no evidence to suggest that an L1 speaker of a language without a separate category of modal verbs cannot acquire English modal verbs. So it would seem that the mere fact of differences in categorial typologies across languages does not entail the narrowing of the ability to represent phonetic and morphosyntactic categories. It might be that some categories are extremely difficult to encode (gender among anglophones comes to mind) but we have no convincing evidence to date that adult learners cannot encode gender distinctions.
I also raised the metatheoretical problem of the status of parameters in a psychologically real theory of grammar. I noted that there are no theoretical grounds for adopting a dynamic view of parameter-setting as a mechanism of language acquisition. Where does this leave us? Still with the problem of defining the genetic origins of parameters. Still with the problem of defining triggers for specific parameters. Still with the problem of relating those triggers to stimuli. Still with the problem of stating when there has been enough exposure to a cue to trigger parameter-resetting. Still with a fractional view of bilingualism which requires some statement of how resetting a parameter in the L2 affects parameter-setting in the L1. Basically what the SLA P&P literature has offered us is a metaphor and not a transitional theory. I would suggest that the metaphor has outlived its usefulness. P&P theory does not explain how or why the initial acquisition of a specific language system begins, how it changes over time, or why acquisition stops. Currently, it explains nothing about the restructuring of the learner’s psychogrammar given particular sorts of stimuli, input or evidence. With an undefined central construct, we are hardly on good footing to develop an explanatorily adequate theory of SLA.
Notes
1. Jackendoff (1983) mentions three other modes of description: biological, functional architecture and phenomenological. I do not mean to suggest that these other modes of description should not be pursued in SLA as well. But I do believe they play second fiddle. Functional architecture descriptions will be partly dependent upon what one’s structural and process models are, and perhaps also on one’s neurological models. As for the other two modes, I will largely ignore them. Eubank and Gregg (1995) have shown the difficulties in trying to get causal explanations from biological stories, especially at the neurochemical level. As for phenomenological stories, I find them very interesting and even essential to understanding certain issues such as what a learner at a given stage can “hear” (as opposed to detect), but take a strict non-interactionist position when it comes to the question of whether properties of mind cause behaviour. I assume here that intentionality, purpose, and consciousness are by-products of brain activity; they do not cause it. Phenomenology must enter into our explanations somewhere (it would be bizarre if I learned Mohawk when I intended to learn Italian) but my intentions will not explain why, e.g., I initially perceive my L2 as a continuous stream of noise. Phenomenological accounts cannot possibly explain why the beginning learner finds it impossible to answer the question “How many words are there in that sentence you just heard?”, while the question is normally a trivial one for the same individual when the sentence heard is one of the L1. In short, we will seek explanation at the level of structure and process, and ultimately at the level of brain structure and function. 2. This situation is actually more confounded in that many authors use the metalanguage of processing to describe structural issues, as we shall see.
3. This proposal should not be taken to mean that I am unaware of all the problems inherent in the “computational” approach to mind (also known as functionalism), especially the problem of what a mental state might be. For relevant discussion, see Putnam (1989). 4. The constituent “E” is used here as in Banfield (1982). In the theory of mappings between syntactic structures and conceptual structures adopted here, referential categories map universally onto syntactic phrases. The occurrence of referential nouns in a learner’s interlanguage thus licenses the inference that he is also encoding NPs. See Jackendoff (1983) for more discussion. 5. Unlike Schwartz and Sprouse (1994, 1996), I do not assume that the initial stage of production of the interlanguage is the mature state of the L1 (including lexical and functional categories) transferred to the interlanguage grammar. If we assume that at the time our hypothetical learner passes from me British to me be British, he is not yet producing any markers of tense, then there would be no reason to assume that he is encoding utterances as Tense Phrases. See Eubank (1993/1994) on the Full Transfer/Full Access hypothesis. I note in passing that by the time the learner is capable of making complete sentences in the L2, he has normally acquired a certain amount of the specifics of its grammar. This is not to deny that L1 production schemata are brought to bear in attempting to speak; the evidence for that is overwhelming. 6. Keeping in mind the point made in Chapter 2 of the need to flesh out the minimal assumptions of discussions focusing on the truly universal aspects of language with mechanisms for describing the marked, peripheral, idiosyncratic and learned aspects of language. 7. I regard this as an unfortunate occurrence. SLA researchers have been reluctant to investigate the formal properties of other theories of generative grammar, or even the formal motivation for particular analyses within P&P. 
It often appears that particular analyses are adopted only because “it’s the latest thing”, while other promising analyses are abandoned because they are not. This gives a peculiarly ad hoc quality to P&P work in SLA. Psycholinguists can hardly afford to take such a naive attitude to theoretical linguistics. SLA would benefit from a thorough airing of the empirical and formal motivation for particular analyses, especially as they relate to language acquisition and language processing. Given the widespread assumption that external data can be used to test linguistic theories, there is certainly room in the field for research papers of a purely formal (as opposed to an empirical) type. 8. For example, eliminating formal redundancy is a major driving force of formal generative research, something which would strike most psychologists as an odd thing to impose on a theory of mind since there is ample evidence that psychological systems, including language, exhibit considerable redundancy, and that redundancy aids learning (Ritter 1995). 9. Kevin Gregg (p.c.) insists that for generativists these two terms are synonymous. I am not so sure that the reduction is legitimate. Certainly, under the usual idealisations that Chomsky has made, especially in the last 15 years, Gregg is correct. If “language” is nothing other than grammar, and “grammar” consists only of those aspects of structure which are invariant, then UG and the LAD can be treated as the same thing with no loss of explanatory value. However, formal discussions of learnability (Pinker 1979; Berwick 1985; Dresher and Kaye 1990; Clark 1992; Gibson and Wexler 1994; Bertolo 1995; Fodor 1998a, b, c) incorporate a number of constructs, e.g. the Subset Principle, the Triggering Learning Algorithm, the Single Value Constraint, the Greediness Constraint, etc., which are all part of the learning theory, and not part of UG. 
Moreover, once we move from the idealisations of learnability theory and start talking about the acquisition of psychogrammars, and, in particular, once we grant that psychogrammars contain all sorts of learned phenomena, then it seems to me that Chomsky’s early distinction between UG and the LAD is legitimate, and indeed required. I take the view that Chomsky intends UG to be part of the LAD, that part responsible for what is universal. The LAD itself will account for much else.
10. In Minimalist syntax, this view of the grammar is dramatically altered through the elimination of D-structure (Chomsky 1995: 186–91). 11. It is important to keep in mind that these features and units belong to the linguistic cognitive system. The same features and units are put to work in different ways in distinct phonetic and phonological systems, precluding any determinism of a strictly anatomical or perceptual sort. This fact emerges from a careful reading of the phonetics and perception literature but is illustrated in quite dramatic fashion by the literature on the phonology of sign languages. 12. What a universal theory of categories should look like is very much an issue on which theorists divide. There is controversy as to the typology of major categories, there is even more controversy as to the typology of functional categories. Recent work has seen an explosion of functional categories as some researchers seem determined to project onto syntactic categories every possible morphosyntactic distinction observable. The basis for devising a typology is anything but clear. See Hoekstra (1995) for relevant discussion. 13. One exception to this claim is provided by Underspecification Theory where the traditional unordered phonetic matrices of structuralist and early generative phonology (see Chomsky and Halle 1968) are replaced by ordered matrices which take the form of tree structures consisting of sub-segmental constituents. In such a case, postulating a featural distinction at, e.g. a Dorsal node cannot be done in the absence of a higher Place node. This gives the elaboration of the segmental contrasts a certain deductive nature. See Clements (1985), Sagey (1986), Hayes (1986) for early proposals and Clements and Hume (1995) for a state-of-the-art review. 14. As is well known the class of prepositions in English is mixed. Many (above, across, away, behind, below, beneath, beyond, over, under, etc.) 
have one stressed syllable even when not in focus positions. The status of of, to, for, up, on, and the other one-syllable prepositions is complex. In some contexts, they are clearly functional categories, in others they appear to have referential content. 15. Note that this adaptation is hardly trivial since we have no direct access to the mental representations that hearers construct of the speech stimuli they hear and therefore have no direct access to the frequency of cues in the environment. We will only be able to make guesstimates as to the frequency of cues-as-mental-constructs based on counts of the presence of stimuli in texts. 16. One nice consequence of this proposal is that it helps to give some more precision to the debate about the temporal nature of learning (contrast Klein 1991a, b, with Hyams 1991). While the correspondence mappings might be assumed to be instantaneous or at least very fast, learning which cues the language makes available will take time since this acquisition involves induction over the presentation of many instances of stimuli. 17. These remarks reveal nothing about the nature of sound perception or sound production. As Flege, Frieda, Walley and Randazza (1998) make clear, perceptual processing could involve recognising segments in any one of several domains (syllable, word, etc.) rather than learning to recognize some abstract construct like the phoneme. The same goes for speech production. Targeted segments might be phonetically and motorically organised around production domains larger than the phone. 18. A radically different picture is presented in Chomsky (1995). 19. A monadic or single-argument predicate like SLEEP will be expressed by an intransitive verb sleep. A dyadic or two-argument predicate like KICK will be expressed by a transitive verb like kick. 20. I do not adopt Flynn’s description of the head direction parameter, which proved to be an
improper interpretation of P&P theory. Flynn (1987) was basically a reworking of her doctoral dissertation which was attempting to explore experimentally ideas not fully worked out at the time in the theory of grammar (Flynn 1983). Unfortunately, by 1987 the idea that there is a single parameter at work in determining the order of adjuncts, specifiers and complements had been superseded by other proposals. To be precise, Koopman (1984) and Travis (1984) derived asymmetrical aspects of constituent order (needed for German verb order in main clauses where it occurs in “second” position and embedded clauses where it occurs finally) from two parameters, one regulating the position of the head to its complement, and a separate parameter involving the direction of case-assignment. Hoekstra (1984) also treated the same type of phenomena in terms of two parameters, one involving direction of government. Travis (1989), noting that there were still too many ordering possibilities permitted by previous analyses, proposed three separate parameters to explain constituent order: a head direction parameter, a directionality parameter for semantic role assignment, a directionality parameter for Case assignment, with the restriction that these be ordered with respect to one another. Once a subdomain direction is specified, say direction of Case, no other setting is possible. This analysis makes the parameters dependent on one another. Kayne (1994) decided to dispense with the whole thing and argued that all languages order heads, complements and specifiers in the order specifier, head, complement. In his analysis, variation in surface orders across languages derives solely from differences as to which elements move to the left, to higher positions. Under Kayne’s view there simply is no parameter-setting. 
Before leaving the problem of the theoretical foundations of Flynn’s SLA work, I should point out that there are also methodological problems and problems of data interpretation (see Bley-Vroman and Chaudron 1990). 21. Even the apparent exceptions support this statement. Müller (1993, 1998) reports on a bilingual child Ivar who misanalyses the complementiser für as a preposition and subsequently takes an idiosyncratic and slow route to the acquisition of sentential complements. This has dramatic consequences for his acquisition of the proper position of finite verbs in embedded sentences. 22. Müller (1994) provides arguments that parameters can not be reset and draws on evidence from the acquisition of word order and complements. 23. Not the least of which are his statements that current work follows directly on, and preserves the essential ideas of his earliest publications, and this despite considerable differences in the mechanisms and formal expression of the theory. 24. White (1989b) can be read this way; White (1998) is an explicit rejection of representational realism. So also is Schwartz (1998). 25. The papers in Second Language Research 12(1), 1996, edited by Eubank and Schwartz, which all purportedly deal with the L2 initial state and at the same time refer constantly to access to UG illustrate this point. None of these papers is compatible with a view of UG where it is the knowledge system of the pre-linguistic individual. 26. See Chomsky (1969) for an early exposition of the same idea. 27. According to the maturational hypothesis, principles of the language faculty may be absent not only from the child’s behaviour but also from the earliest stages of the child’s grammar. The relevant representational capacity will not manifest itself until the language faculty matures and the representational capacity can be expressed. 
Unfortunately none of the authors have made any effort yet to link their maturing principles to some independently documented neurological or biological transitions. This deprives the “UG Matures” hypothesis of much of its interest. 28. This is one topic where research into SLA could shed light on the nature of primary language acquisition, a relationship we hardly see exemplified in contemporary research on language acquisition.
29. Flynn (1987: 188) makes the same observation but fails to draw the obvious conclusions. 30. A similar problem arises in certain models of connectionist learning since learning is realised as specific connections among nodes (MacWhinney 1995). 31. There is a small but important literature in learnability research, e.g. Clark (1989, 1992), Dresher and Kaye (1990), Wexler (1990), Gibson and Wexler (1994), Fodor (1998c) and Dresher (1999) which addresses the question of how triggers connect to parameters. It isn’t obvious, however, how to connect the results of these studies to the data of first language acquisition and even less clear how to connect them to SLA. 32. Trahey and White (1993: 185–6), in one of the rare discussions of the triggers relevant for resetting a parameter, state that the virtual absence of inflection on English verbs and various word order phenomena are sufficient to identify Agr as weak. From this discussion, we may conclude the triggers are to be understood as properties of intermediate analyses of stimuli. In this particular case it would appear that learners would have to be able to encode the absence of inflectional morphemes where they might otherwise be hypothesised to occur. This appears to be an appeal to indirect negative evidence (Chomsky 1981a: 9). 33. The basic idea is probably sound; branching direction probably does have consequences for the interpretation of anaphora, but since the phenomenon is so complex, and the interpretive variables still not well understood, it stands to reason that the investigation of other simpler phenomena would have served Flynn’s purposes better. 34. C-command is a relation between two nodes in a structure such that node A c-commands node B when the first branching node dominating A also dominates B (Reinhart 1976, 1983: 18). See (i).
(i) [W A [X Y B]]
The first branching node dominating A in (i) is W. W dominates X, Y, and B. Hence A c-commands B. It also c-commands X and Y. The first branching node dominating B is X. X does not dominate A so B does not c-command A. The node X, however, does. C-command is a crucial construct in the P&P theory of grammar. See Reinhart (1983) for discussion and exemplification.
35. It seems to me that all of Flynn’s subjects would have had to have “reset” their head direction parameter just to be able to parse the stimuli. Differences in the comprehension of sentences containing anaphora among the various groups could not, then, be attributed to the hypothesis that some (the more advanced learners) have reset the parameter while others have not. Flynn also speaks of learners “working out” the deductive consequences of parameter-resetting which suggests that she has not quite grasped the function of parameter-setting within her own theory. By definition, parameter-setting does not involve inferencing or even large amounts of computation. Inferencing takes place outside of the language faculty. To suggest that learners can do problem-solving related to parameter-setting seems to suggest that learners somehow can explicitly analyse the contents of UG and/or the contents of their interlanguage grammars. There is no evidence for such a claim and it runs counter to the very logic of UG-based accounts of acquisition. To assume that inferencing plays a central role in parameter-resetting is to empty the theory of explanatory interest as an alternative to induction.
36. Note that it is not claimed that the perceptual system which makes the universal phonetic features available is part of UG. On the contrary, it is assumed to be part of a universal but non-language specific system. However, the cognitive system for language makes particular use of the phonetic features, organising them into linguistic categories to drive the development of the L1 phonology. 37. This assumption is, however, unwarranted because there has been no investigation of the potential for native-like ultimate attainment in the face of controlled native-like input. See Chapter 4 for further discussion. Other views of the ultimate attainment issue can be found in Long (1990), Birdsong (1989, 1991, 1992), Sorace (1993), Ioup, Boustagui, El Tigi, and Moselle (1994), and White and Genesee (1996).
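The definition of c-command given in note 34 lends itself to a small computational illustration. The following is only a sketch in Python, under stated assumptions: the dictionary encoding of the tree and the function names (parent_of, dominates, first_branching_node, c_commands) are hypothetical choices made for this illustration and come from no cited work. It implements the definition exactly as worded in note 34; any further conditions found in other formulations of c-command are deliberately left out.

```python
# Hypothetical encoding of the tree in note 34, example (i):
# each non-terminal node maps to the list of nodes it immediately dominates.
TREE = {"W": ["A", "X"], "X": ["Y", "B"]}


def parent_of(node, tree):
    """Return the node immediately dominating `node`, or None for the root."""
    for parent, children in tree.items():
        if node in children:
            return parent
    return None


def dominates(ancestor, node, tree):
    """True if `ancestor` properly dominates `node` (is above it in the tree)."""
    current = parent_of(node, tree)
    while current is not None:
        if current == ancestor:
            return True
        current = parent_of(current, tree)
    return False


def first_branching_node(node, tree):
    """Return the lowest node above `node` that has two or more daughters."""
    current = parent_of(node, tree)
    while current is not None and len(tree.get(current, [])) < 2:
        current = parent_of(current, tree)
    return current


def c_commands(a, b, tree):
    """Note 34's wording: the first branching node dominating a also dominates b."""
    branching = first_branching_node(a, tree)
    return branching is not None and dominates(branching, b, tree)


if __name__ == "__main__":
    # Print the relation for the pairs discussed in note 34.
    for a, b in [("A", "B"), ("A", "X"), ("A", "Y"), ("B", "A"), ("X", "A")]:
        print(a, "c-commands", b, ":", c_commands(a, b, TREE))
```

Run on the tree in (i), the sketch reproduces the judgements stated in note 34: A c-commands B, X and Y; B does not c-command A; X does.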
Chapter 4 The autonomous induction model
1. Introduction
In Chapter 3, I began to define the terms acquisition and learning, focusing there on maturation, selective learning, and canalisation. In other words, I concentrated on how knowledge structures might arise on a strictly biological basis. In particular, I made the obvious claim that maturation of grammatical principles or parameters, parameter-setting as selective learning, or the tuning of the learning mechanism to specific forms of stimuli via canalisation as a result of the loss of, e.g. neuronal connections, should be irrelevant for adult SLA since the neurological system in which learning occurs is by adulthood stable and no longer “growing.” Were it to turn out to be the case that parameter-setting had to be understood in these terms, we could dismiss it outright as a mechanism of adult L2 learning. In this chapter, I shall focus on acquisition and learning in the conventional sense, namely as instance-based learning, observational generalisation, or instance-based retrenchment from a too general generalisation, all forms of induction. I will introduce a formal theory of induction based on proposals by Holland, Holyoak, Nisbett, and Thagard (1986) in their book Induction: Processes of inference, learning and discovery.1 In the interests of readability I am referring to this theory as the Induction Theory and limiting page references to those places where I am quoting or paraphrasing closely. At the same time, I will adapt this theory to my own purposes, making it conform to the hypothesis that linguistic cognition is realised in autonomous representational systems. My own proposals, as should be clear by now, will be referred to as the Autonomous Induction Theory.2 One objective of the chapter is to show how induction can be constrained in various ways. 
It is not too provocative to characterise the metatheoretical debate between proponents of UG and inductivist approaches as saying: “Either it’s UG at work, or there are no constraints.” This is basically the argument made in Archibald (1991, 1993) who concludes that parameter-(re)setting in the
learning of L2 stress systems must be at work because “crazy rules” do not show up in the interlanguages of his subjects. Flynn (1988a) makes similar claims, as do Finer (1991) and Thomas (1991) who speak of the absence in SLA of “rogue grammars.” See Epstein et al. (1996) for examples of the type of bizarre generalisations that induction is supposed to lead to. Rogue grammars are grammars which fail to manifest essential properties of natural language grammars. They are, in addition, grammars which manifest properties which never occur in natural language grammars. The basic assumption of critiques of induction appears to be that rogue grammars are a necessary by-product of induction. Arguments on behalf of the P&P theory are thus grounded in a rejection of induction: Induction ought to lead to rogue grammars, there are no rogue grammars, so UG must be “accessed” during the SLA process. The critiques do not carry through, however, in that being unconstrained is not an essential property defining induction. We shall see below that such claims arise from a failure to take into consideration advances in cognitive psychology which deal specifically with induction. Secondly, we shall have to distinguish carefully between induction, as a kind of learning, occurring in a wide variety of cognitive domains, and inductive reasoning, which takes place in the conceptual system and is habitually associated with problem-solving, hypothesis-formation and hypothesis-testing. Important constraints are imposed on induction by locating it within autonomous domains and by characterising inductive learning in terms of the operations and primitives of specific representational formats. In Section 2, I will present the basic assumptions related to Jackendoff’s Hypothesis of Representational Autonomy. This will provide the necessary background to the presentation of the theory of autonomous induction in Section 3. 
Further discussion of the constraints problem will be deferred to Chapter 5.
2. The language faculty in outline
In Chapter 1 (Section 2.2), I briefly introduced the notion of modularity and argued for an interactive model of speech processing in which information can flow both bottom-up from the signal to the conceptual system and top-down from the conceptual system to the morphosyntax and phonology. In this section, I wish to argue, however, that there are severe limitations on how information encoded in conceptual structures can interact with grammatical representations relevant for parsing or speech production. I will present the details of a modified modular approach, termed Representational Modularity, by Jackendoff (1987: Ch. 12, 1992: Ch. 1, 1995: 5). In contrast to the views of Fodor and Schwartz, to be
examined in Chapter 7, in which the language faculty constitutes a module, the theory of Representational Modularity proposes instead that a particular type of representation can be modular. The language faculty has several autonomous representational systems and information can flow in only a limited way from the conceptual system into the grammar via correspondence rules which connect the autonomous representational systems.
2.1 Representational Modularity: The hypothesis of levels
In Jackendoff’s theory of mind, there are various cognitive faculties, each associated with a chain of levels of representation from those which interface most directly with stimuli, called the “lowest” (using the spatial metaphor I adverted to implicitly in Chapter 1), to the “highest”, defined as those which interface with conceptual structures. The chains intersect at various points allowing for the possibility of information encoded in one chain to influence the information encoded in another. These proposals are reproduced in (4.1) as the hypothesis of levels.
(4.1) Hypothesis of levels
a. Each faculty of mind has its own characteristic chain of levels of representation from lowest to highest.
b. These chains intersect at various points.
c. The levels of structure at the intersections of chains are responsible for the interactions among faculties.
d. The central levels at which “thought” takes place, largely independent of sense modality, are at the intersection of many distinct chains.
(Jackendoff 1987: 277)
The last point is critical for the theory of learning presented here. Rather than viewing conceptual structures as the endpoint of processing, the Jackendovian model places it in the centre of much of the action. It is the place where information of various sorts converges, possibly to be re-organised before connecting back to the various faculties where the information can be acted on. The faculties are described as having chains of levels of representation. In Chapter 1, I made use of such a chain in discussing speech processing in the language faculty, noting that speech perception and comprehension appear to require a chain of processing consisting of at least low-level auditory analysis (both peripheral and central), acoustic-phonetic analysis, phonological analysis, lexical activation, lexical selection and lexical integration, morphosyntactic analysis, semantic analysis/pragmatic inferencing and integration. This picture of
things gives us about six levels of representation. There could well be more. Each level of representation has a unique set of formation rules (partly learned and partly the result of UG providing certain primitives) which derives the categories of that level and their principles of combination. These levels of representation are incommensurate. They are therefore all linked by correspondence rules which translate information from one autonomous representational system to another. I show Jackendoff’s picture of the faculties in Figure 4.1.
Figure 4.1. The organisation of all levels of mental representation (Jackendoff 1987: 248). Levels shown: retinal input, primal sketch, 2 1/2D sketch, 3D model (spatial transformations, inferences); auditory input, phonological structure (segmental phonology, metrical grid, prosodic tree, intonation), syntactic structure, conceptual structure; rhythmic structure; musical surface, grouping, metrical grid, time-span reduction, prolongational reduction; body representation; motor output; haptic, vestibular, etc. input.
In this model, the language faculty includes the auditory input, motor output to the vocal tract, all of the components of the phonetic, phonological, syntactic and conceptual structure, and their correspondence rules (Jackendoff 1987: 247). Only some aspects of the language faculty, however, are modular, namely the autonomous representational systems and the associated processors. For empirical evidence and argumentation that precisely this kind of architecture of the mind is required, see Jackendoff (1983, 1987, 1997) which summarise a broad range of findings from cognitive psychology. With respect to the autonomous representational systems of the language faculty, Jackendoff (1987: 258) has proposed that there must be a translation processor for each set of correspondence rules linking one autonomous representational type to another. Indeed, he suggests that there is a separate processor for each direction of correspondence during, e.g. speech processing. That means, for example, that there will be two translation processors for the processing of input linking the phonological and the morphosyntactic levels of representations, one for the translation “upwards” of phonological units into morphosyntactic ones, and a separate processor for the translation “downwards” of morphosyntactic units into phonological ones. Each level of representation is regulated as well by an integrative processor which integrates into a structure all of the information that it gets from the translation processor. Note that it is the integrative processors which processing theory habitually refers to, building sentences, prosodic constituents or conceptual structures. Each of these processors is a module. All function automatically, unconsciously, and outside of our intentional control. (4.2) The Representational Modularity Hypothesis Each translation processor and each integrative processor is a module.
The claim in (4.2) is a claim about “information encapsulation” (Fodor 1983: 37). It is not a claim that each processor can be localised in the brain. Note that these modules are all domain specific. They deliver information only in highly specialised formats, and they have access only to information in their specific input format. No processors match up more than two levels (Jackendoff 1987: 263). This has obvious consequences for my proposed model of feedback and correction because it entails that information in the conceptual system directly relevant to linguistic form can only be linked to phonological representations via s-structures, or via the lexicon. The proposals are spelt out in (4.3–5).
(4.3) Bottom-up processors
a. Transduction of sound wave into acoustic information (via peripheral and central auditory analysis).
b. Mapping of available acoustic information into phonological format.
c. Mapping of available phonological structure into syntactic format.
d. Mapping of available syntactic structure into conceptual format.
(4.4) Top-down processors
a. Mapping of available syntactic structure into phonological format.
b. Mapping of available conceptual structure into syntactic format.
(4.5) Integrative processors
a. Integration of newly available phonological information into unified phonological structure.
b. Integration of newly available syntactic information into unified syntactic structure.
c. Integration of newly available conceptual information into unified conceptual structure.
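The inventory in (4.3–5) can be pictured as a small data structure. What follows is only an illustrative sketch in Python, not part of Jackendoff's proposal: the names LEVELS, BOTTOM_UP, TOP_DOWN, INTEGRATIVE, links_adjacent_levels and route are hypothetical, and the route through the lexicon is deliberately left out. The sketch encodes just two facts stated above: each translation processor links exactly two neighbouring levels, and there is a separate processor for each direction.

```python
from collections import deque

# Hypothetical encoding of (4.3)-(4.5); level names follow the text.
LEVELS = ["acoustic", "phonological", "syntactic", "conceptual"]

# Translation processors as (input level, output level) pairs.
# (4.3a), transduction of the sound wave, feeds the acoustic level
# and is not modelled here.
BOTTOM_UP = [
    ("acoustic", "phonological"),   # (4.3b)
    ("phonological", "syntactic"),  # (4.3c)
    ("syntactic", "conceptual"),    # (4.3d)
]
TOP_DOWN = [
    ("syntactic", "phonological"),  # (4.4a)
    ("conceptual", "syntactic"),    # (4.4b)
]

# Integrative processors (4.5a-c): each builds structure at one level.
INTEGRATIVE = ["phonological", "syntactic", "conceptual"]


def links_adjacent_levels(processor):
    """True if a translation processor connects two neighbouring levels."""
    source, target = processor
    return abs(LEVELS.index(source) - LEVELS.index(target)) == 1


def route(source, target, processors):
    """Shortest chain of levels from source to target using only the
    given translation processors; None if no chain exists."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for src, dst in processors:
            if src == path[-1] and dst not in seen:
                seen.add(dst)
                queue.append(path + [dst])
    return None
```

On this encoding, route("conceptual", "phonological", TOP_DOWN) passes through the syntactic level, which mirrors the claim that conceptual information reaches phonological representations only via s-structures (or, outside this sketch, via the lexicon). No top-down route reaches the acoustic level at all.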
The bottom-up and top-down processors convert one type of representation into another. The integrative processors build representations. The processors described in (4.3) and (4.4) are thus the correspondence processors. The integrative processors are those described in (4.5). For Jackendoff, it is also true that belief fixation and thought is modular in that it appears to be domain-specific and involves the creation and integration of specific types of information in longterm memory. Thus, even though there is a processor for conceptual structures, it is not functioning like a central processor. I will provide some support for this viewpoint below, and more in Chapter 7. In the case of the grammar, the overall organisation is as shown in Figure 4.2.
Figure 4.2. The organisation of the grammar (Jackendoff 1997: 100). Components shown: phonological, syntactic and conceptual formation rules; phonological structures (PS), syntactic structures (SS) and conceptual structures (CS); PS–SS and SS–CS correspondence rules; the lexicon.
This model shows the exact relationship between the conceptual structures level of processing, where meanings are computed, and the rest of the grammar. We see that the points of contact are quite limited. Conceptual structures can communicate via correspondence rules with s-structures and the lexicon, which itself is a set of correspondences. S-structures can in turn communicate with phonological structures. The lower levels of phonetic processing appear, in contrast, to be
input systems in Fodor’s (1983) sense, meaning that they are isolated from the conceptual system and cannot be put into correspondence with concepts. The model of processing is, as I have emphasised, partially interactive, meaning that information can flow in both directions. This is illustrated in Figures 4.3 and 4.4.
In speech understanding, each part of each level of representation from phonology up is derived by virtue of correspondences with neighbouring levels (Interactive parallel). The numbers “1” and “2” refer to two distinct units being processed.
Figure 4.3. Processing Hypothesis for perception (Jackendoff 1987: 101). Levels shown: acoustic, phonological, syntactic, conceptual.
In speech production, each part of each level of representation from syntax down is derived by virtue of correspondences with neighbouring levels. Once again, the numbers “1” and “2” refer to two distinct constituents.
Figure 4.4. Processing Hypothesis for production (Jackendoff 1987: 109). Levels shown: conceptual, syntactic, phonological, motor.
The interactive nature of processing forces on the theory the differentiation of correspondence processors and the integrative processors, which automatically attempt to incorporate in working memory all incoming information from the various other translation processors, one from each level, into a single “most coherent” whole (Jackendoff 1987: 258). Jackendoff hypothesises that the representations from all levels except the most peripheral must also be available to be stored in longterm memory since information from all levels gets encoded
in lexical entries. The model also incorporates a selection function that restricts the number of structures under analysis and that at each moment during processing marks one candidate as the best (Jackendoff 1987: 259). The selection function, which is domain-specific, replaces a domain-general executive function. Now that we have a clear picture of the functional architecture of the mind, including autonomous representational systems, we can consider the implications for induction. Since the conceptual system interfaces with a variety of faculties as Figure 4.1 shows, inductive learning guided by the contents of conceptual structures can occur across faculties. Since the conceptual system interfaces with language, this means that inductive learning guided by meaning can occur in the language faculty too. However, since conceptual structures can only communicate with the levels of the grammar via the correspondence rules, induction will only occur at the points of contact, which are s-structures, prosodic structures, and the lexicon.
2.2 The intermediate level theory of awareness
The picture outlined above is, however, relevant only for implicit or unconscious learning. Jackendoff’s theory of mind forces an even more restrictive position on us with respect to the explicit learning habitually associated with consciousness or awareness. So far, I have been discussing the problem of induction and feedback and correction largely in abstraction from considerations of awareness. I will hypothesise that inductive learning can involve both implicit and explicit learning since this is what the empirical evidence suggests. What of feedback and correction? To my knowledge, the relationship between feedback, correction, metalinguistic instruction and awareness has not been sorted out. See Hulstijn (1997) for relevant discussion. I am prepared tentatively to assume on the basis of the data presented in Chapters 8 and 9 that the deployment of negative evidence may require awareness. This has important implications. Jackendoff (1987) argues that awareness is modality specific and leads to the projection of single interpretations onto our phenomenological mind. The former property is explained by the modularity of the mind. The latter property, he hypothesises (Jackendoff 1987: 285) is accounted for by assuming that the selection function operates in working memory in such a way as to project only a single set of representations. That what we are aware of seems to us to be especially vivid, he explains in terms of the increased and more detailed processing of stimuli which results from attention. On the basis of a discussion of linguistic images, i.e. the inner voice which talks to us without our actually using the motor-articulatory system, he suggests that the only level of representation which presents itself to awareness is the phonological level (Jackendoff 1987: 288).
I have focused first on linguistic imagery rather than speech perception or production because the linguistic structure in this case seems so purely gratuitous. Why should thought be accompanied by syntactic and phonological structure? The answer comes from the nature of short-term memory: as soon as a conceptual structure is deposited into short-term memory, the fast processes of the language modules automatically translate it, insofar as possible, into linguistic form. It may seem even more strange, perhaps, that awareness comes clothed in phonological form. Yet this seems to be the case — we hear this inner voice speaking to us in words. (Jackendoff 1987: 288–9. Emphasis in the original.)
The particular phonological form which is projected into awareness is precisely that representation selected by the selection function as the best one.3 Now consider what this means for a theory of feedback and correction. When a native speaker (NS) becomes aware that a learner has made a mistake, that mistake will
necessarily be encoded in phonological structures. The NS will be aware that the learner has done something wrong. Errors will be characterisable (by us, the observers) as the selection of the wrong phonological form to express a given lexical meaning, or erroneous sequencing of prosodic words, or the mispronunciation of a prosodic word.4 In other words, the what which a NS can notice and be aware of will be whatever is encodable in a prosodic representation. It follows that the NS will not notice or be aware of anything consisting of the morphosyntactic contents of the learner’s representation of the erroneous sentence (the degree of embeddedness, the categorial features, the improper binding of an antecedent-trace). Introspection certainly suggests that this view is correct (although, once again, the question must be answered by serious empirical research). It also follows that when the NS corrects, and the learner attends to the correction, she will only be aware of the same sorts of information, namely information represented in phonological representations. One crucial conclusion follows: to the extent that feedback, correction, and metalinguistic instruction depend on awareness, they cannot have any effect on the acquisition of grammatical information available exclusively to the morphosyntactic representations, or to information in conceptual structures. I will call this the awareness constraint on negative evidence.
(4.6) The awareness constraint on negative evidence
When we process learners’ errors in speech, or when learners process feedback and correction from native speakers, the only levels of representation that either we or they can be aware of are phonological representations. Consequently, the only information that either we or they can process is the information encoded in phonological representations.
We can make a number of predictions regarding the effectiveness of these kinds of input on SLA. These are shown in (4.7).
(4.7) Predictions regarding the effectiveness of feedback, correction and metalinguistic information on conscious learning
a. Phonetic representations should be completely impermeable to feedback and correction.
b. The phonological information stored in lexical entries, i.e. the linear order of segments and syllables, but not the content of segments, should be permeable to feedback and correction.
c. Feedback, correction, and metalinguistic information should be able to influence how we hear integrated phonological representations.
d. They should not be able to influence the linear and hierarchical patterns of s-structures except insofar as these are directly encoded in phonological representations.
e. They should be able to influence the morphosyntactic information stored in lexical entries, e.g. s-selection and c-selection properties of individual words to the extent that these are associated with phonological forms.
f. They should not be able to influence semantic properties of linguistic units except insofar as these are instantiated in phonological forms.
We have seen that the lower levels of parsing seem to be unavailable to reflection or awareness; this leads directly to the prediction in (4.7a), namely that no amount of being told “It sounds like […]” should be able to influence our perception and learning of the acoustic features of acoustic-phonetic stimuli (more specifically, the details of intonation, stress, or segments) independently of what our perceptual systems will process bottom-up anyway. Again, the prediction must be tested in a controlled fashion, but it does appear to be consistent with the experiences of both classroom learners of foreign languages and their language teachers. Predictions (4.7b) and (4.7c) are that learners will be able to notice or learn from feedback or correction the order of prosodic constituents, that is to say, e.g. prosodic words, or larger prosodic units. It is important to keep in mind in this regard, that while prosodic words often correspond to morphosyntactic words, there are many cases (classic problems of juncture come to mind of the “I scream, you scream, we all scream for ice cream” sort), where the two levels of representation are quite distinct. We predict therefore, that feedback and correction will not be usable for learning the order of syntactic units in precisely such cases. Thus learners will not be able to learn from feedback and correction such things as the leftward or rightward direction of syntactic attachment, although the leftward or rightward direction of cliticisation might be learnable through phenomena like ellipsis, deletion and so on which involve the omission of prosodic words (and their cliticised “guests”), cf. *Bill’s breeding dahlias but Jane’s alliums vs. Bill’s breeding dahlias but Jane alliums. Prediction (4.7d) states explicitly that the contents of morphosyntactic representations will not be learnable from feedback or correction. 
This will be true especially of morphosyntactic distinctions which are encoded only through the integration of lexical information in s-structures. However, since argument structures, and subcategorisation properties can all be associated with the phonological forms of the arguments and their subcategorised expressions, which can in turn be encoded in lexical entries, it is possible that these could be learned via feedback and correction. It follows that properties like membership in a given
word class, semantic roles or semantic role and case assignment functions might be learnable on the basis of feedback and correction to the extent that these can be directly mapped onto the phonological representations of a word. Thus, for example, learners might be able to learn from feedback and correction that accusative case in a given Latin declension ends in -um or that sink but not think belongs to the same conjugation class as drink (at least, in standard varieties of English).5 Where case or word class cannot be instantiated in specific phonological distinctions, then it should not be learnable via feedback and correction. I will return to the accuracy of these predictions in Chapter 8, although it needs to be stressed that in the absence of an explicit theory of feedback and correction, little research has been done which directly tests these predictions. Consequently, it will be impossible to explore (4.7) in any significant detail in this book. Let us now turn to the details of the theory of induction.
3. Induction and i-learning
In this section, I will introduce the Induction Theory of Holland et al., which has served as the basis for the development of the Autonomous Induction Theory. In particular, I will be attempting to place an interpretation on the Induction Theory in which certain of its components are operating autonomously within the theory of modularity just described. This is a serious departure from the Holland et al. theory, and it should be acknowledged up front. In my view it is a necessary departure in order to truly constrain the Induction Theory, which in its present form will not explain linguistic cognition, be it L1-based or L2-based. To be able to distinguish between induction as defined by the Induction Theory and induction as defined by the Autonomous Induction Theory, I shall refer to the latter construct as i-learning.
3.1 Basic properties of induction
Holland et al. (1986: 1) define induction as
… all inferential processes that expand knowledge in the face of uncertainty
which distinguishes induction from canalisation since the latter does not involve inferencing and eliminates representational capacity rather than expanding it. It clearly distinguishes induction from parameter-setting too in that the latter does not involve the creation of knowledge in the face of uncertainty. The reader should keep in mind that parameters are associated with specific triggers and that
the availability of a trigger in the input ought to automatically result in the setting of the parameter in a precise way. In the Induction Theory the inferential procedures have three essential properties. They operate
(4.8)
a. within a larger framework of problem-solving cognitive activities;
b. through the activities of one or possibly more than one nonspecialised, cognitive subsystems,
c. based on feedback regarding the success or failure of predictions about the environment generated by the learner (Holland et al. 1989: 9).
I shall argue that this notion of induction is too limited in that it locates all induction, and therefore all categorisation, in the conceptual representational system. It seems to confound inductive learning with inductive reasoning. Since I want to be able to say that instance-based learning, categorisation, overgeneralisation and the retreat from overgeneralisations all occur in other, autonomous systems, these kinds of learning cannot be tied to representations in the conceptual system. Rather, it would be best to see induction as
all cognitive processes which expand knowledge in the face of uncertainty
regardless of the representational system in which the expansion of knowledge occurs. It is nevertheless worth noting here that this revised definition is consistent with the view espoused earlier that acquisition involves the coming-to-encode-information-in-a-representational-system. Cognitive processes which involve the novel encoding of information in a representation are precisely inductive processes.
3.2 Induction as a search through a search space?
Holland et al. (1989: 10), borrowing a metaphor from now classic work on problem-solving by Newell and Simon (1972), describe induction as problem-solving. Problem-solving in turn is defined as a search through a mentally represented space. The “problem” for the learner is to fix the current representation of a stimulus so that it is consistent both with analysed stimuli outputted by the perceptual systems and input to processors derived from stored information which is believed with considerable conviction to be true. In formal terms, the problem is equated with four things: the initial state of the organism, one or more definable “goal states”, which are equated with the end state of the organism, a set of operators which are defined in such a way that they can
convert one state into another, and finally a set of constraints that determine the properties of an acceptable solution. Problem-solving is modelled in this way as the transition from the initial state to an end state. See Figure 4.5.
Figure 4.5. Problem-solving as the transition from an initial “problem state” to a “goal state”. Labels shown: Problem state; Goal 1 to Goal 5.
As Figure 4.5 makes clear, there might be many possible solutions to a given problem. Clearly, a theory of learning must make clear how a learner “selects” among them. We shall see below that the classic problem-solving approach to induction breaks down because it cannot provide a unified solution to this selection problem for all cognitive domains. Figure 4.5 can be characterised as a static approach to induction, consistent with the aims of property theories. We can re-cast the model in Figure 4.5 in terms of a temporal metaphor to express the temporal dimension of learning. See Figure 4.6.
Figure 4.6. Problem-solving as the transition from an “initial state” to an “intermediate state” and to an “end state”. Labels shown: Initial or Current State; Intermediate States 1 to 5; End States 1 to 5; time points marked Time i, Time i + 1 and Time n.
This is an extremely simple view of the possible relations between the initial, intermediate and end states, but it will do for expository purposes. The Newell and Simon model postulates that the current state is systematically compared to the end state in the presence of feedback. On the basis of an interpretation of the feedback, differences between the current state and the end state are identified. Feedback thus plays an essential role in enriching the information the learner gets from the perceptual systems and in moving the learner from one mental state to the next. Problem-solving would not be possible without it for it initiates transitions among the internal states of the device. Given information from feedback about how well a given intermediate state “fits” as a solution, one or several operators are selected which will reduce the differences between it and an end state. The application of the operators converts the initial or current state directly into some end state or into an intermediate state. If the application produces a state identical with the end state intended (if they match), the induction has been successful, and computation ends. If the application produces an intermediate state, by definition one which does not match the intended end state, then further sub-goals are defined which will help to reduce the differences, and the operators are then re-applied until the goal is attained or some failure criterion is reached. It should be obvious that the Newell and Simon model is general enough to lend itself to an analysis of everything from how to move a 1.4 metre wide sofa through a 1.3 metre wide space to figuring out how to drive from Berlin Mitte to the autobahn without passing by a roadworks or a major construction site. Will it work for language acquisition? Definitely, not, for precisely the reasons raised in discussions of “rogue grammars”; it simply is too unconstrained. 
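The compare-and-reduce cycle just described can be made concrete with a toy example. The Python sketch below is a minimal illustration under strong simplifying assumptions and is not Newell and Simon's own program: states are plain integers, the function name means_ends, the parameter max_steps (standing in for the failure criterion) and the arithmetic operators are all hypothetical, and the "difference" between the current state and the end state is simply their numerical distance.

```python
def means_ends(initial, goal, operators, difference, max_steps=50):
    """Greedy difference reduction.

    Returns the list of states visited if the goal state is reached,
    or None if the failure criterion is met first.
    """
    state = initial
    trace = [state]
    for _ in range(max_steps):
        # Compare the current state with the end state.
        if difference(state, goal) == 0:
            return trace  # the states match: computation ends
        # Apply every operator and keep the result closest to the goal.
        candidates = [operator(state) for operator in operators]
        best = min(candidates, key=lambda s: difference(s, goal))
        if difference(best, goal) >= difference(state, goal):
            return None  # no operator reduces the difference
        state = best  # an intermediate (or end) state
        trace.append(state)
    # Step limit reached: succeed only if the last state matches.
    return trace if difference(state, goal) == 0 else None


# Hypothetical toy problem: move from 0 to 7 with three operators.
toy_operators = [lambda s: s + 3, lambda s: s + 1, lambda s: s - 1]
distance = lambda s, g: abs(s - g)
```

Calling means_ends(0, 7, toy_operators, distance) visits the states 0, 3, 6 and 7. Nothing in the procedure restricts which operators or goal states may be supplied, which is the sense in which the approach is unconstrained.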
It should be pointed out, however, that the model is recognised as inadequate as a model of conventional problem-solving. Holland et al. (1989: 10–2) criticise it because it depends upon general methods such as means-end analysis. These methods are implausible as models of human cognition, as investigations of knowledge and problem-solving in particular domains have revealed. Holland et al. point out that research on expert and novice differences in solving problems has shown that expertise is crucially dependent upon the development of domain-specific representations and processes. Similar conclusions follow from research on reasoning tasks which shows that a person’s familiarity with a domain affects his ability to make valid inferences (Wason 1966, 1968, 1977; Johnson-Laird and Wason 1977a; Nisbett 1993a, among many others). Finally, research on conceptual development in children reveals that change and conceptual reorganisation does not apply “across-the-board” but rather is domain-based and domain-specific (Keil 1979, 1989). A child’s ability to handle conceptual problems, such
as classifying living entities as, e.g. cats or skunks, based on external appearance (shape of the head and tail, colour of the fur, etc.) or non-visible biological properties (such as the ability to have kittens or skunk babies), is determined by the child’s “theory” about those entities (Carey 1985; Murphy and Medin 1985; Keil 1989; Markman 1989). These theories develop at different rates and moments in time and exhibit domain-specific ontologies. Holland et al. (1989: 11) conclude (and I concur):
At a global level, then, problem-solving appears to be a process based on domain-specific knowledge, if that is available, or reliance on general methods, if it is not.
These different kinds of empirical results all point in the same general direction: human problem-solving is defined by complex conceptual representations which guide inferencing and these complex representations are specific to particular domains of knowledge. Such results have important consequences for SLA theorising. First of all, they are compatible with the claims by generativists that knowledge of language constitutes a special domain of knowledge to which “general problem-solving strategies” are largely irrelevant. This is because such domain-general problem-solving strategies are largely irrelevant to other domains of cognition too. I will return to this issue in Chapter 5. Secondly, it would appear that the SLA debate (pro-UG/anti-induction versus pro-induction/anti-UG) presupposes that there is only one model of induction, namely the Newell and Simon type, and only one view of cognition, namely that it consists of some great undifferentiated mass to which general problem-solving strategies can apply. Even those who argue that language is unique tend to assume that what is left over from cognition once one subtracts the language faculty is still some great undifferentiated thing. Views of cognition which have emerged from research in cognitive psychology over the last 20 years suggest a quite different picture: human cognition consists of a number of autonomous representational domains, and even within what I have been calling conceptual representations (that level of representation which is an interface between vision, language, nonlinguistic audition, touch, etc.), inferencing and problem-solving proceed on the basis of domain-specific encodings. Such a view of cognition requires a new theory of induction. The Newell and Simon model is also inadequate in that it does not constrain the creation of the goals or end states. This failure is connected to the problematic view of cognition just discussed.
It is this fact which has, in part, led to the belief in linguistic circles that induction is necessarily unconstrained. It is this belief which motivates arguments that induction cannot explain language
acquisition because it leads to “crazy” results. The Newell and Simon model also does not adequately constrain the nature of the operators. Consequently, it must be abandoned and replaced with a better theory of induction. Some of these problems are directly addressed in the Induction Theory of Holland et al. Induction involves changes to mental representations made by operators. There is a set of procedures in the learning mechanism for selecting one or more operators to apply, in some particular order, to the initial or current state. We can think of the procedures as a set of functions. As we shall see, the selection of the operators is partially constrained by the activated mental representations of the current state. This means that no function can be selected which does not correspond to part of an already activated mental representation. In the Autonomous Induction Theory, this translates in the following way: I-learning functions will be triggered when a parse of a given string fails. As noted in Chapter 1, language learning begins when parsing cannot proceed to analyse current input with the extant parsing procedures. The constraint limiting the functions to activated mental representations of the current state means that the “search” for a solution to a given problem, e.g. how to classify and attach the next word in a morphosyntactic parse, cannot extend beyond the entities currently activated within the relevant autonomous representational system. If the learning mechanism is dealing with the form blik in the context of a successful analysis of I saw the, the search for a successful solution to the classification will be constrained by the preceding representation and classifications relevant to the activated words. If the grammatical system already includes a representation of the information that there is a high probability that the next word following the will be a noun, the learning mechanism will favour a noun classification.
As we know, this could turn out to be the wrong analysis, since determiners in English can also be followed by adverbs such as very and adjectives such as round. The solution is not, however, “crazy.” In the Induction Theory, the functions can transform representations by recategorising various elements, by constructing or activating new representations, and by constructing or activating associations or analogies. Holland et al. suggest (1989: 12), therefore, that the search space metaphor be re-worked as a search through a set of alternative categorisations which can compete as possible solutions. The more successful a given solution is in solving a specific type of problem, the more likely it is to win out over its competitors. In adapting the Induction Theory to an autonomous cognition, it is best to drop the problem-solving metaphor and the search metaphor altogether since they actually interfere with our ability to understand language acquisition. These metaphors attempt an explanation at the wrong level of abstraction, and suggest, erroneously, that
categorisation and induction are to be defined in terms of intentions and actions. This not only leads to the mind/body problem, or explaining how the brain can have intentions and perform actions, it also raises the thorny problem of attention in learning. We have seen that many aspects of language are not dependent on the learner’s ability to notice and think consciously about the distinctions which are to be learned. Moreover, we are assuming here that attention and awareness involve the projection of phonological representations. I also take a radically different position on what the representations involved in i-learning consist of, and therefore, of what the functions are doing. The notion of competition, however, is central to the Induction Theory and will be retained here. As the example just discussed illustrates, learning English requires that the parser be able to select both a Det N, a Det Adj and a Det Adv analysis when confronted with a string the + some novel form. The learning mechanisms must construct such parsing procedures and develop production schemata for encoding such strings as well. As noted in Chapter 3, in the discussion of the various bootstrapping hypotheses, exactly how the learning mechanisms do this is at present not known in either L1 or L2 acquisition. Given our present state of knowledge, we might reasonably assume that learners rely on cues from a lower level of representation, e.g. the phonological level, or a higher level of representation, e.g. the conceptual level, or on distributional information at the same level of representation, e.g. the morphosyntactic level. Information from all such levels might therefore compete as the solution in analysing a novel form.
3.3 Domains of knowledge as mental models
Holland et al. (1986) attempt to formalise inductive reasoning procedures within the framework of a mental models (MMs) model of cognition (Johnson-Laird 1983).
Induction is then defined as a set of procedures which lead to the creation and/or refinement of the constructs which form MMs. What are mental models? MMs are temporary and changing complex conceptual representations of specific situations. They comprise both explicit representations as well as the conceptual representations which exist implicitly in an organism’s mind/brain in that they can be generated from a currently activated set of conceptual representations.
MMs, according to Johnson-Laird (1983: 397), are semantic entities which represent … objects, states of affairs, sequences of events, the way the world is, and the social and psychological actions of daily life. They enable individuals to make inferences and predictions, to understand phenomena, to decide what action to take and to control its execution, and above all to experience events by proxy;
they allow language to be used to create representations comparable to those deriving from direct acquaintance with the world; and they relate words to the world by way of conception and perception.
MMs are necessary to explain why problem-solving is domain specific. Problem-solving takes place within the limits imposed by a given mental model, which will include, for example, default assumptions about what is typical or normal in an event, in the absence of contradicting information. Canadians have a mental model of a BREAKFAST which includes the information that it involves food consumption, the food normally being served at a table on dishes and consumed with the aid of cutlery, some time shortly after one normally gets up in the morning. The food served might include a broad range of items such as fresh or stewed fruits or juice, breads, toast, muffins or pastries, hot or cold cereals, pancakes or waffles (with maple syrup), eggs and bacon, coffee or tea, and so on. It will not include cold cuts or highly seasoned raw ground meat (as will a German MM of a BREAKFAST) nor will it include boiled rice or pickled fish (as will a Japanese MM of a BREAKFAST).6 Like other concepts, BREAKFAST is a prototype. A typical BREAKFAST might consist of orange juice, a bowl of Red River cereal, a fried egg and bacon and a cup of coffee, consumed from china sitting down at a table, with fork, knife and spoons. But a BREAKFAST event will not cease to be one if it deviates from this scenario in some way. Pancakes eaten in a sleeping bag on the ground, outdoors, by a lake, is still an instance of BREAKFAST within a Canadian mental model. So perhaps too is cold pizza eaten standing at the open fridge door, with the fingers and directly from a cardboard box at 1 p.m., if it is the first meal of the day after rising. Classifying such an event as a BREAKFAST event, however, involves activating “enough” properties of the event type in a classification decision. Generational differences among Canadians are to be noticed as to whether pizza “counts” as breakfast. For my parents’ generation, pizza barely constitutes a meal.
As such it deviates too much from the prototypical breakfast, and the relevant classification will not be made, and will perhaps be disputed when others make it. Mental models are relevant to accounts of SLA. We should assume that they play a large role in the acquisition of vocabulary, in particular, in defining what an expression could mean for a given learner in a given situation. I have previously suggested that we explain nothing when we say that learners “transfer” meanings of the L1 to the acquisition of L2 vocabulary if we have no theory of meaning worthy of the name. We will move towards an explanation if we can be precise about what MMs a learner has for a given event, as well as how those MMs guide inferencing in the face of evidence which is both compatible and incompatible with the L1-based MMs. Research on concepts will define what
possible and plausible interpretations will be available for a learner and will also add some much needed content to claims that learners find the meanings of utterances “transparent”, when specific words are used in context.7 There are, however, additional ways in which MMs can contribute to a theory of SLA. Bley-Vroman (1990) has asserted that L2 learners are guided by certain assumptions in using the L2, namely that the next language they learn will have certain basic properties of the language they already know, e.g. words for objects, means to formulate questions, expressions for negation, etc. In the Autonomous Induction Theory, we can be more precise about what Bley-Vroman’s claim might mean. Given previous experience, including language instruction, learners will have developed a mental model of what a natural language is and will use this mental model, including its default representations, when inferencing and problem-solving with the L2. One specific area where this mental model will be useful is in the interpretation of feedback and correction dealing with the learner’s own utterances. I will return to this point in Chapter 10. There are other examples. It is well-known that L2 learners often assume that every word in the L1 will have a lexeme in the L2 (Blum-Kulka and Levenston 1983). This assumption can be derived from a mental model of language, the postulation of such an MM then being used to explain why learners are often surprised when they discover discontinuities across language lexicons (the “What do you mean there’s no word for shallow in French?” phenomenon, or the giggling that occurs whenever Europeans and non-native North Americans are confronted with translations of Eskimo words). It will also explain why they look for a form in the L2 dictionary solely on the basis of the existence of a given L1 word, why they ask native speakers “What’s the word for …” and so on.
This said, the limitations of the mental models as a constraint on L2 acquisition must be acknowledged. Since the MMs are constructions in the conceptual representations, there is no reason to assume that the MM of language will constrain the nature of phonetic, phonological or morphosyntactic representations. For that, one needs other mechanisms. I have chosen here to recast the formalisms of the Induction Theory in terms of the formalisms of Conceptual Semantics, which offers a sophisticated theory of word meaning (Jackendoff 1983, 1985, 1987, 1990b). Holland et al. (1989: 12) instead prefer to characterise MMs as schemas. They observe (Holland et al. 1989: 12):
The premise underlying the schema notion is that information about the likely properties of the environment is stored in memory in clusters that can be accessed as large units and that can serve to generate plausible inferences and problem solutions.
Although schemas can be large and complex, they do not have internal structural representation. Schemas do not have “syntax.” This is precisely what is wrong with them, since there is ample evidence from linguistic semantics that conceptual representations are more than clusters of concepts linked by networks (Jackendoff 1983, 1990b). I shall reject the inadequate conceptual theory adopted by Holland et al. and propose that an MM is a complex of propositions, consisting minimally of referential units and referential types (e.g. EVENT, THING), predicates and predicate types (e.g. ACTION, STATE, ATTRIBUTE), quantifiers (e.g. ALL, SOME, ONE), and including structured and complex correspondents to perceptual representations, for example the two-and-a-half-D representations of visual perception and object recognition (Marr 1982; see Eysenck and Keane 1995: Chs 2 and 3), and certain auditory perceptual representations. In sum, we can think of an MM as a complex of conceptual structures. I will retain the idea, however, that conceptual structures can be stored and accessed as large units. Clusters of conceptual information will arise as a consequence of categorisation and rule activation. These clusters will be stored in long-term memory. The ability to simultaneously access clusters of information will ultimately serve as a constraining feature of the theory. MMs will be created by rules which construct simple and complex propositions. Within Conceptual Semantics, categorisation judgements are accomplished formally by certain basic epistemic rules. These rules are inferential rules such as those in (4.9a, b). Rule (4.9a), for example, expresses the inference that a given instance belongs to a class if and only if it in fact is an instance of that class. It lies behind the categorisation judgement in (4.9b) which might be a representation of the statement YumYum is a cat and which involves the two-placed semantic function IS AN INSTANCE OF.
(4.9)
a. [STATE TOKEN IS AN INSTANCE OF ([TOKEN]i, [TYPE]j)] ↔ [TOKEN INSTANCE OF ([TYPE]j)]i
b. [STATE TOKEN IS AN INSTANCE OF ([THING TOKEN YUMYUM], [THING TYPE CAT])]
c. [THING TOKEN YUMYUM, INSTANCE OF ([THING TYPE CAT])]
d. [STATE TOKEN IS AN INSTANCE OF ([TOKEN]i, [TYPE]j)] ↔ [TYPE EXEMPLIFIED BY ([TOKEN]i)]j
e. [THING TYPE CAT, EXEMPLIFIED BY ([THING TOKEN YUMYUM])]
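The two operators in (4.9c) and (4.9e) can be pictured as bookkeeping on two kinds of record. The sketch below is an illustration only and not part of Conceptual Semantics: the class names `Token` and `Type`, the acceptance threshold and every property label (`furry`, `whiskers` and so on) are assumptions invented for the example. It shows a TYPE accumulating weighted properties from its exemplars, a categorisation judgement that succeeds when a TOKEN shares enough of those properties, and the extension of the TYPE once the judgement has been accepted.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Token:
    name: str
    properties: set
    instance_of: set = field(default_factory=set)   # names of TYPEs, cf. (4.9c)


@dataclass
class Type:
    name: str
    exemplars: list = field(default_factory=list)   # names of TOKENs, cf. (4.9e)
    property_counts: Counter = field(default_factory=Counter)

    def exemplified_by(self, token):
        """Analogue of EXEMPLIFIED BY: record the token as an example of the type."""
        self.exemplars.append(token.name)
        self.property_counts.update(token.properties)
        token.instance_of.add(self.name)

    def is_an_instance_of(self, token, threshold=0.5):
        """Analogue of the judgement in (4.9b): accept the token if it shares
        enough of the type's properties, weighted by how often each occurred."""
        total = sum(self.property_counts.values())
        if total == 0:
            return False
        shared = sum(count for prop, count in self.property_counts.items()
                     if prop in token.properties)
        return shared / total >= threshold


# Hypothetical experience limited to two cats.
cat = Type("CAT")
for known in (Token("FELIX", {"furry", "whiskers", "purrs"}),
              Token("FLUFF", {"furry", "whiskers", "tail"})):
    cat.exemplified_by(known)

yumyum = Token("YUMYUM", {"furry", "whiskers", "tail", "short"})
toaster = Token("TOASTER", {"metal", "heats"})

accepted = cat.is_an_instance_of(yumyum)
if accepted:
    cat.exemplified_by(yumyum)      # the concept is extended to the new example
rejected = not cat.is_an_instance_of(toaster)
```

Properties seen in more exemplars carry more weight in the judgement, which is one simple way of giving more weight to properties that are represented more often.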
The representation in (4.9b) is a description of the process which permits us to make a categorisation judgement about some already identified entity, here YumYum, my Dad’s cat. In other words, the process allows us to say that some given individual or is a member of a category. In the sentence YumYum is a cat, the proper noun YumYum represents a , namely . It is asserted that belongs to the (or , if you prefer). But this conceptual structure can be true only if it is so that the short furry beast my Dad calls YumYum actually instantiates the TYPE, and that we are willing to assent to this. Our assent means being able to associate the particular token with whatever complex of properties defines the concept . This requires the existence of an operator INSTANCE OF, shown in (4.9c), which maps TYPE constituents into features of TOKEN constituents, and which presupposes the category membership of the TOKEN. So if other members of the category possess properties like , , , , , , and so on, then will have to be at least partially characterisable in these terms too. If the furry beast in question lacks enough of the relevant properties, or has completely weird properties like and , the classification will be rejected.8 The relevant inference rule is shown in (4.9d). The representation in (4.9e) shows an operator EXEMPLIFIED BY which maps a TOKEN into a feature of TYPE, and which is activated when new examples are added to the list of the TYPE concept. So if one’s experience of s has been limited to Felix and Fluff, operator (4.9e) will permit the concept to be extended to include . See Jackendoff (1983: 77–94) for further discussion. This story is fine as an example of how one categorises a novel object when making a categorisation judgement. It also captures the standard claim that the category or will emerge as the sum of the categorisation representations involving the same but different , with more weight being given to properties which are represented more often. 
Thus, we develop our notion of what a cat is, not merely by exposure to the idiosyncratic properties of YumYum, but also by exposure to Fluff, Ginger, Felix, and the countless nameless
neighbourhood cats, who may share some but not all of the properties of my Dad’s beast. This is at least the story that Holland et al. tell. But this story cannot be the whole story. First of all, as we will see in the next chapter, children start out with conceptual systems considerably richer than many have been willing to grant. This means that there is less conceptual i-learning than previously thought in the sense that certain categories or features appear to be available a priori. If this is correct, we can conclude that the development of categories is guided by such a priori information, and that this information is specific to the domain of knowledge. We might say, therefore, that certain primitive MMs are available to humans as part of their biological endowment and that these guide the construction of basic categories.9 As I have suggested, cognition appears to consist of a variety of domains and we need a theory of learning which is consistent with this perspective. Secondly, as previously noted, category formation in the Induction Theory is assumed to occur in the conceptual representations. In a theory adopting autonomous representations, categorisation must occur as a result of operations occurring within the autonomous systems. It cannot be the case that all categories of the grammar emerge from the operations responsible for categorisation judgements. It is a terrible mistake to confound these two types of mental operation.
3.4 Condition–action rules
The propositions in (4.9) can in principle be rendered in other formats without loss of information. The basic format of the Induction Theory is the condition–action rule, which I adopt here. These are formalised computationally in a standard way as productions, whose computational properties are well understood.10 See Simon and Kaplan (1989: Section 1.3), for further discussion. These rules consist of two distinct parts formally and functionally, as seen in (4.10).
(4.10) If Condition A holds, then implement Action B.
On the left-hand side of the rule appears a condition, which has the form “If such-and-such.” The condition is some mental representation which will be currently active in the system. It becomes active either by being constructed by the rules which construct representations, by being activated from long-term memory, by being constructed by the mechanisms which construct perceptual representations, or by some combination of all three. In the theory of i-learning adopted here, it is hypothesised that i-learning in the grammar takes place when parsers fail to assign an analysis to a string. We can now make this hypothesis more precise. When a parse fails at a given point, the parse will consist of an
activated representation containing the constituent to be classified or attached plus other expressions which have been parsed successfully up to that point. The i-learning mechanisms will thus operate on this representation plus any others linked to it. The constituent to be parsed will therefore appear in the left-hand side of a condition–action rule of the i-learning system. Learning will take place when a novel action is implemented, e.g. attaching the expression in a parse at a given level of representation, assigning it a morphosyntactic feature, or putting it in correspondence with a unit of another level of representation. Constraining the theory involves limiting the actions which the right-hand side of the condition–action rule can perform. In the Induction Theory these actions can be quite diverse, ranging from the alteration of representations to the activation of the various processing mechanisms. The formalism will permit, for example, the explicit encoding of instructions to the speech processors, as shown in (4.11).
(4.11)
a. If Speaker intends to make a speech utterance, then activate Speaker’s sentence generator.
b. If Speaker intends to make a speech utterance, then activate all precompiled syntactic strings in memory.
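The condition–action format of (4.10) and (4.11) corresponds to what is usually called a production. The following minimal sketch shows one match-and-fire cycle; it is an assumption-laden illustration and not the formalisation used by Holland et al. or by Simon and Kaplan. The names `Production` and `run_cycle`, and the strings standing in for active representations, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Production:
    name: str
    condition: Callable[[set], bool]    # tested against active representations
    action: Callable[[set], None]       # may alter the active representations


def run_cycle(working_memory, productions):
    """Fire every production whose condition holds; return the names fired."""
    fired = []
    # Conditions are tested against the state as it was before any action ran.
    snapshot = set(working_memory)
    for rule in productions:
        if rule.condition(snapshot):
            rule.action(working_memory)
            fired.append(rule.name)
    return fired


# Hypothetical encodings of (4.11a) and (4.11b).
INTENDS = "speaker intends utterance"
RULES = [
    Production("4.11a",
               condition=lambda wm: INTENDS in wm,
               action=lambda wm: wm.add("sentence generator active")),
    Production("4.11b",
               condition=lambda wm: INTENDS in wm,
               action=lambda wm: wm.add("precompiled strings active")),
]

memory = {INTENDS}
fired_with_intention = run_cycle(memory, RULES)
fired_without_intention = run_cycle(set(), RULES)
```

Both rules share a condition, so both fire in the same cycle; a rule whose condition is not among the currently active representations never fires, which mirrors the requirement that the condition be active in the system.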
Specifically, in the Induction Theory the condition–action rules fall into three types: system operating principles, categorisation statements or empirical rules which describe the environment and its likely next states, and inferential rules, which are defined as domain-independent rules for changing the general knowledge base.11 System operating principles are innate and therefore are neither learnable nor teachable; they run the performance system. The rules in (4.11) would be an example. While it might sometimes be useful to characterise the operations of the processing mechanisms in such terms, I will avoid doing this, assuming rather that the internal system operations take place independently of the rules determining the content of cognition. In any event, all cognitive models must have some mechanisms which drive the computational system, and the exact form they take in the Autonomous Induction Theory is irrelevant for my purposes. Categorisation judgements will involve the implementation of two distinct mechanisms, namely mechanisms to create new condition–action empirical rules, and mechanisms to revise the strength of these rules. The representations and inference rules in (4.9) illustrate some of the basic constituents of a categorisation judgement. We assume, consistent with Holland et al. (1989: 179–204), that humans possess a rich set of conceptual primitives which enter into categorisation judgements. The rule creation mechanisms will therefore deploy these primitives and conceptual structures in the development of empirical rules.
I hypothesise specifically that the rule creation and rule emendation functions involved in categorisation judgements alter conceptual structures in quite specific ways: adding constituents, deleting constituents, indexing, deleting indices, etc. They also create new mental models by adding relevant conceptual structures to these complexes. I stated above, however, that categorisation and the making of categorisation judgements are two distinct types of processing. Categorisation judgements are made in the conceptual system. Categorisation of grammatical entities involves analysing input in terms of the primitives and categories of the different autonomous representational systems. It involves, as we have seen, in part selecting particular cues as more or less informative for classifying any given instance of a thing. The format of condition–action rules is general enough to be used for this function too. However, we have already seen that cues to a category located in a higher or lower level of representation will be deployed by the correspondence function: Category X at level m = Category Y at level n. We can phrase correspondence rules in terms of condition–action rules. As an example, we might formulate the cue that a prosodic word with two syllables and stress on the second syllable cues the grammatical class Verb as: If the morphosyntactic item M currently active corresponds to a prosodic word [σW σS], then assign M to the word class [−N, +V]. It has to be kept in mind, however, that a condition–action rule involved in classifying kick as a verb or [p] as a consonant is not operating at the level of conceptual structures but rather operating at the level of morphosyntactic representation or at the level of phonological representation respectively. Categories are seen here as complex and derived entities, exhibiting a rich variety of properties. Over time, cues will emerge which are tied to these properties (Jackendoff 1983).
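The prosodic cue rule above, together with the earlier distributional cue (a form following the is probably a noun), can be written as condition–action rules that compete to classify a novel form. The sketch below is purely illustrative: the cue labels, the numerical strengths and the function name `classify` are assumptions made for the example, and the code ignores the point that each rule operates within its own level of representation.

```python
from collections import defaultdict

# (cue, category, strength). The strengths are invented for illustration.
CUE_RULES = [
    ("prosody:weak-strong", "Verb", 0.6),        # [σW σS] cues [-N, +V]
    ("follows:determiner", "Noun", 0.7),
    ("follows:determiner", "Adjective", 0.2),
    ("follows:determiner", "Adverb", 0.1),
]


def classify(active_cues, rules=CUE_RULES):
    """Sum the strength of every rule whose cue is active and return the
    winning (category, score) pair, or None if no rule matches."""
    scores = defaultdict(float)
    for cue, category, strength in rules:
        if cue in active_cues:
            scores[category] += strength
    if not scores:
        return None
    return max(scores.items(), key=lambda pair: pair[1])


after_determiner = classify({"follows:determiner"})
iambic_word = classify({"prosody:weak-strong"})
no_cue = classify({"unfamiliar cue"})
```

With only the determiner cue active, the noun analysis wins, although the adjective and adverb analyses remain in competition and could win if further cues supported them.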
Some cues will be represented as more typical, reflecting the frequency with which these cues are associated with an instance of the category. Clusters of typical cues will in turn be more informative than a single typical cue or clusters of atypical cues. Cues compete in an analysis of a string. To the extent that a stimulus instantiates several cues in a cluster, and the parsing system is already able to recognise and deploy those cues, then it will be able to analyse an element as a member of a category. In Chapter 3, I referred to the notion of equivalence classification introduced by Flege (1991) and investigated by Flege and his colleagues in a number of studies. Recall that equivalence classification occurs when an L2 stimulus which has specific properties is assimilated to a phonetic category of the L1 which has slightly different properties. The theoretical problem is that things which are “different” (stimuli of the L1 and the L2) are treated as if they were “the same thing.” An explanation for this phenomenon emerges once we grant that linguistic categories
are indeed complexes of properties (or prototypes). Lacking one property may not be enough to force the creation of a new category if the stimulus has enough properties which win the competition to be relevant cues. When a stimulus exhibits several atypical properties, then the likelihood of equivalence classification occurring is accordingly reduced. Only at this point must learning take place, meaning that the learning mechanisms will encode properties of the item currently being parsed and store them as cues to a novel category. Let us now consider the implications of one important constraint on the theory of i-learning: Categorisation rules do not create primitives; they exploit the primitives provided by the autonomous representational systems. What does this mean? Formally, it means that categories are expressed or represented in terms of the symbols made available at a given level of analysis: phonetic features in the case of segments, morphosyntactic features in the case of word classes and phrases, conceptual features in the case of concepts. The parsers identify the presence of a category during a given parse on the basis of a variety of cues which compete. Cues and formal representation together define categories. Since categories are such complexes, rules creating new categories are doing so by creating new combinations of cues to specify the presence of a specific formal description. They create new constituents by combining or recombining primitives or derived features. Learners learn new categories but they do not do so randomly or in an arbitrary or “crazy” way. In fact, we have good reason to suppose, given the existence of equivalence classification as a general phenomenon of L2 learning, that category creation will take place in an extremely conservative fashion, learners mapping L2 stimuli onto L1 categories wherever they can. Inferential rules were the third type of condition–action rule of the Induction Theory.
They regulate the categorisations. I list the major types below in (4.12) with examples from Holland et al. (1989: 43). I do not mean to suggest that this list is exhaustive; presumably detailed empirical investigation will suggest others. Each type serves to constrain the nature of rule creation. It would be better therefore to think of them as constraints on rule formation and not as rules themselves. In what follows, I will abandon the term “inferential rules” in favour of the term “constraints on rule formation.”
(4.12) Constraints on rule formation
a. the Specialisation Constraint: If a prediction based on a strong rule fails, then a more specialised rule that includes a novel property associated with the failure of its condition and the observed unexpected outcome as its action is created.
b. the Unusualness Constraint: If a stimulus or input has an unexpected property, then that property will be used in the condition of a new rule, if rule generation is triggered in close temporal proximity to the occurrence of the stimulus or input.
c. the Law of Large Numbers Heuristic: If S is a sample providing an estimate of the distribution of property P over some population, then a rule stating that the entire population has that distribution will be created, with the strength of the rule varying with the size of S.12
d. the Logical Implication Constraint: Given the rule “To do X, first do Y”, the rule “If one does not do Y, then one cannot do X” will be created.13
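Two of the constraints in (4.12) lend themselves to a short computational sketch. The code below is a hedged illustration and not the implementation of Holland et al.: the class `Rule`, the functions `specialise`, `select` and `strength_from_sample`, the English past-tense example, and the particular formula n / (n + k) relating rule strength to sample size are all assumptions introduced here. The first function creates a narrower rule when a strong rule's prediction fails, as in (4.12a); the second lets the most specific matching rule apply; the third makes strength grow with the size of the sample, as in (4.12c).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    condition: frozenset      # properties that must be present
    action: str               # predicted outcome
    strength: float = 1.0


def specialise(rule, novel_property, observed_outcome, strength=0.5):
    """(4.12a): keep the strong rule and create a narrower one whose condition
    adds the novel property and whose action is the unexpected outcome."""
    return Rule(rule.condition | {novel_property}, observed_outcome, strength)


def select(rules, situation):
    """Among the rules whose condition is satisfied, the most specific applies."""
    matching = [r for r in rules if r.condition <= situation]
    if not matching:
        return None
    return max(matching, key=lambda r: len(r.condition))


def strength_from_sample(sample_size, k=10):
    """(4.12c): strength increases with sample size. The n / (n + k) form and
    the constant k are assumptions, chosen only to give a bounded curve."""
    return sample_size / (sample_size + k)


# Hypothetical example: recovering from an overgeneralised past tense.
general = Rule(frozenset({"verb", "past"}), "add -ed")
specific = specialise(general, "stem:go", "use went")
rules = [general, specific]

for_go = select(rules, frozenset({"verb", "past", "stem:go"}))
for_walk = select(rules, frozenset({"verb", "past", "stem:walk"}))
```

The general rule is never deleted. It simply loses the competition wherever the more specific rule's condition is met, which is how a strongly confirmed rule can be "saved" in the face of disconfirming evidence.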
The constraint in (4.12a) states that when an expected outcome associated with a rule which is otherwise strongly confirmed fails, then one makes minimal changes to “save” the rule. This constraint means that rules which function quite well most of the time will not be abandoned at the first bit of disconfirming evidence. Rather, one develops situation-specific “rules for the occasion.” The minimal changes involve coming up with a more situation-specific rule as part of the condition which will not be satisfied by the general rule. The action consists in re-representing the current situation. (4.12a) has as a consequence that this specific rule will now apply instead of a general one. It should be seen therefore as a general formula for recovering from an overgeneralisation on the basis of positive evidence. Every instance of the failure of a generalisation, inferable or derivable from positive evidence, can lead to the creation of more specific empirical rules. Overgeneralisation is a readily observable part of first (Brown 1973; Bowerman 1978; Mazurkewich and White 1984) and second language acquisition (Wode 1981; Carroll and Swain 1993). Consequently, (4.12a) must play a part in explaining how learners retreat from overgeneralisations given appropriate input. This constraint should also strike a chord among grammarians; it can be seen as relatable to the Elsewhere Condition in phonology (Kiparsky 1973). The Elsewhere Condition states that when two phonological rules can apply to the same structural description, the more specific one operates. In the terminology of the Induction Theory, the Elsewhere Condition presupposes that two phonological rules can compete to apply. The Elsewhere Condition then regulates the triggering of the rules. If this comparison between (4.12a) and the Elsewhere Condition is legitimate, then it would appear that the latter, invariably assumed by linguists to be specific to language, is not.
It would appear to be a more general property of cognition that specific rules apply in the place of valid more general ones. The Elsewhere Condition says nothing about how general and specific rules are created in the first place; it merely regulates them once they are part of a grammar. It is necessary therefore that any theory of language countenancing learned rules include a learning theory for language incorporating something like the Specialisation Constraint.
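The way a specific rule pre-empts a valid general one can be sketched in a few lines. This is a minimal illustration, not the Induction Theory's own formalism: the `Rule` class, the `select_rule` function, the use of condition size as a measure of specificity, and the English past-tense example are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    condition: frozenset  # features the current situation must contain
    action: str


def select_rule(rules, situation):
    """Return the matching rule with the most specific condition, or None."""
    matching = [r for r in rules if r.condition <= situation]
    if not matching:
        return None
    # Assumption: a condition with more features counts as more specific,
    # and the more specific rule applies instead of the general one.
    return max(matching, key=lambda r: len(r.condition))


# Hypothetical rules: a general rule and a "rule for the occasion".
general = Rule("general past", frozenset({"verb", "past"}), "add -ed")
specific = Rule("go past", frozenset({"verb", "past", "lemma=go"}), "use 'went'")
rules = [general, specific]

print(select_rule(rules, frozenset({"verb", "past", "lemma=walk"})).action)
print(select_rule(rules, frozenset({"verb", "past", "lemma=go"})).action)
```

The general rule is never deleted in this sketch. It simply loses to the specific rule wherever both match, which is the sense in which minimal changes "save" a rule that is otherwise strongly confirmed.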
The Unusualness Constraint in (4.12b) also makes a link between rule creation and other operations of the cognitive system. It is really referring to saliency. Saliency is often discussed as if it arose from objective properties of stimuli, as if, in other words, it were part of the stimuli. I take a different view. Saliency is a result of the fact that our perceptual systems are guided by innate or acquired tendencies to record certain stimuli (babies crying is a good example). Our attention is attracted to events, actions or objects in situations which our MMs lead us not to expect. A former flatmate, Susanne Rohr, decided to put a life-size cardboard cutout of Bill Clinton in the front hall of our flat. It startled the devil out of me the first time I saw it. It continued to startle me for weeks afterwards. It is hard to argue that the presence of “Bill” was inconsistent with my mental model of the flat because I knew “Bill” was there. One can argue that my default representation of the flat entrance was stronger and more readily activated than the newer competing model. It might also be that I had a stimulus-specific representation of “Bill” which was disturbed by the fact that “Bill” tended to take on a different appearance (sometimes he was wearing a hat, sometimes he was not wearing a hat, etc.). The constraint in (4.12b) is designed to capture this kind of response. Our attention will be drawn to and we will tend to incorporate unexpected aspects of a represented situation which are inconsistent with our default representations. Only repeated reconfirmation of the new representation (“Bill” is still there) will lead it to win out over other representations. The Unusualness Constraint amounts to the claim that our cognitive systems are so organised that we treat discrepancies between unexpected elements in our current mental states and our MMs as problems with our MMs, and not problems with our perceptual systems. 
Thus, we treat unexpected elements of a situation, say “Bill” lurking in the front hall, as being a problem of wrong belief; we do not say, normally, “I need new glasses” or “I must be hallucinating.”14 (4.12b) thus makes a link between attention and representation. It accordingly expresses the fact that saliency is not a property of the environment, nor of the perceptual system but something which will be derived from the functions and/or representations of the cognitive systems. It is a derivative of the processing of the input. Bill Clinton in the front hall is a salient event in the context of a life led without him. Hillary must have a different perspective. Now consider the construct of unusualness in the context of autonomous representational systems. There is good reason to suppose that speech perception systems become attuned to the cues which are most relevant for recognising words. Jusczyk (1993) proposes that cue-based learning guides the application of attentional resources so that the speech processing systems ignore cues which are irrelevant for distinguishing among the distinctive features of a language’s
segmental system. Word recognition in English depends on being able to distinguish voiced from voiceless stops. It does not require being able to distinguish coronals articulated with the blade of the tongue from coronals articulated with a retroflex tongue position. In processing speech, anglophones can discard the cues used for segmental classification of these distinct sounds.15 Equivalence classification can thus be seen as arising because of acquired attentional biases which lead the processors to discard information which is irrelevant for certain classification functions in the L1 (here word recognition). But the research on feature detection discussed in previous chapters has shown that “unusual” cues, such as those relevant for detecting and representing clicks can break through any attentional bias. Clicks are irrelevant for the recognition of English words too. However, these sounds do not appear to possess enough phonetic features of English consonant categories to be assimilated to any of them. They are just too “unusual.” We might therefore see “unusualness” as a property inherent to stimuli which survive the acquired biases of the processing systems. The constraints in (4.12c, d) have a somewhat different status. These are abstract rules of reasoning which may be learned rather than being innate. This point is controversial; certainly Piaget claimed that the Law of Large Numbers Heuristic is not learned. Nisbett, Krantz, Jepson, and Kunda (1983) and Nisbett (1993a), however, have claimed that we learn rules of inferencing involving statistical properties of samples and populations. If Nisbett and his colleagues are correct, then (4.12c) may well be part of a number of learned rules connected via pragmatic reasoning schemas to the individual’s mental model of .16 For example, Holland et al. (1989: 230–54) address the difficult issue of the confirmation of information and argue that it is constrained by rich MMs about entities in the world. 
Part of the knowledge inherent in these MMs will be knowledge about how much individuals and instances of a can vary on some given property. This knowledge can be seen as part of the knowledge of what constitutes a typical instance of a category.17 In the theory developed here, we have to distinguish knowledge of the preference rules associated with particular cues for a category, knowledge which is explicitly encoded in conceptual structures, from the cognising of typicality conditions on categories, something which emerges from the competition among the processing procedures used in categorisation. We have no reason to assume that speakers of English have explicit representations of what constitutes a typical instance of a voiced consonant or of a modal verb and yet the processors can and do distinguish among the typical and atypical cases. Holland et al. also claim that we can estimate the variability of events. They remark (Holland et al. 1989: 232)
To estimate the variability of events, it is necessary to establish, at least tacitly, the reference class for the events, that is what kind of events they are. More generally, as Thagard and Nisbett (1982) put it, “our confidence in inferring a generalisation ‘All F are G’ depends on background knowledge about how variable Fs tend to be with respect to Gs. This knowledge is based on F being a kind of K1, G being a kind of K2, and on the variability of things of kind K1 with respect to things of kind K2” (pg. 380). When people believe that the kind of object in question is highly variable with respect to the kind of property in question, they infer little or nothing from the observation that several examples of a subkind have a given property. On the other hand, if they believe that a given kind of object is invariant with respect to a given kind of property, or nearly so, then even a single example may serve to establish a confident generalisation. (page 380 = page 56 in Nisbett (1993b))
This property of reasoning plays an important role in explaining why in some learning situations, the presence of a single token of an unexpected thing can lead to knowledge restructuring while in other learning situations even multiple disconfirming instances have no such effect. Cultural stereotypes offer a perfect illustration of the second type of reasoning. No amount of exposure to fat Germans evokes commentary from my neighbours, since extensive experience has taught them that Germans come in all sizes and shapes. However, a single instance of a fat American tourist (preferably in shorts, runners, baseball cap, and Walkman) is sufficient to raise discussion about Americans being obese. The stereotype is that Americans are fat, but Germans are not. A single fat American is sufficient to establish a confident generalisation while multiple instances of fat Germans are not. The solution lies in the law of large numbers, coupled with real world knowledge about the variability of kinds of objects with respect to kinds of properties. A “single instance is sufficient for a complete induction” when we take it for granted [i.e. have an MM which requires it, SEC] that objects of the kind in question are invariant with respect to properties of the kind observed. For example, observing the colour of a sample of a new chemical element leaves us in little doubt about the colour of future samples. Myriads of concurring instances do not convince when we take it for granted that the kind of object is highly variable with respect to the kind of property observed. For example, observing a bird in the rain forest that is green does not convince us that the next bird we see of the same type will be green because we do not assume invariability of colour for bird types.
Holland et al. add (1989: 235) that if the organism believes that chance has no role to play in the behaviour of an entity, or in its possessing certain properties,
then a strong inference is likely to be made. Learners learning an L2 are far more likely to be surprised at differences between the L2 and the L1 than persons learning an L3 or an L4, who have already experienced the extent to which linguistic systems can vary from one another. The MMs we develop from exposure to the L1 appear to lead to just this sort of strong inference. Moreover, Holland et al. (1989: 248) observe that people generalise less about populations with which they are familiar, an interesting if at first somewhat unexpected result. One might suppose that familiarity means greater knowledge and therefore a greater ability to predict the properties and behaviours of such populations. Holland et al. acknowledge that while this is true to some extent, it appears to be also true that familiarity makes one aware of the variability inherent in a population (as in the case of fat members of one’s own society). A moderate degree of generalisation is possible whenever a central tendency can be represented (Holland et al. 1989: 248). Finally, note that statistical properties of individuals and events are relevant to the development of both an MM about those things and to the tuning of autonomous processing systems. Humans share with other animals a remarkable sensitivity to the probabilities of occurrence of patterns in perceptual arrays, and patterns in analysed mental representations, including representations of language (Kelly and Martin 1994). As they point out, this is a domain-general property exhibited in the operation of domain-specific processes. But variability also has a role to play in inferencing and autonomous processing. Holland et al. (1989: 253) claim that events falling within an expected range of variability might not attract attention and so go unnoticed; those events which fall outside of the range of normal variation, however, would draw attention and thus lead to the creation of emendations in the current representations. 
In short, the informativeness of events is determined in part by what we already know about objects in our experience, and what we know often is not the result of direct experience but rather acquired indirectly through the retelling of our culture. Similarly, the informativeness of specific features in classifying given input may depend on the range of variability of exemplars. In the case of (4.12d), we are dealing with relationships among the action parts of condition–action rules. If we interpret this formulation literally, then (4.12d) will constitute learned constraints on behaviour. But (4.12d) expresses a more general logical consequence of inference. It is moreover an instance of deductive reasoning. It is a logical consequence of the deductive rule If P, then Q. Holland et al. (1989: 265–86) ask the question: Do ordinary people use formal deductive rules to reason about ordinary everyday problems? After a review of some of the literature on reasoning tasks, some of which we cited earlier, they conclude that people use pragmatic schemas to reason and not formal logical rules.
In the mental model view, reasoning is based on MMs constructed using “general linguistic strategies for interpreting logical terms such as quantifiers, and specific knowledge retrieved from memory” (Holland et al. 1989: 269). In other words, we represent problems in terms of conceptual structures based on the linguistic properties of the problems, and then use situationally-based representations to solve them. The pragmatic schemas are knowledge structures at an intermediate level of abstraction in comparison to the formal rules of classical deductive logic. See Kelley (1972, 1973), Cheng, Holyoak, Nisbett, and Oliver (1986), and Cheng and Nisbett (1993) as well. While acknowledging the role of heuristics in solving syllogistic reasoning, or distinguishing valid from invalid inferences, and the types of problems discussed in note 13, Sperber and Wilson (1986: 93–103) argue correctly that the empirical results do not show that humans possess no system of deductive reasoning. They make a case for at least a limited form of deductive reasoning, which includes one very important constraint (Sperber and Wilson 1986: 96–7), namely that inferencing does not include what they call introduction rules. An introduction rule is any rule whose output representation contains every concept in the input representation plus one more. The only deductive rules permitted are those which eliminate concepts in representations. I will adopt both the hypothesis that deductive reasoning is possible and the constraint against the use of introduction rules in deductive reasoning. I will also adopt their distinction between trivial and non-trivial implications. A set of assumptions {P} logically and non-trivially implies an assumption Q if and only if, when {P} is the set of initial theses in a derivation involving only elimination rules, Q belongs to the set of final theses. (Sperber and Wilson 1986: 97).
Only non-trivial implications are involved in comprehension processes. In addition, Sperber and Wilson make a distinction between analytic (or deductive) and synthetic (or inductive) implications, the first being those which are necessary and sufficient for grasping the content of a set of propositions involved in a deduction, the second being the implications which follow from the application of at least one induction rule to a deduction (Sperber and Wilson 1986: 104).18 I assume throughout therefore that deduction rules exist, that induction rules interact with them, and that the distinction between grasping the contents of a proposition which is the result of an inference and the inductive implications of an inference is a valid one. I will make use of both types of inferencing in Chapter 10.
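The restriction to elimination rules can be made concrete with a small sketch. It assumes a toy encoding that is not Sperber and Wilson's: formulas are atoms (strings) or tuples `("and", p, q)` and `("if", p, q)`, the function name `eliminate` is hypothetical, and only two elimination rules (and-elimination and modus ponens) are implemented. The returned closure stands in for the set of final theses.

```python
def eliminate(assumptions):
    """Close a set of assumptions under two elimination rules.

    Formulas are atoms (strings) or tuples:
      ("and", p, q)  conjunction
      ("if", p, q)   conditional
    No introduction rule is applied, so nothing is ever derived that
    is not already a subformula of some assumption.
    """
    theses = set(assumptions)
    changed = True
    while changed:
        changed = False
        for formula in list(theses):
            derived = set()
            if isinstance(formula, tuple) and formula[0] == "and":
                derived = {formula[1], formula[2]}      # and-elimination
            elif (isinstance(formula, tuple) and formula[0] == "if"
                  and formula[1] in theses):
                derived = {formula[2]}                  # modus ponens
            if not derived <= theses:
                theses |= derived
                changed = True
    return theses


premises = {("and", "P", "Q"), ("if", "P", "R")}
final_theses = eliminate(premises)
print("R" in final_theses)                   # reachable by elimination alone
print(("and", "P", "R") in final_theses)     # would need and-introduction
```

Because every derived formula is a subformula of an assumption, the loop terminates. Conjoining or disjoining existing theses with further material, which an introduction rule would license, is exactly what this procedure never does.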
3.5 Competition among rules

I have said that a condition in a rule (the lefthand side) has to “fit” the current situation if it is to be able to alter current representations. A “fit” makes it possible for a rule to compete for activation and firing. In the Induction Theory quasi- or q-morphisms (instead of isomorphisms) are used, which means that multiple activations can occur which do not necessarily match, e.g. a representation at a current or lower level of analysis. See Holland et al. (1989) for details. These multiple activations create dynamic options within the system which is looking for “the best fit.” In the Induction Theory, rules compete to fire. This means that rules can be variably competitive. We have already seen two factors related to the competitiveness of a rule: specificity (the Elsewhere Condition) and saliency. Competition also depends on a rule’s strength. This was illustrated above with the “Bill” example. Each rule is associated with a numerical value, its strength, which reflects how well it has done in the past. In other words, the system tracks how often a given rule has successfully applied. This is the formal rendition of the observed sensitivity humans exhibit to contingencies. Old rules are stronger than new rules because their continued existence depends on their being either fairly accurate representations of the stimuli, or at least “useful” to the organism. This is yet another reason why induction need not give rise to “crazy” rules. At least some “crazy” rules will simply not function optimally within the mental model, nor be accurate representations of the stimuli or input. Strongly supported stored rules get triggered before weakly supported stored ones, subject to the constraints listed above. This probably does not need to be independently stated; rather we may assume that the repeated processing of a given representation involves changes in its threshold of activation. 
In that case, we could derive the strength factor from the functional architecture of the mind, a desirable result. Rules receiving support from other rules are more likely to be activated. This means that rules which will fire whenever some other rule fires will get additional support each time that other rule is activated. Rules will fire in clusters. The consequence of this is that a single stimulus event or input could have multiple consequences in the system. By activating one rule in a cluster of associated rules, all of the others in the cluster will fire. Rules appearing in clusters are going to be more competitive than rules functioning alone. Note that the hypothesis that representations compete as possible solutions to a representational “problem” is hardly novel. The Competition Model assumes as a central premiss that competition plays an important role in directing change within the system. Competition has also been worked into Optimality Theory. Indeed, we
may conclude that competition among representations has become a standard feature of formal approaches to learning and learnability. Let us consider now two examples, the first dealing with i-learning of a grammatical phenomenon on the basis of metalinguistic instruction, the second based on the bottom up i-learning of the same phenomenon. Suppose that a French L2 learner is told “The noun oiseau is masculine” and encodes this information as in (4.13a). If the French word oiseau is being processed, (4.13a) will be activated, but so will (4.13b), if it is part of the system, which could, in turn, activate (4.13c).19
(4.13) a. If oiseau = /uazo/ appears in a representation, then oiseau = /uazo/ is MASCULINE.
b. If oiseau = /uazo/ appears in a NOUN PHRASE then its DETERMINER AGREES IN GENDER.
c. If the SUBJECT NOUN PHRASE is MASCULINE, then a PREDICATE ADJECTIVE AGREES IN GENDER.
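How the firing of one rule in (4.13) can satisfy the condition of another may be sketched as a simple forward-chaining loop. The fact labels, the `RULES` table and the `run` function are assumptions made for illustration, and the conditions are deliberately cruder than the conceptual representations in (4.13).

```python
# Each entry: (label, condition facts, fact added when the rule fires).
# The fact strings are hypothetical shorthand for the concepts in (4.13).
RULES = [
    ("4.13a", {"oiseau"}, "oiseau:MASCULINE"),
    ("4.13b", {"oiseau", "in:NP"}, "determiner:agrees"),
    ("4.13c", {"oiseau:MASCULINE", "NP:subject"}, "predicate-adjective:agrees"),
]


def run(facts):
    """Fire every rule whose condition is met, until nothing new fires."""
    memory = set(facts)
    fired = []
    progress = True
    while progress:
        progress = False
        for label, condition, action in RULES:
            if label not in fired and condition <= memory:
                memory.add(action)     # the action becomes available to other rules
                fired.append(label)
                progress = True
    return fired, memory


fired, memory = run({"oiseau", "in:NP", "NP:subject"})
print(fired)
```

In this sketch the third rule can fire only because an earlier rule has already added the MASCULINE fact to working memory. That is the sense in which a single input can have several consequences in the system.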
Metalinguistic instruction, for example, the contents of a grammar lesson will, by hypothesis, be encoded in conceptual representations like those of (4.13). These conceptual representations require the separate encoding of conceptual categories for grammatical constructs like masculine gender, noun phrase, agreement, and so on. Each of these concepts will consist of a mental model involving an ensemble of properties and predications. The concept MASCULINE NOUN can, however, only be understood in terms of various cues for gender, e.g. the form of determiners and agreeing adjectives, the sex of the referent of a noun, and so on (Carroll 1989, 1995a, 1999). At its most basic, the category might simply mean that oiseau = /uazo/ co-occurs with un = /oen/, or some other modifier. Thus, in contrast to the claims implicit in Krashen’s work contrasting metalinguistic learning and naturalistic acquisition, and made explicit in Schwartz’ model, the information about the grammatical construct gender which is operationalised in the parser and encoded in the lexical entries via gender features must come together with any metalinguistic information explicitly represented in the conceptual representations for this is the only way that the concept GENDER can be meaningful. There is no other way for us to develop a mental model of gender. Schwartz would be correct in claiming that correct parsing and production of noun phrases in French does not depend on having a metalinguistic representation of GENDER; such claims are not in dispute. Rather, the point I want to make is that it is wrong to suggest that one could have an accurate metalinguistic concept GENDER independently of the cues used to identify it. These cues must come from stimuli which provide evidence for the construct.
Constructing a meaningful representation of the concept GENDER does not, of course, mean that one has constructed an accurate representation. Indeed, it is entirely likely that for many learners of French, the concept of MASCULINE GENDER is simply a listing of the forms which occur with certain exponents of gender. I have argued in previous work (Carroll 1989) that French gender may be a grammatical phenomenon which can only be accurately controlled if it is acquired bottom up, through the operation of the autonomous morphosyntactic representational systems. This is because, for (4.13) to have any consequences whatsoever on the morphosyntactic representations used in parsing, the categories in (4.13) must correspond to categories and primitives available at that level. However, the modular nature of linguistic cognition places severe constraints on the correspondences across representational systems. The sound form /uazo/ can be put into correspondence with a morphosyntactic unit, or lemma in the sense of Levelt (1989), here (oiseau). Can the concept MASCULINE be put into correspondence with a morphosyntactic feature? Perhaps, in principle, but this would require a metalinguistic analysis of the entire system of nouns learned. In Chapter 3, I mentioned work in which I showed that adult anglophones readily identify MASCULINE with the concept SEX category (Carroll 1999a). Since French gender does not correspond to this concept, and, indeed, since the meaning of nouns is largely irrelevant to the functioning of this subsystem of the morphosyntax, this kind of semantic analysis leads learners in the wrong direction. Other analyses of gender demand other correspondences. Tucker, Lambert, Rigault, and Segalowitz (1968) and Tucker, Lambert, and Rigault (1977) hypothesised that francophones encode gender in terms of the phonetic shapes of the ends of nouns. Might the concept MASCULINE be put into correspondence with phonetic cues to gender? 
In the theory articulated here, this correspondence is excluded. The conceptual system and the phonetic system operate largely independently of one another, and while the conceptual system can link to lexical entries, it is hypothesised that these contain fairly abstract phonological representations of the pronunciations of words. The kinds of phonetic schemata that Tucker et al. appear to have had in mind would not be encoded in those entries. Finally, consider that more recent analyses of languages with binary systems treat gender as a privative feature on the marked nouns (Roca 1989). This means that gender attribution is just an arbitrary association between a morphosyntactic feature and a lexeme or between a morphosyntactic feature and a derivational suffix (Carroll 1989). The agreement system operates on the basis of a contrast between those elements marked with the feature and those which have no marking, to which a default rule assigns a spellout of the determiners. Let us assume that this means, in the case of French L1 speakers,
that all feminine nouns would be assigned a feature [+Gender] in the mental lexicon. Masculine nouns would not be marked at all. Their representations, in other words, are underspecified, and a default rule will subsequently fill in an appropriate value. See (4.14).
(4.14) [+N, −V] ⇒ Masculine Gender
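A privative feature with a default spellout can be sketched as follows. The two-word `LEXICON`, the function name and the use of orthographic un/une for the determiners are assumptions made for illustration; the point is only that nothing labelled "masculine" is stored anywhere.

```python
# Hypothetical mini-lexicon. Only the marked noun carries a gender feature;
# the unmarked noun has no gender specification at all.
LEXICON = {
    "grenouille": {"+N", "-V", "+Gender"},
    "oiseau": {"+N", "-V"},
}


def indefinite_determiner(noun):
    """Spell out the indefinite determiner for a noun in LEXICON."""
    features = LEXICON[noun]
    # Privative feature: only its presence is checked.
    # Its absence triggers the default spellout.
    return "une" if "+Gender" in features else "un"


print(indefinite_determiner("grenouille"))
print(indefinite_determiner("oiseau"))
```

The default branch is what does the work of (4.14) here: the determiner for oiseau comes out right although the entry for oiseau contains no unit to which a concept MASCULINE could correspond.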
If this is the correct way to analyse gender morphosyntactically for a language like French, there will be no identity of gender features to the concept MASCULINE because the former is not encoded directly anywhere in the lexicon. In other words, there is no unit for the concept to correspond to. Since the correspondence rules do nothing more than establish an identity relation across the levels of representation, there will be no possible correspondence between a unit of the conceptual level and a non-existent unit of the lexicon or of the morphosyntactic level. Let us now reconsider the same problem as a problem of autonomous i-learning. As stated previously, the fact that learned metalinguistic information is encoded in conceptual representations and can have an effect on the morphosyntactic or phonological systems in only certain cases does not mean that i-learning cannot occur independently within these autonomous systems. We might, for example, assume the rules in (4.15).
(4.15) a. If /oen/ appears in a string, project a Det.
b. If Det appears in a string, project a DetP.
c. If /oen/ immediately precedes NPi, attach NPi as a right sister to Det.
d. If N appears in a string, project an NP.
e. If /oen/-Det-INDEF. precedes /uazo/-BIRD in a string, encode /uazo/-BIRD as [+N, −V].
f. If /bon/-[−N, −V]-GOOD precedes /uazo/-BIRD in a string, encode /uazo/-BIRD as [+N, −V].
g. If /ynV/ appears in a string, project a Det.
h. If /ynV/ immediately precedes NPi, attach NPi as a right sister to Det.
i. If /ynV/-Det-INDEF. precedes /grenujV/-FROG in a string, encode /grenujV/-FROG as [+N, −V, +Gender].
j. If /bonV/-[−N, −V]-GOOD precedes /grenujV/-[+N, −V]-FROG in a string, encode /grenujV/-[+N, −V]-FROG as [+N, −V, +Gender].
· · ·
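The item-by-item character of rules like (4.15e) and (4.15i) can be sketched as a procedure that records a featural marking each time a cue precedes a noun. Everything named here is hypothetical: the `encode` function, the string labels for the sound forms, the placeholder nouns in the usage example, and above all the `threshold` parameter, which merely stands in for a number that has to be determined empirically.

```python
def encode(pairs, threshold=3):
    """Encode nouns from (determiner, noun) pairs in order of exposure.

    Returns the induced lexicon and a flag saying whether enough distinct
    nouns have been marked after "ynV" for a generalisation about that cue
    to be drawn. `threshold` is an arbitrary illustrative value.
    """
    lexicon = {}
    marked_nouns = set()
    for determiner, noun in pairs:
        features = lexicon.setdefault(noun, {"+N", "-V"})
        if determiner == "ynV":
            features.add("+Gender")    # cue-specific marking, as in (4.15i)
            marked_nouns.add(noun)
    generalised = len(marked_nouns) >= threshold
    return lexicon, generalised


exposure = [("oen", "uazo"), ("ynV", "grenujV"), ("ynV", "noun2"), ("ynV", "noun3")]
lexicon, generalised = encode(exposure)
print(sorted(lexicon["uazo"]), sorted(lexicon["grenujV"]), generalised)
```

Nouns heard only after /oen/ are never given a gender feature in this sketch, so the masculine class is simply whatever is left unmarked.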
Each of these rules involves linking the occurrence of a cue for gender with featural markings. The representation of masculine gender emerges as a contrast between those items encoded with a feature for [+Gender] and those which are not. If enough such rules occur, the exact number being an empirical matter, then a generalisation will take place, i.e. that /ynV/ is a marker for feminine gender. The cues for masculine gender will then be the complementary set. The occurrence of /ynV/ in a string will activate all rules in which it appears in the condition part, here (4.15g, h, i). These rules have various actions and the carrying out of each one will entail activating the condition part of other rules, whose actions will entail activating the condition part of yet other rules. And so on. So the activation of a single rule can have a “ripple effect” throughout the grammar. Linkages across the system can take place which create clustering effects. Although induction must be the mechanism explaining incremental learning, it is incorrect to assume that that is all that it can explain.

3.6 Clustering of effects?

It might be objected at this point, however, that when generativists talk about “clustering” effects and the “deductive” consequences of parameter-(re)setting, they are not talking about effects involving a single grammatical distinction generalised across a number of contexts, but rather the claim is that learning one aspect of the grammar has immediate consequences for different and completely unrelated grammatical distinctions. Recall from previous discussion that learning the particular setting of the head direction parameter for a given language is supposed to have immediate consequences for negative placement, for quantifier placement, or for the relationships between adjunct clauses and anaphora. The objection would be well taken. 
However, as mentioned before, the deductive consequences of setting specific parameters have become, under intense scrutiny, more and more local, making the P&P theory less different from the Autonomous Induction Theory than one might think. If there are no cross-systemic consequences for learning tense or head direction, then the Autonomous Induction Theory can hardly be criticised for failing to explain them. Secondly, my response is to say: What evidence is there for such broad superficially-unrelated clustering effects in SLA in particular? I think this issue is entirely open at present, but I am not sanguine. To date no evidence has been amassed to show that L2 learners know a lot when they know a little, beyond what can be explained by the transfer of acquired and universal knowledge. This means that there is no evidence suggesting that “accessing” UG has deductive consequences for SLA. On the contrary, there is evidence suggesting that adult learners fail to link phenomena
in the ways predicted by P&P theory. Let us review a number of studies which have implications for the claim that learning one part of the grammar has “deductive” consequences for other seemingly unrelated phenomena. Let us consider first a number of studies dealing with the acquisition of phrasal order by L2 learners. Hyltenstam (1977) presents detailed data involving negation, produced by learners of Swedish, which argues against rapid acquisition and rapid restructuring and for lexically-specific restructuring. What do the stimuli look like? In Swedish main clauses, the negative functor inte comes after the finite verb. In subordinate clauses, it is placed before the finite verb. In the terms of current generative analyses, one would characterise the facts of Swedish by hypothesising that in main clauses the finite verb raises out of VP to some equivalent functional head with the negation remaining in situ before the VP. I shall assume that it raises to Comp, the usual analysis of so-called “V2” ordering. See (4.16) which is taken from Platzack (1992: 65).20 In embedded clauses, the verb remains in the VP, the assumption of Platzack being that the Tense feature lowers onto the verb. (4.16)
[CP [SpecCP subject_j] [C′ [COMP verb_i] [IP [SpecIP t_j] [I′ [INFL t_i] [VP [Neg inte] [VP [NP t_j] [V′ [V t_i] [NP object]]]]]]]]
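The surface pattern that the learners are exposed to can be stated very compactly. The sketch below encodes only the descriptive generalisation about inte given above (after the finite verb in main clauses, before it in subordinate clauses), not the raising analysis in (4.16). The function name and the Swedish example words are assumptions added for illustration.

```python
def swedish_order(subject, finite_verb, obj, embedded=False):
    """Return the target linear order of a negated subject-initial clause."""
    if embedded:
        # Subordinate clause: the negator precedes the finite verb.
        return [subject, "inte", finite_verb, obj]
    # Main clause: the negator follows the finite verb.
    return [subject, finite_verb, "inte", obj]


print(" ".join(swedish_order("jag", "läser", "boken")))
print(" ".join(swedish_order("jag", "läser", "boken", embedded=True)))
```

The only difference between the two orders is the relative position of inte and the finite verb, so that position is the one surface property a learner has available for telling the two clause types apart.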
Whatever the learners learned, it was not the various generalisations encoded in (4.16). Hyltenstam’s study shows quite nicely that the learners disregard the obvious and critical differences in the stimuli they get from native speakers and generalise the location of negation across both main and embedded clauses. In
other words, they are initially insensitive to the differences in order of the negative marker and the verb in main and embedded clauses. This may be a consequence of simplification in production. Alternatively, it might be the case that they do not treat the variable position of the negator with respect to the finite verb as a cue to underlying order when analysing input. At the first data collection time, most learners were correctly ordering all finite verbs in main clauses before the negative marker and incorrectly using the same order in the embedded clauses which suggests that the main clause data is taken as evidence for ordering constituents rather than the embedded clause data. Secondly, the learners also distinguished between finite auxiliaries and finite non-auxiliary verbs, something Swedish does not do with respect to the location of negation. The learners tended to first locate the auxiliaries before the negative functor, only later extending the analysis to all finite verbs in main clauses. This means that at the first data collection time, all learners correctly ordered auxiliaries before the negative marker in main clauses, but only about half of the learners raised all verbs in this context. Moreover, in the embedded clauses, most subjects incorrectly ordered all verbs before the negative marker, and only the subjects distinguishing between modals and main verbs got some of the embedded clauses right because they did not raise the main verbs. Thirdly, only after they had generalised the class of raised verbs to the full set in main clauses, did they begin correctly locating negation with respect to finite verbs in embedded clauses. What conclusions does this study impose on us? Nothing in this data suggests that the learners recognise that the variable position of the verb with respect to negation in main and embedded clauses is a cue to underlying order. 
If learners are indeed “accessing” UG and, in particular, accessing a head direction parameter, then we might expect them to be especially sensitive to the critical triggers of that parameter. This the learners of Swedish were not. The entire data set can be accurately described by hypothesising that learners take the main clause order as the relevant cue for underlying order. Parameter-setting does not appear to play any role in explaining the direction of change either. Nothing suggests that learners are restructuring their grammars on the basis of inclusive features, e.g. [+V]. On the contrary, they appear to be reclassifying verbs into two sub-groups (raise/don’t raise) based on features of lexical subclasses. Indeed, one can describe the data set by claiming that learners impose an “English-style” analysis; the learners appear to be treating Swedish at first like contemporary English (auxiliaries and modals raise), and then like Middle English (all verbs raise), making minimal changes to their analysis by changing the grammar verb-by-verb. See Meisel (in preparation) for further discussion of this data.
Let us examine a second phrasal order study. White (1991a) assumes that a parameter related to verb raising explains the distribution of facts in (4.17).21
(4.17) a. Jean a souvent visité le musée.
       b. Marie regarde souvent la télévision.
       c. John has often visited the museum.
       d. *Mary watches often television.
Both English and French are underlyingly SVO. The difference here therefore does not hinge on setting the head direction parameter, but on a second, lexically-based one. The parameter in question requires all verbs in French to raise from VP to INFL while in English, only auxiliaries raise. Resetting the parameter presumably means reclassifying the class of [+V] elements which can raise. The relevant cue is therefore the position of the adverb with respect to the tensed verb. If the adverb follows the verb, the verb has raised; if the adverb precedes the verb, it has not. White (1991a) studied two groups of francophone L1 ESL learners, one group of which were taught certain aspects of English adverb placement, the other being taught aspects of question formation. Her results clearly show that the subjects benefitted from instruction about adverbs, including information about where adverbs cannot occur in English, e.g., between the verb and the direct object (*He sang immediately the song). However, her results also show that learners did not acquire a difference between Subject Verb Adverb Object (SVAO) and Subject Verb Adverb PP (SVAPP) frames. If learners are learning which verbs are not raising in English, then they should cognise that the first order is out but the second is fine.22 Moreover, no clustering effects were observed, even prior to instruction. White specifically emphasises that her subjects do not cognise that the presence of SAV in the input entails the absence of SVAO. Schwartz and Gubala-Rybak (1992), in a critique of the White study, conclude: This is not parameter-resetting. I agree. White’s learners are inducing the relevant phrasal orders on the basis of positive and negative evidence. Grammatical restructuring of surface linear orders in SLA is, I would argue, largely the product of inductive processes. Next, consider work by Hawkins, Towell, and Barzergui (1993). 
This study, like the White experiment, is an exploration of the Pollock phrase structure proposals, adverb/neg placement, and the parameter of opacity/transparency of agreement. These authors investigated, using a cross-sectional design, how subjects develop intuitions about acceptability as they acquire the L2. Subjects were British university students studying French and were at two different levels of proficiency. Acceptability judgements were elicited using a written test and
contextualised sentences. Results show that subjects were often unsure of whether a given stimulus was acceptable or not. They also show that subjects were very accurate in judging the correct position of the adverbs pas and souvent. However, the subjects were not accurate in locating positions for the quantifier tous, an element whose location ought to be readily identifiable based on universal principles and the resetting of the relevant parameter. No clustering of the position of negation and position of quantifiers was observed, contrary to the predictions of the parameter-setting theory. Pishwa (1994) reports on a longitudinal study involving 15 Swedish L1 children in a German L2 immersion school. Pishwa’s main objective is to show that her learners do not follow the putatively universal order given by Clahsen et al. (1983) for the acquisition of German. See below. As should be obvious by now, Swedish and German both exhibit “V2” effects, that is to say they manifest subject–verb inversion in topicalisations as well as questions. The Swedish subjects show inversion right from the start of their acquisition of German, which puts their order of acquisition out of step with that of other learners. Inversion occurs independently of the development of agreement marking, so it cannot be claimed, as P&P analyses of verb-raising would have it, that verb-raising is triggered by Agr-S. More interesting from the perspective of the head direction parameter is that the position of the verb relative to its complements is acquired following a typology of complements. First, the infinitive is placed with respect to the complements that it directly assigns a semantic role to (the direct object, in other words). At recording (II) we find, therefore, data like the following in (4.18a, b) from a single subject Ulrika. (4.18c) is provided to show that the position of the infinitive in third position cannot be explained in terms of the form of the non-finite form. 
Ulrika develops the correct morphological forms slowly, as do all of Pishwa’s subjects. Many use the third person singular marker “-t” for all third person verbs whether they pick out singular or plural referents.
(4.18) a. Ich will der Stein haben.
          ‘I- want the-. stone have-’
       b. Ich will fahre in Amerika.
          ‘I- want travel in America’
       c. Und dann hat habe Mutter und Vater gehen in die Haus
          ‘And then have-3/ have mother and father go- in the- house’
Pishwa reports, although she doesn’t give examples, that at stage two other complements also appear to the right of the verb. She claims that goal PPs,
which she calls telic, are the next to be placed to the left of the non-finite verb, and only then do other PPs (locative, instrumental, etc.) appear to the left. In short, these Swedish learners do not treat sentences like (4.18a) as cues that German underlying word order is SOV. Moreover, if they are changing a parameter, it isn’t showing up in their production. The appearance of the ±tense distinction in their production is not the signal for a general restructuring of the grammar, nor is there any apparent connection between agreement and word order. As Meisel (1991: 250) has pointed out, the ZISA study (Meisel et al. 1981; Clahsen et al. 1983) has provided some of the most robust data showing that learners from different L1 backgrounds pass through the same stages in acquiring and using surface word orders. Meisel also points out that these same sequences show up among both adult and child L2 learners of German (Pienemann 1981), and among both untutored and tutored learners (Pienemann 1987, 1989). See Ellis (1989) for a review of a range of studies of the acquisition of German which confirm the developmental order given. Meisel (1991, 1997) uses ZISA data to show the absence of the predicted correlations between parameters and grammatical restructuring. Meisel’s studies are all the more convincing in that he systematically compares the acquisition of verb order and finiteness and negation in adult L2 development to their L1 development, including studies of the simultaneous acquisition of two languages. Meisel (1997), for example, examines the relationship between negation, verb placement and finiteness/tense/agreement. In primary language development, he argues, the learning task involves analysing the negator as the head of a NegP or as a maximal projection. In German primary language acquisition, nein can appear non-anaphorically and appears before the subject. nicht appears almost exclusively in final position. See Clahsen (1988a). 
Both of these positions have been described as clause “external.” Meisel (1997) argues that negative markers are at this stage attached to the VP which also contains the subject. All other aspects of the problem of locating negation with respect to the verb involve projecting the appropriate functional categories. He points out … child language data from different languages confirm the hypothesis that NEG is initially placed externally. Structural analyses support the claim that it is adjoined to the VP containing the subject. It may appear to the left or to the right of the VP, but one can observe a strong preference for the position favoured by the adult target language, relative to the verb. As soon as one finds evidence for a productive use of finite forms, NEG is generated in the head of NegP, it is raised together with the verb. In other words the option of analysing NEG as a functional head or as a maximal projection adjoined to VP does not appear to represent a problem for the child. (Meisel 1997: 239)
Negators appear within the string only when the child has mastered the finiteness distinction and verb raising. At this stage nicht always follows the finite verb and precedes non-finite forms. Meisel then proceeds to analyse data involving the acquisition of French and German as L2s. The French data comes from Noyau (1982) and Trévise and Noyau (1984) who grouped their subjects into three different groups depending on the preferred form of negative marking. Their groups I and II used both Verb+pas and ne+Verb+pas sequences. Group I used Verb+pas most often while Group II used ne+Verb+pas most often. Group III used ne+Verb+pas as well as ne+Verb. Meisel examined data from two Spanish learners of French and found no clear evidence for an early phase in which the negator is placed preverbally or externally. He also shows that internal negation first shows up in formulae and fixed phrases like sais pas ‘I don’t know’, which may provide learners with important cues to the order of negation. The German data come from the ZISA study and include cross-sectional and longitudinal data. Negation is used only with a small group of verbs suggesting that for these learners too memorisation of fixed phrases like weiss nicht or versteh nicht is important. Learners in the early stages of linear ordering (SVO/ Adv and Adv/Prep SVO) placed the negator before the verb. Learners in the later stages (Adv VP and V-end) placed the negator after the verb. The learners in the middle stages behaved in a mixed fashion with some using the order of the earlier phases, some the order of the later phases and yet others using both. Meisel points out that adult learners place the negator after the subject, externally to a VP which does not contain the subject. Most strikingly, the acquisition of the placement of negation occurs independently of the mastery of finiteness, which makes it quite different from L1 acquisition. 
Finally, Meisel points out there may not be any universal developmental order in the L2 acquisition of negation. Not all learners go through a stage where negation is placed externally. Clahsen et al. (1983: 148ff) characterise preverbal negation use as a learner strategy and not as a developmental stage. This study thus clearly shows that there are no clustering effects involving verbal position and the position of negation or adverb phrases such as those predicted by the parameter-setting theory. In particular, there is no raising of elements triggered by the necessity of getting inflectional features. Or, turning the problem around, learners do not assume that linear ordering indicates the need for movement from an underlying position to get inflection. We have mentioned the pro-drop or null subject parameter in several places. It has been claimed by many researchers in L1 acquisition to be triggered by the acquisition of tense, finiteness or agreement. Mechanisms vary in individual
studies and changes in linguistic theory have usually been followed by extensive reanalysis of child language data (cf. Meisel 1990a, 1994b, for discussion). Two claims, however, emerge. The first is that children’s language does not closely reflect properties of the stimuli when it comes to the use of subjects. Even in languages where subjects in finite sentences are obligatory, and where we can assume, given the studies showing that language to children is largely well-formed, that children are hearing large numbers of finite sentences with subjects, they nonetheless produce substantial numbers of sentences without overt subjects. The second claim (itself in dispute, as we have seen) is that children’s production of subjects begins to reflect the stimuli, that is, they show obligatory use of subjects in finite clauses, only after they have acquired tense/finiteness/agreement.23 There is no evidence for this type of association in adult L2 acquisition. Meisel (1991) demonstrates this using longitudinal data from the ZISA project, whose adult learners acquire patterns of obligatory subject use independently of their acquisition of agreement. See also Meisel (1987), which presents ZISA data on tense marking, and Köpcke (1987), which analyses ZISA data with respect to person and number marking. Meisel (1991: 258) reports that mastery of verbal inflection is slow, highly variable among learners, not attained by certain learners, and does not proceed in a continuous fashion. Citing data from six different learners, Meisel shows that there are four different patterns of learning. Some learners do not learn much of the verbal morphology. Some are able to use markers from early on. Some make progress slowly. Others make progress rapidly. Moreover, citing Köpcke (1987), Meisel notes that adults do not use specific inflections according to a semantic classification of verbs, as children have been reported to do. 
In other words, these learners are not using aspect as an entrée into the morphology.24 What matters for the grammatical theory is the correlation of subject use and grammatical agreement between subject and verb. In L1 development, subjects become obligatory as inflection emerges. No such correlation is found in Meisel’s data. Some learners use subjects categorically, right at the beginning; they then start omitting them and finally supply them again in obligatory contexts. [Footnote omitted, SEC] Others omit subjects frequently during early phases, but the frequency of omission decreases over time. For both patterns, there are variants such that the number of uses or of omissions does not continue to increase after a while. These observations amount to saying that the emergence of subjects in the speech of L2 learners is a phenomenon totally independent of the development of agreement markings on the verb. (Meisel 1991: 264)
Andersen (1991) makes a similar claim. Hilles (1991) reports on cross-sectional
data with children, adolescents and adults, examining the acquisition of the same phenomenon. Correlations between use of inflectional suffixes and subjects were found for the children and one adolescent, but not for the rest of the subjects.25 Clearly, then, the parameter-setting theory is making the wrong predictions for the acquisition of obligatory subjects and agreement marking for adult learners. Adults are not using verbal morphology as a cue to the presence of subjects. The studies just reported on all involve the analysis of spontaneous production data. As a fallback position, one might argue that production data is not sufficient to show that adult learners do not “access” UG since they might have control problems. This would amount to claiming that the Principles and Parameters theory is simply irrelevant for the analysis of production data (an unhappy claim, but perhaps one that many might accept in preference to rejecting the theory completely). However, Clahsen and Hong (1995) have done the right kind of study to dispel any hopes that the predicted correlations will emerge between, e.g. functional categories and category raising, or between agreement and the null subject parameter, in studies of comprehension. Clahsen and Hong used a sentence-matching technique and measured the response times of subjects in making decisions as to whether a stimulus pair was identical or not. This technique relies on visual stimulus discrimination (subjects look at sentences on a computer screen) and on classification. But notice that, unlike acceptability judgement tasks, subjects do not have to classify a pair of stimuli according to some explicit or implicit standard. They merely have to decide if the pair looks the same. Research with native speakers has shown that response times to strings which can be chunked and analysed at a higher level (e.g. 
as words) are faster than stimuli which consist of arbitrary strings, and, moreover, that subjects respond faster to acceptable sentences than to unacceptable sentences. This is the idea that Clahsen and Hong exploited. Tests with native speaker controls showed that correct agreement and correct obligatory subjects pairs both independently led to decreases in response times. Tests with Korean L2 learners showed group-level variability. Thirteen of 33 subjects successfully discriminated both acceptable/unacceptable agreement pairs and subject pairs. Clahsen and Hong note that these subjects’ results are ambiguous between a parameter-resetting explanation and some other account. Among the 26 subjects who could discriminate the subject pairs, statistical tests showed no correlation between response times on those pairs and response times on correctly discriminated agreement sentences. The authors conclude that their subjects appear to have disassociated knowledge of these two subsystems of the L2 grammar. The conjunction of results from this experiment and from the production studies should, of course, be replicated. Nonetheless, one conclusion
seems to emerge: Adult learners do not manifest knowledge and control of the types of related sub-areas of the grammar predicted by the null subject parameter. It therefore looks as if the “right” theory of SLA ought to predict and explain these kinds of disassociations. To sum up, a number of studies using a variety of analytical techniques and investigating several different parameters have failed to show that unrelated phenomena cluster during developmental change, as they appear to do in primary language acquisition. There is no reason therefore to believe that one needs more than the kind of clustering permitted by the Autonomous Induction Theory. Moreover, some of these studies also provide evidence that second language learning is slow, instance-based, or involves generalisations of a structural sort not illustrated in first language acquisition. In short, they provide evidence for i-learning.
3.7 Generation of new rules
It was noted above that one of the problems with the original conceptualisation of induction as problem-solving was that the generation of novel hypotheses was treated as a random process. This led to an intractable problem, namely constraining the theory. There is no reason to preserve this feature of prior work on induction. Holland et al. (1986) argue, in the case of categorisation judgements, that the generation of hypotheses will be constrained by the individual’s theory or mental model of a cognitive domain. As noted, an MM is a complex semantic representation of some phenomenon. In the Autonomous Induction Theory I am espousing, MMs will be identified with complex conjunctions of conceptual structures. MMs will play an important role in the recognition and interpretation of verbal feedback and correction, as will be explained in Chapter 10. They will play less of a role within the autonomous representational systems precisely because they are autonomous. 
I-learning within the autonomous representational systems involves either the generalisation or specialisation of cues for categories, rule learning, etc. The Induction Theory countenances two forms of generalisation: condition-simplifying generalisation and instance-based generalisation. Condition-simplifying generalisation involves the deletion of part of a rule’s condition. For example, some part of the conceptual structure disappears from the representation. Holland et al. cite the example of a complex semantic representation (understood here as a conceptual structure) consisting of several conjoined propositions being simplified by the elimination of a proposition. The relevant operator would map the more complex conceptual structure onto a conceptual structure minus the relevant proposition. See (4.19).
(4.19) a. Input to the function = PAST [[JOHN GIVE ROSES TO MARY], AND PAST [MARY PUT THE ROSES IN A VASE]
       b. Output of the function = PAST [JOHN GIVE ROSES TO MARY]
       c. Input to the function = If /donat/i is an INSTANCE OF DATIVE VERBj AND /donat/i is an INSTANCE OF DOUBLE OBJECT VERBk
       d. Output of the function = /donat/i is an INSTANCE OF DATIVE VERBj
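The mapping from input to output in (4.19) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a condition is modelled as a flat set of proposition strings rather than as a conceptual structure, the referential indices are dropped, and the names `Condition` and `simplify_condition` are hypothetical, not part of the Induction Theory.

```python
from typing import FrozenSet

# Assumption: a condition is a conjunction of propositions, each written as a plain string.
Condition = FrozenSet[str]


def simplify_condition(condition: Condition, rejected: str) -> Condition:
    """Return the condition with one conjunct deleted (unchanged if the conjunct is absent)."""
    return condition - {rejected}


# (4.19a) -> (4.19b): the default proposition about the vase is dropped.
roses = frozenset({
    "PAST [JOHN GIVE ROSES TO MARY]",
    "PAST [MARY PUT THE ROSES IN A VASE]",
})
roses_simplified = simplify_condition(roses, "PAST [MARY PUT THE ROSES IN A VASE]")

# (4.19c) -> (4.19d): the double object classification of donate is dropped.
donate = frozenset({
    "/donat/ is an INSTANCE OF DATIVE VERB",
    "/donat/ is an INSTANCE OF DOUBLE OBJECT VERB",
})
donate_simplified = simplify_condition(donate, "/donat/ is an INSTANCE OF DOUBLE OBJECT VERB")

print(sorted(roses_simplified))
print(sorted(donate_simplified))
```

The sketch says nothing about which conjunct is selected for deletion; that choice is made by the inferences and feedback discussed in the surrounding text.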
The examples in (4.19a, b) are meant to display a straightforward case of representing the “outside world” in terms of conceptual structure. In the interests of readability and to save space, I have recast the representation of conceptual structures (which should look like the representations in (4.9)) as a simple operator and proposition and omitted all irrelevant detail. The simplification created by condition-simplifying generalisation, as can readily be seen, involves deleting the representation of the second proposition. This might represent a situation where a person represents John as giving some specific roses to Mary at a specific time t′ (perhaps it is their anniversary), and activates a default representation (she put them in a vase because she always puts them in vases). Now suppose that this person then sees the flowers at time t′ + 24 hours, lying on a kitchen counter in John and Mary’s home, shrivelled and dry. The conceptual structure in (4.19a) would license the inference that if Mary had put the flowers in a vase, they would be in a vase. Default assumptions related to a “theory” about the usual care of roses would include the proposition that the vase would contain water, and these combined propositions would license the inference that the flowers would not be shrivelled up and dry. Since the perceptual input shows that they are, it is not compatible with (4.19a). The condition-simplifying operator is activated and applies, and (4.19b) results. The examples in (4.19c, d) are exactly the same except that they are designed to represent metalinguistic information, in this case the information that the phonological form of the verb donate has two properties: it belongs to the class of verbs taking an NP and a PP complement (I have expressed this using the short form DATIVE VERB) and it belongs to the class of verbs taking NP NP complements (this is here called the DOUBLE OBJECT VERB category). 
Many learners of English come up with something like (4.19c) after learning some subset of verbs occurring in both frames. They overgeneralise the alternation to any verb of transfer of possession. If a learner had had a lesson on dative verbs, in which she is explicitly told that donate does not occur in the NP NP frame, she might represent donate in something like (4.19d) as a consequence of the application of the relevant function. A second type of simplification involves taking the intersection of the conditions of two or more similar rules. Those parts of the conditions which
differ are treated as details to be ignored (Holland et al. 1989: 86). Since in the Induction Theory, categorisations are expressed in the condition–action empirical rules, the simplification of a categorisation could involve either the elimination of some property or feature from its representation (as per (4.19)) or the intersection of properties. For example, suppose that the learner is told on separate occasions that a VERB can occur in the frame ___ NP, ___ NP PP, and ___ NP S. The intersection of these frames is ___ NP. The prediction is that the learner would create a new rule, ignoring the differences and retaining the information that a VERB can precede an NP. A third type of simplification is the case of a TYPE whose features are reduced or simplified. In this case, condition-simplifying generalisation will be triggered by the joint activation of rules with exactly the same actions and almost the same conditions. Suppose that the learner has represented the information in (4.20).
(4.20) a. If [−N, +V, −3rd p.]- don’t precedes [−N, +V], adjoin it to VP.
       b. If [−N, +V, +3rd p.]- doesn’t precedes [−N, +V], adjoin it to VP.
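Both the intersection of frames and the feature reduction over rules like those in (4.20) can be given a rough computational reading. The sketch below is an assumption-laden illustration, not a claim about how the Induction Theory is implemented: frames are tuples of category labels, "intersection" is read as the shared initial stretch of the frames, rule conditions are feature dictionaries with ASCII "+" and "-" values, and the names `intersect_frames` and `merge_rules` are hypothetical. The sketch also does not check that the two conditions are "almost the same"; it simply keeps whatever they share.

```python
from typing import Dict, Optional, Sequence, Tuple

Frame = Tuple[str, ...]      # e.g. ("NP", "PP") stands for the frame ___ NP PP
Features = Dict[str, str]    # e.g. {"N": "-", "V": "+"}
Rule = Tuple[Features, str]  # (condition features, action)


def intersect_frames(frames: Sequence[Frame]) -> Frame:
    """Keep the longest initial stretch of categories shared by every frame."""
    shared = []
    for column in zip(*frames):
        if len(set(column)) != 1:
            break
        shared.append(column[0])
    return tuple(shared)


def merge_rules(a: Rule, b: Rule) -> Optional[Rule]:
    """If two rules share an action, keep only the condition features they agree on."""
    (cond_a, action_a), (cond_b, action_b) = a, b
    if action_a != action_b:
        return None  # different actions: no simplification is attempted
    shared = {name: value for name, value in cond_a.items() if cond_b.get(name) == value}
    return shared, action_a


# ___ NP, ___ NP PP and ___ NP S reduce to ___ NP.
verb_frame = intersect_frames([("NP",), ("NP", "PP"), ("NP", "S")])

# (4.20a) and (4.20b) differ only in the person feature.
dont = ({"N": "-", "V": "+", "3rd p.": "-"}, "adjoin to VP")
doesnt = ({"N": "-", "V": "+", "3rd p.": "+"}, "adjoin to VP")
dummy_verb_rule = merge_rules(dont, doesnt)

print(verb_frame)
print(dummy_verb_rule)
```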
Simplification would then eliminate the reference to the particular features in the condition, leading to a generalisation over the dummy verb. Evidence for simplification can be found throughout the L1 developmental literature. Olguin and Tomasello (1993) have shown, under experimental conditions, that native speaking children between the ages of 22 and 25 months appear to follow a linguistic model for each verb they hear. That is to say, learning is instance-based, meaning that the very specific co-occurrence properties of each verb in the stimuli are encoded. Over time, the learner generalises over classes of verbs, eliminating certain verb-specific encodings (e.g. the phonetic form of the verb, or manner-specific meaning encodings). See also Tomasello (1992, 2000). The theory of categorisation adopted here, based on Jackendoff’s rules of categorisation (see (4.9) once again), is so constructed that it can account readily for this kind of instance-based learning. We can hypothesise that the learners encode the instances in long-term memory and that complex generalisations, e.g. agents are subjects, emerge as the simplification of the specific properties of each instance. We have already seen evidence for what might be instance-based learning in SLA in the discussion of the acquisition of verb placement and negation in Swedish, as well as in the acquisition of German word order. Instance-based learning is therefore common to both L1 and L2 learning, to learning by children and to learning by adults. Instance-based generalisation proceeds from exemplars. Generalisation would involve, e.g., proceeding from having multiple instance-based representations to classifying a variety of stimuli as instances of the same verb. Here is another example. Suppose that the learner observes separately that donate (construed either as a class of phonetic representations or as a unit of a different representational system, say a phonological representation) can occur in the input donate the money to the museum. On the first occasion of observation, the verb and its complement are stored in memory. On the second occasion of observation, the verb and its complement are again stored in memory, say as donate his inheritance to the World Wildlife Fund. Then an empirical rule can be constructed which generalises over the particular properties of the phonetic exemplar and the lexical items in the syntactic frame. Thus, the learner can extract the frame __ NP PP by constructing an empirical rule. Now let us suppose that the learner hears give in the same complement structure, e.g. give money to the museum. This can be stored as is (unanalysed) in LTM. But exposure to more such exemplars will lead to the creation of a second instance-specific rule which is identical to the rule stated except for the phonological form of the verb itself. A new third rule can then be created which will cover a larger set of situations, namely both the contexts where donate occurs and the context where give occurs. In the case of both first and second language acquisition, generalisation and abstraction can continue until exemplars of the donate class are wrongly grouped as double object verbs (see Pinker 1989: 45–7 for complete discussion). What is odd is that children can recover from their overgeneralisations, whereas adult second language learners of English usually do not (see Mazurkewich 1984a, b, 1985 for experimental studies involving double object verbs and ESL learners). This fact points to a significant difference between generalisation processes of L1 and L2 acquisition. What lies behind it is, at this point, anyone’s guess. 
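The progression just described, from stored exemplars to a verb-specific rule and then to a rule covering several verbs, might be sketched as follows. The sketch rests on assumptions that go beyond the text: exemplars are taken to arrive already labelled with syntactic categories, a single exemplar is enough to create a verb-specific rule (the text expects more exposure before the give rule is formed), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Assumption: an exemplar is a verb plus its complements, each already labelled
# with a syntactic category by the parser.
Exemplar = Tuple[str, Tuple[Tuple[str, str], ...]]
Frame = Tuple[str, ...]


def verb_specific_rules(exemplars: List[Exemplar]) -> Dict[str, Set[Frame]]:
    """Abstract away from the lexical items, keeping a set of frames for each verb."""
    rules: Dict[str, Set[Frame]] = defaultdict(set)
    for verb, complements in exemplars:
        rules[verb].add(tuple(category for category, _words in complements))
    return dict(rules)


def class_rules(rules: Dict[str, Set[Frame]]) -> Dict[Frame, Set[str]]:
    """Group verbs that share a frame under one more general rule."""
    by_frame: Dict[Frame, Set[str]] = defaultdict(set)
    for verb, frames in rules.items():
        for frame in frames:
            by_frame[frame].add(verb)
    # Only frames attested with more than one verb yield a class-level rule.
    return {frame: verbs for frame, verbs in by_frame.items() if len(verbs) > 1}


# Exemplars stored in long-term memory, taken from the examples in the text.
memory: List[Exemplar] = [
    ("donate", (("NP", "the money"), ("PP", "to the museum"))),
    ("donate", (("NP", "his inheritance"), ("PP", "to the World Wildlife Fund"))),
    ("give", (("NP", "money"), ("PP", "to the museum"))),
]

specific = verb_specific_rules(memory)
general = class_rules(specific)
print(specific)
print(general)
```

Nothing in this sketch blocks the overgeneralisation discussed above: if give were also stored with an NP NP frame, a later step that extended every frame of one class member to the others would wrongly license donate in the double object frame.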
When a category is missing from a rule or the wrong category has been activated, then the rule will be inadequate. New rules must be generated. The system must be able to function as if further refinement of its current rules is inappropriate. How does this happen? Old rules give rise to new rules by, e.g., recombining either the condition or the action part of the rules. In the Induction Theory, which assumes that representations have no structure, this occurs through a genetic operator called crossover. Crossover combines parts of two existing rules to form a hybrid rule which is added to the system and then competes with the source rules (Holland et al. 1989: 119). Holland et al. note that genetic algorithms function formally in the following ways: (a) pairs of classifiers are selected according to their strength, with stronger ones being preferred to weaker ones, (b) crossover applies to the pair exchanging a randomly selected segment from the pair, either in the condition, or in the action, or in both, (c) weakest rules are replaced by their “offspring.” In the Induction Theory, q-morphisms
must be “tagged” to name regions in a q-morphism. This creates fundamental difficulties arising from the abandonment of structural representations. The system has to identify regions of q-morphisms for recombination, something which is immediately available in representations consisting of units. Moreover, nothing in the characterisation of crossover will guarantee that q-morphisms relevant to a given problem will be rapidly identified: a schema related to the phonological analysis of a stimulus could conceivably cross over with a schema related to the semantic analysis of the same stimulus. If induction has to respect autonomous representational systems, however, this possibility is excluded. It follows that the “search space” will be restricted to the relevant system. Given this assumption, it actually becomes possible to imagine how restructuring could be constrained by shared triggering conditions. Recombination involves a randomly selected segment from the pair of algorithms, but this leaves open too many computationally possible choices. Limiting the choice for restructuring to a computationally small number is critical to the success of the model. This can be done by assuming that the selection process chooses structural representations which are different in terms of a single structural change to the rule activated at the time of a failed parse.
3.8 What initiates and ends i-learning?
Jackendoff’s theory of Conceptual Semantics is concerned largely with absolute constraints on conceptual organisation. He has not attempted to model how categorisation takes place in real time. This is where the Autonomous Induction Theory is useful. It gives us a model of how this might work. The implementation of the rule creation and rule emendation mechanisms is constrained by triggering conditions that function to ensure that rules will be useful to the system. They are most useful to the system when they make no changes unless there is urgent need. 
Urgent need is defined as a failure to correspond to the representations arising from the perceptual representation of the situation and the current MM. Nothing happens, therefore, unless existing representations, including default expectations which are part of the MM, are inconsistent with the representations being constructed or activated in working memory (Holland et al. 1989: 22).26 Induction is therefore triggered by some input arising from the processing of a stimulus in the environment or from some other computation. Thus, the Autonomous Induction Theory provides an explanation of why learning begins, and this explanation ties i-learning to the environment in a rather obvious way. Induction is moreover only initiated by some failure of current representations to fit the active MM.27 Re-organisation of the system takes place automatically.
Reinterpreted to be consistent with the construct of representational autonomy, this means that i-learning operations are triggered by detectable errors, that is, errors which are definable over linear sequences (Wexler and Culicover 1980; Wilkins 1993/1994). I return to this topic in Chapter 9. It follows therefore that learning stops, a steady state is attained, or fossilisation sets in — however one wishes to characterise it — when the organism fails to detect errors. It is important to understand that it is the learner’s analyses and representations of the L2 stimuli which matter for defining when learning stops.28 Detectable errors, and therefore learning, are defined on input to parsing mechanisms. Two points need to be stressed here:
a. As already noted in Chapter 1, induction is dependent on the physical properties of the speech situation only indirectly. Unlike most learning models in psychology, including the Competition Model, I do not think that the learner’s mental representations reflect the environment in any strict way. Rather the MMs will structure the nature of experience, and the autonomous representational systems will define what can be represented, imposing organisation on stimuli. Consequently, by definition, the stimuli will never match representations (except at the perceptual level of analysis). Therefore, detectable errors cannot be defined in terms of an identity between stimuli and mental representations. Rather, detectable errors must be defined in terms of a “closeness of fit” between input to parsers (analyses of stimuli) and currently activated representations.
b. Detectable errors are relevant for representations deployed by the parsing system and not the production system. It may well be the case that the learner continues to make speech and writing errors although she “knows” that they are errors.
Similarly, a learner may continue to have a foreign accent, although he can recognise his own production as non-native-like (Flege 1992).29 This is the important element of control discussed in Chapter 1. Control is a factor in production; it is not a factor in perception where processing appears to occur automatically. To sum up, if the learner can readily parse and interpret speech signals (in auditory mode, in visual mode in the case of sign languages, or in graphic mode), no detectable errors will be registered by the parsing system although the learner may still be making systematic errors in production.30 Thus, the Autonomous Induction Theory provides an explanation of why learning stops. Moreover, it can explain in principle why learning stops even though the physical properties of the learner’s productions are distinct from the stimuli produced by native speakers and perceived by the learner to be distinct. The construction of any mental model will lead to the emergence of a variety of
“expected outcomes.” What is an “expected outcome”, given a current MM? An expected outcome is any state of affairs which is consistent with the MM currently activated. Consistency is a less stringent requirement than the identity function required by theories using matching of stimuli and representations. The Induction Theory uses quasi- or q-morphisms rather than isomorphisms to define a fit so it can be had without identity being satisfied; there is a fit on some relevant subset, what counts as “relevant” to be determined by a theory of perception, a theory of grammar and parsing, and a theory of mental models for particular domains. Note that the expected outcome need not actually be explicitly represented in the MM, and often will not be, but it does not contradict anything in the MM or derivable from the MM by logical inference.31 “Unexpected outcomes” will be those which are in contradiction with currently specified representations in the activated MM. Unexpected outcomes therefore are explicitly computed and they trigger induction. It is in this way that induction is always tied to the learning situation, understood in the broadest sense: Induction is a process which permits the organism to eliminate inconsistencies and contradictions between a mental model and current representations of stimuli, or representations of stimuli in conjunction with stored representations of information. The notion of “unexpected outcome” will require some modification, however, when applied to the autonomous representational systems. In particular, we do not want to assume that the learning mechanism is computing a large number of possible representations of, e.g. a string, including representations of what is ungrammatical. Representations which are ungrammatical are precisely those which are not computed by learning mechanisms. 
Consequently, “expected outcome” has to be defined by the constraints imposed on the operations of the grammar, and these, in turn, must be defined by UG. (4.21) Definition of induction: Induction is a process which leads to the revision of representations so that they are consistent with information currently represented in working memory. Its defining property is that it is rooted in stimuli made available to the organism through the perceptual systems, coupled with input from LTM and current computations. I-learning is, however, different from mechanistic responses to environmental change in that the results of i-learning depend upon the contents of symbolic representations.
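The triggering condition and the definition in (4.21) can be pictured with a small sketch. It is only an illustration under strong simplifying assumptions: the mental model is reduced to a table of feature expectations, "closeness of fit" is reduced to the absence of a contradiction, and the names `MentalModel`, `contradictions`, `i_learn` and the feature labels are hypothetical, not part of the Induction Theory or of any implementation of it.

```python
from dataclasses import dataclass, field


@dataclass
class MentalModel:
    # Hypothetical stand-in for an activated MM: feature -> expected value.
    expectations: dict = field(default_factory=dict)


def contradictions(model, observed):
    """Return the features on which the observation contradicts the model.

    Features the model says nothing about count as consistent, so
    consistency is a weaker requirement than identity.
    """
    return [f for f, v in observed.items()
            if f in model.expectations and model.expectations[f] != v]


def i_learn(model, observed):
    """Revise at most one expectation, and only after a detected conflict."""
    conflicts = contradictions(model, observed)
    if not conflicts:
        return False              # expected outcome: nothing is changed
    feature = conflicts[0]        # one change, at the first point of conflict
    model.expectations[feature] = observed[feature]
    return True


mm = MentalModel({"verb_position": "final", "subject": "overt"})
# "tense" is not represented in the model, so this outcome is merely consistent.
unchanged = i_learn(mm, {"verb_position": "final", "tense": "past"})
# Two conflicts are present, but only the first is repaired on this pass.
changed = i_learn(mm, {"verb_position": "second", "subject": "null"})
```

On this toy reading, learning stops as soon as no contradiction is registered, whatever the stimuli themselves may be like.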
I have differentiated i-learning from the strictly corporeal responses of our own bodies and simpler organisms to stimuli. I-learning is crucially dependent upon the content of the representations being computed by the learning system. At the level of phonetic learning, this means that i-learning will depend upon the
content of phonetic representations, to be defined in terms of acoustic properties. At the level of phonological representation, i-learning will depend on the content of phonological representations, to be defined in terms of prosodic categories, and featural specification of segments. At the level of morphosyntactic learning, i-learning will depend upon the content of morphosyntactic representations. And so on. I-learning is also different from deduction in that the informational content of the premisses which form part of, e.g. a deductive syllogism, is irrelevant to the truth of the conclusion. If we consider a classic syllogism of the form “Socrates is a man. All men are mortal. Therefore, Socrates is mortal”, the choice of predicate IS A MAN and the choice of the referent SOCRATES are actually irrelevant to the truth of the conclusion. It is the form of the propositions which matters, which is why we can substitute a variety of names and predicates and still produce a correct conclusion. Thus “Jacques is a hairdresser. All hairdressers are mortal. Therefore, Jacques is mortal”, is a variant and both deductions are simply particular instantiations of the deductive schema: “x is a Y. All Y’s are mortal. Therefore x is mortal.” Induction, in contrast, requires the addition of new information to the inferencing process. Given the inferencing rules discussed in (4.9) above, we can actually predict what the new information will consist of: (a) innate conceptual primitives organised into new TOKENS, e.g. I PERCEIVE A NEW THING (EVENT, PATH, ACTION, DIRECTION, etc.), (b) new TOKENS will be categorised according to known TYPES, e.g. THIS THING IS A THAT, (c) new TYPES will be created via the addition of new features to those already defining a TYPE, e.g. THIS CLASS IS ALSO THAT CLASS, and (d) TYPES will be located in linear and hierarchical representations at a given level of representation.
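The contrast between deduction and induction drawn above can be made concrete with a toy sketch. It assumes, purely for illustration, that propositions are stored as triples; the function names `deduce_mortal` and `induce_token` and the relation labels are hypothetical. The deductive schema works for any x and Y whose premisses are already present, whereas the inductive step is what adds a new premiss in the first place, here the categorisation of a new TOKEN under a known TYPE as in (b).

```python
def deduce_mortal(x, y, facts):
    """Schema: 'x is a Y. All Y's are mortal. Therefore x is mortal.'

    Only the form of the premisses matters; x and y may be anything.
    """
    if (x, "is_a", y) in facts and (y, "all_are", "mortal") in facts:
        return (x, "is", "mortal")
    return None


def induce_token(facts, token, known_type):
    """Inductive step: add the new information THIS THING IS A THAT."""
    return facts | {(token, "is_a", known_type)}


facts = {
    ("Socrates", "is_a", "man"),
    ("man", "all_are", "mortal"),
    ("hairdresser", "all_are", "mortal"),
}

socrates = deduce_mortal("Socrates", "man", facts)
# Nothing follows about Jacques until he has been categorised.
before = deduce_mortal("Jacques", "hairdresser", facts)
facts = induce_token(facts, "Jacques", "hairdresser")
after = deduce_mortal("Jacques", "hairdresser", facts)
```

The single call to `induce_token` also reflects the idea of a minimal change: one new piece of information is added to the existing store and nothing else is altered.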
When it is stated that induction involves minimal changes to existing activated representations, this is what is intended. Precisely these sorts of minimal changes will be permitted, and we can further constrain the theory by hypothesising that there is a numerical restriction as well — namely, one change at a time is permitted. Often this new information will be the consequence of processing some stimuli so that the new information will be the output of perceptual processing. However, i-learning can also arise as the result of activating stored information in LTM. What information gets stored in LTM? Clearly lexical entries and their contents get stored. We may also assume that there are recognition templates in LTM, by which I mean pre-compiled representations of stimuli which are frequently heard. Formulae, names, idioms, telephone numbers, and so on, come to mind. We must also assume that recognition templates exist for more abstract representations at each of the levels of autonomous representation, syllable templates, morphological templates etc. English-speakers must hear hundreds of
NP V NP sentences in a day. It is hardly parsimonious to assume that a novel procedure must be generated each time to interpret them. Conceptual structures are also stored both individually and in complexes (mental models). Finally I will assume that there are various sorts of production schemata which get stored in LTM, including syllables (Levelt 1996) and syntactic templates. One can make other arguments on behalf of the theory. Holland et al. (1989: 16) resort to condition–action rules in part because they are “modular.” By this, they mean that the action of a given rule does not depend on the actions of preceding or following rules. Condition–action rules operate completely independently of the other rules of the system, except insofar as they share the same contents. There is no need to extrinsically order them. As is well known from discussions of learnability, excluding such ordering statements considerably reduces the power of the theory, and constrains it in an important way. Condition–action rules can be assumed to be able to fire in parallel, an obviously desirable result for a theory of cognition where one of the central explanatory problems is to account for the speed at which cognitive processing occurs. There is no reason to assume that competition among rules will take vast quantities of time, although only formalisation will show how long the system operates in real time. How long i-learning actually takes for a given linguistic phenomenon for a human learner is an empirical issue as well. Thirdly, condition–action rules are efficacious. Fourthly, they are suitable for building mental models, these more complex cognitive structures which constitute our theories about the world. And they are suitable to a prediction-based evaluation of the knowledge store. Fifthly, rules can operate in clusters, as was seen above.
Clustering will arise if the action or output of one rule constitutes the condition or input of another, whose output in turn is the input to yet a third rule. Rules which are frequently activated together become associated. Therefore, the activation of a single rule can lead to the deployment of many others. This also has consequences for the speed of computation. Sixthly, the various categorisation rules associated with a thing are organised into default hierarchies of assumptions based on subordinate or superordinate relations among concepts. Default hierarchies are a way of instantiating typicality conditions on categories, so they can be used to formalise preference rules. Holland et al. (1989: 20) emphasise that variability and uniformity must be part of the categorisation process and that default hierarchies are an efficient way to instantiate this. This means that the problem of the so-called “brittleness” of many classification models — forcing X to be a noun, or not a noun, a grammatical sentence or an ungrammatical sentence — can be avoided.32 As is well known, this brittleness problem has been one of the major arguments against classical computational models and for connectionist models — including,
of course, the Competition Model. Seventh, we have an account of the dynamic nature of learning. The system is able to identify and strengthen rules which lead to the achievement of relevant end states. Or, alternatively, it weakens or eliminates rules which fail to modify representations in ways consistent with new information. In particular, revisions to the system are not random. In the Induction Theory, apportionment of credit (or blame) takes advantage of the fact that there are linked or “coupled” rules; the action of one rule serving as the condition of some other rule. Holland et al. have formulated an algorithm, called the Bucket Brigade Algorithm, which changes an assigned numerical value (used to measure rule strength) of a rule by passing on a portion of the value from some other activated rule, or giving a portion of its value to some other rule which is to be activated. Refinements to rules are limited to repairing faulty rules currently activated in the system. In the case of a long rule sequence, the apportionment of blame works locally back from the last rule through the sequence of rules. The constraint and the workings of the Bucket Brigade Algorithm guarantee that a problem will be found “locally” in the system. The theory thus provides a degree of clarity to the nature of the operations which no other SLA theory can lay claim to. Finally, I would like to add that the Induction Theory has been computationally rendered. This means that the properties of condition–action rules and genetic operators are reasonably well understood. Holland et al. have demonstrated for the Induction Theory certain desirable results, namely transfer of learning from problem to problem and order-of-magnitude speed-ups in learning (Holland and Reitman 1978). This means that the Induction Theory has a chance of competing on these terms with the Competition Model and other non-symbolic theories of linguistic cognition. 
It must be granted that having proposed substantial changes to the definitions and internal representations, it behooves me to demonstrate formally that the same results hold of the Autonomous Induction Theory. Since I have not yet operationalised my ideas even for a simple linguistic i-learning problem, I cannot guarantee that they will. At the moment, I see no difficulty in principle but formalisation raises many problems whose seriousness should not be underestimated.
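The apportionment of credit and blame over coupled rules described above can be illustrated with a deliberately simplified sketch. It is not Holland et al.'s formulation of the Bucket Brigade Algorithm; it assumes a single linear chain of rules, a fixed bidding ratio, and an external payoff delivered only to the last rule, and the names `bucket_brigade`, `bid_ratio` and `r1` to `r3` are hypothetical.

```python
def bucket_brigade(strengths, chain, payoff, bid_ratio=0.1):
    """One pass of a simplified bucket-brigade update.

    strengths: dict mapping rule name -> numerical strength
    chain:     rule names in firing order; the action of each rule
               serves as the condition of the next
    payoff:    external reward for the last rule (0 for a failed outcome)
    """
    new = dict(strengths)
    for i, rule in enumerate(chain):
        bid = bid_ratio * strengths[rule]
        new[rule] -= bid                 # each activated rule gives up a portion
        if i > 0:
            new[chain[i - 1]] += bid     # and passes it to the rule that fed it
    new[chain[-1]] += payoff
    return new


start = {"r1": 10.0, "r2": 10.0, "r3": 10.0}
chain = ["r1", "r2", "r3"]

rewarded = bucket_brigade(start, chain, payoff=5.0)

# Two failed outcomes in a row: the last rule weakens first, and the
# weakness then reaches its immediate predecessor.
failed_once = bucket_brigade(start, chain, payoff=0.0)
failed_twice = bucket_brigade(failed_once, chain, payoff=0.0)
```

In this toy version a failure first lowers the strength of the final rule, and only on a later pass lowers that of the rule immediately before it, which is one way of picturing how blame works locally back through a rule sequence.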
4. Summary of the Autonomous Induction Theory
The purpose of this chapter has been to introduce the Autonomous Induction Theory, and to distinguish it where necessary from its progenitor the (nonautonomous) Induction Theory. The Induction Theory takes as its basic formalism
the condition–action rule, which is used to characterise a broad range of processes. I have restricted its use to the characterisation of representations assumed to be computed on-line in working memory or stored in longterm memory. I have characterised the mental models the Induction Theory adopts as its basis for inductive reasoning in terms of Jackendoff’s conceptual structures, with all of the semantic structure which such a move entails. I have presented some of the constraints of the theory, proposing that i-learning will be initiated by detectable errors, that the learning mechanisms will make a single change to a representation at the point of error detection, that q-morphisms can account formally for the classification equivalence observed so often among L2 learners, that the theory can account both for instance-based learning and generalisation over instances, that competition among rules (which may differ minimally from one another) will ensure that robust solutions win out, that the theory encodes a general version of the Elsewhere Condition and can also account for saliency. I have illustrated the theory by focusing on language learning problems which can be assumed to involve conceptual structures and inductive reasoning, such as what might be learned in a language class or during an experiment. I have insisted, however, that the theory is of greater interest, once we recognise that instance-based learning, generalisation and retreat from over-generalisation are metaprocesses which are expressed in slightly different ways within the autonomous representational formats of the various domains of cognition (including the autonomous systems of the grammar). I can now return to a statement made earlier: learning grammatical categories is not concept learning. We cannot apply without emendation the constructs of theories of induction designed to explain problem-solving and concept learning to the problem of language learning. 
We acquire instances and categories in learning the grammar of a language but these instances and categories are encoded in the autonomous representational systems of linguistic cognition. Concepts of phones, syllables, or verb classes are another kettle of fish. How the Autonomous Induction Theory deals with induction within these systems, and in particular, how i-learning can be appropriately constrained to prevent the creation of “rogue grammars”, will be addressed in more detail in Chapter 5.
Notes
1. All citations are from the 1989 paperback edition. 2. The Induction Theory has been formalised, based on prior work by Holland et al. on classifier systems. I will forego presentation of most of the computational apparatus. Interested readers should consult the source.
3. One of the nicer observations from my perspective comes from the true statement that it is meaningful to ask bilinguals which language they dream or think in. Jackendoff points out that if the conceptual system alone were responsible for awareness, this would not be possible since, by definition, it is language-neutral. On the basis of the nature of the so-called “tip-of-the-tongue” experience where one is trying to say something but cannot remember the word, he argues that conceptual representations are excluded from awareness and that only phonological representations are relevant (Jackendoff 1987: 290). 4. Note that I am not suggesting that the NS will be aware of the particular type of error a learner has made. For the naive native speaker, the experience is often just that an error is erroneous: “We don’t say it like that!” It can take a great deal of metalinguistic training, experience with non-native speakers and reflection to develop a theory of the types of errors learners make and their causes. See the discussion below of mental models or “theories” in this technical sense. 5. sink, sank, sunk and drink, drank, drunk but think, thought, thought and not think, *thank, *thunk. 6. In order to forestall cranky complaints from all those Canadians whose breakfasts do include Mett, salami, or rice (not to forget whale blubber), let me say that I base my claim on what one can expect to find on the culturally “unmarked” breakfast menus of hotels, restaurants or B&Bs. 7. Markman (1989) discusses a number of constraints on concept formation in children involving the correspondence between categories and language, e.g. the Taxonomic Constraint, which says that we preferentially understand relations among objects in such a way that they are grouped into taxonomies rather than according to how they participate in events (Markman and Hutchison 1984).
A second constraint is the Whole Object Constraint, which states that we expect words to refer to whole objects and not to parts of objects. Children are more likely to attend to whole objects rather than to features of objects. They also are more likely to interpret words preferentially as objects rather than as actions or events (Woodward 1993). Markman (1989) also proposes a constraint called Mutual Exclusivity, according to which children assume that category terms are mutually exclusive, i.e. , ’ . Finally, she observes that natural kinds support rich inferences. Artefacts support fewer inferences. I know of no research which replicates Markman’s research with adult L2 learners but a number of studies ask adults to manipulate materials involving novel concepts (usually artefacts). I do not know to what extent the extant literature shows that category formation among adults learning an L2 exhibits Markman’s constraints. 8. Specifying what counts as “enough” properties to be included as even a marginal member of a class of entities is one of the central problems of a theory of categorisation. Minimally, would have to be a concrete object or , an individual with certain perceptual attributes (visual in the case of sighted persons, haptic or olfactory in the case of the blind). Current research suggests that even simple classifications like putting in the class of cats involves the development of a complex theory of animal life. See Keil (1979, 1989), Smith and Medin (1981), Carey (1978, 1985), and Markman (1989) once again for relevant discussion and further references from the philosophical literature. 9. It should be mentioned here that conceptual development and conceptual organisation in adults exhibits a certain autonomy. As previously mentioned, Keil’s work (see note 8) reveals that conceptual development does not show “across-the-board” effects but rather that one domain can be reorganised in a particular way before another conceptual domain is. 
This means autonomy cannot be reduced to modularity. Both constructs are required for an adequate theory of the mind. See Karmiloff-Smith (1992) for further discussion of the differences between autonomy and modularity and for empirical evidence that greater differentiation in the “language of thought” is required to adequately explain conceptual organisation and conceptual development.
10. This use of the term production from artificial intelligence should not be confused with the psycholinguistic construct of speech production. 11. These operating principles are not to be confused with Slobin’s (1972, 1977) operating principles, discussed in Chapter 1. 12. This is one of many statistical rules which Holland et al. (1989: 44) claim people induce from the observation of randomising behaviour or through explicit instruction. 13. This is referred to as a Regulation Schema in the original, a label which disguises the true nature of the constraint. The constraints on rule formation expressed in (4.12d) are deductive in nature. 14. There is no absolute constraint against this; indeed, when we are thoroughly convinced of the truth of some proposition, e.g. that cats don’t talk back when we talk to them, hearing Puss casually remark that we’re putting on weight might very well lead to the inference that we’re having serious mental problems. 15. I say “can” here and not “must” since we can and do use the relevant cues for identifying certain Indian accents of English. 16. Nisbett writes (Nisbett 1993: 5) … people can operate with very abstract rules indeed, and (…) the techniques by which they learn them can be very abstract. Abstract improvements to the preexisting intuitive rule system are passed along to the full range of content domains where the rules are applicable, and improvements in a given domain are sufficiently abstracted so they can be applied immediately to a very different content domain. 17. This fits in with the Holland et al. characterisation of concepts, which they view as sets of probabilistic assumptions about the features of a thing, and the consequences of specific sets of antecedents (Holland et al. 1989: 17). 18.
Analytic implication is defined so: “A set of assumptions {P} analytically implies an assumption Q if and only if Q is one of the final theses in a deduction in which the initial theses are {P}, and in which only analytic rules have applied.” Synthetic implication is defined so: “A set of assumptions {P} synthetically implies an assumption Q if and only if Q is one of the final theses in a deduction in which the initial theses are {P}, and Q is not an analytic implication of {P}” (Sperber and Wilson 1986: 104). Notice that the definition of synthetic implication does not state that an inductive rule has applied. However, if Q is some sort of information from outside of the deductive system, as the definition requires, then it follows that some sort of inductive rule must have applied to introduce Q into the inferencing representation. 19. The representations between slash marks should be understood as a shortform for a phonological representation, whatever the appropriate format is. I do not assume that learners encode linear strings of phonemes in their lexical representations. 20. Nothing of importance hinges on my choice of INFL as the specific functional category to which the verb raises; I could just as easily have chosen AgrO, Tense, Finiteness or any of the other proposed functional categories in the syntactic literature.
21. The variable in question involves sentences like those in (i).
(i) a. Les garçons regardent tous la télévision le vendredi. ‘The boys watch all the television the Friday’
b. *Les garçons tous regardent la télévision le vendredi. ‘The boys all watch the television the Friday’
c. Les garçons ont tous regardé la télévision le vendredi ‘The boys have all watched the television the Friday’
d. To all own cars is the boys’ ambition.
e. *To own all cars is the boy’s ambition.
22. It is not clear to me what the underlying position for this adverb should be in the current framework. If the specifier position is on the left, then raising of the verb must occur. On the other hand, if adverbs can be freely generated on the right of heads, this obviates the claim that adverb position can be a cue to underlying order. 23. Notice that with respect to the Competition Model, this link is completely unexplained. There is no more reason to link the presence or absence of the subject to tense or finiteness of the verb than to the verbal feature itself. 24. See Andersen (1991) who argues that aspect does play a critical role in the emergence of verbal morphology, and Andersen (1993) for a revised view that the frequency of forms in the input plays the deciding role. 25. Lakshmanan (1991: 390) and Clahsen and Hong (1995: 66) both point out that Hilles did not control for possible transfer from the L1 so that no conclusions can be drawn from her data about whether adult L2 learners were or were not resetting parameters. 26. This property of the Induction Theory is therefore like the “triggering” of the learning functions in the P&P-based model of syntactic learning of Berwick (1985). It will be recalled that in that model syntactic learning, largely reduced to parameter-setting, is driven by parsing failure. Whenever the current parser cannot analyse an input, and a parse fails, the learning mechanism goes into action to alter the current grammar, which in turn is used by the parser to parse. At some point, of course, the grammar must be “fixed” and parsing failures from that point on cannot lead to further revisions to the grammar. What ends parameter setting and grammatical learning in the P&P model remains a point of considerable debate. The same problem, needless to say, exists in L2 P&P models, where it has not been acknowledged, let alone addressed. 27. 
Bowerman has amassed considerable data related to longterm lexical reorganisations in L1 acquisition, see Bowerman (1978, 1981a, b, 1982a, b). Both she and Karmiloff-Smith (1985, 1992) have noted that models of L1 acquisition which are restricted to on-line grammatical reorganisations are necessarily inadequate because they simply have no explanation of why children’s grammars reorganise although the child can clearly represent and comprehend the input and has no problem communicating itself either. A similar problem arises in adult L2 acquisition. de Bot, et al. (1995) observe that there appears to be a radical restructuring of the adult’s lexicon somewhere in-between intermediate and near native-like stages of competence. Intermediate level learners exhibit lexical influences across languages such that hearing or seeing a form in one language will activate the entry for a cognate, near-cognate or translation equivalent in the other. In the case of near-natives, they suggest that inhibitory effects are seen such that no such cross-linguistic effects will be observed. Spada and Lightbown (1993) studied francophone children’s acquisition of English questions and observed that accurate use rates (as measured on an oral communication task) increased months after the instruction period. What is especially striking in this case is that improvement cannot be attributed to the fact that the
children, having learnt questions, were attending more to questions in the input because they were not getting any. These subjects were in an intensive English course which lasted for 5 months. When it ended, they returned to an intensive French program in which they received no English instruction at all. Spada and Lightbown were able to show that these children had almost no contact with English or English-speakers outside of school. This too suggests that representations may continue to be reorganised offline. The Autonomous Induction Theory has nothing to say about either set of observations. A complete theory of the languaging mind would, of course, have to account for such off-line reorganisations. It goes without saying that my discussion here assumes that the mind/brain is intact and functioning normally. This theory would treat as pathology any instance where the organism can detect differences between input and current representations but cannot restructure them. Such pathologies are well-attested (see Eysenck and Keane 1995: Ch. 7).
28. Much, too much, has been made about the fact that adult second language learners almost never attain perfect knowledge of the L2. Although these discussions are invariably embedded in papers dealing with UG in SLA, the authors almost never mean that L2 learners do not display perfect knowledge of universals. They mean that any group of L2 speaker/hearers will display different intuitions about any subset of L2 data than native speakers of that L2. This is a mug’s game. (i) It ignores the considerable variation among native speakers about other subsets of L2 data, variation which shows that native speakers are not teleologically driven to some target L2. They learn whatever they learn, and we, as observers, define that as “the language.” (ii) It shows remarkable naiveté about the linguistic lives of L2 learners, most of whom are not and never will be fully linguistically integrated into the societies they have joined. Those who are, achieve knowledge of the L2 and perceptual and production skills which can be accurately characterised as “native-like”, meaning not significantly different from the variability in knowledge and skills exhibited by native speaker monolinguals (Ioup et al. 1994). 29. On the nature of the “foreign accent” and its detection, see also Flege (1981, 1984), Flege et al. (1998) and references therein. 30. I am claiming, thus, that we do not compute the syntactic structure of every sentence we produce. Frequently used patterns will be stored and simply “filled in” with relevant lexical categories. As support for this view is (a) the great speed of speech, and (b) cross-linguistic slips-of-the-tongue. After an intensive period of speaking German, which is verb-final, I often find myself producing verb-final sentences when I speak French.
During the period when I was actively practising getting German word order right, I even found myself making mistakes of this sort in English, suggesting that the syntactic pattern was stored, activated, and difficult to suppress. On the relevance of the construct of suppression for theories of bilingual behaviour, see Green (1986).
31. Expected outcomes could be measured in terms of explicit verbal statements, for example, “I was sure that the Maple Leafs would be in the finals”, but they could also be measured in terms of the organism’s surprise at some event happening. This assumption lies behind some of the research techniques in studies investigating infant cognition which will be reviewed in the next chapter. Surprise is normally understood in phenomenological terms, but could be used to explain why I picked my players from a given team in playing the hockey pool. The expected outcomes might also be simply that the organism’s current representations of the environment are accurate ones and make correct predictions about future events.
32. I must emphasise once again that there are other formal ways to avoid this problem. Jackendoff (1983), for example, proposes a typology of distinct conditions on word meanings (necessary conditions, graded or centrality conditions, and typicality conditions), which he derives from domain-specific applications of preference rules.
Chapter 5 Constraints on i-learning
1. Form extraction, distributional analysis, and categorisation
Since there are as yet no detailed studies of i-learning using the Autonomous Induction Theory, it is impossible to discuss the specific nature of encodings of a given phenomenon, such as an instance-based versus a rule-based encoding of French gender, or an instance-based versus a rule-based encoding of the English double object construction. Most discussion of learning in SLA has involved discussion of metaprocesses but without focusing on the specific constraints observed during development or searching for evidence for their absence. Such research will hopefully follow when the Autonomous Induction Theory is subjected to rigorous verification. In this chapter, I would nonetheless like to discuss in more detail form extraction, distributional analysis and categorisation within the autonomous representational systems so that the differences between that kind of i-learning and that which occurs on the basis of information encoded in conceptual structures and inductive reasoning are clear. This will lead to a discussion of the constraints on i-learning which occur within the Autonomous Induction Theory.
1.1 Prosodic bootstrapping and form extraction?
Studies of the initial state of acquisition must begin with the question of what learners first encode. It stands to reason that the first problem the learner faces is segmenting the speech stream and extracting lexical items. As was noted in Chapter 1, how this happens is a fascinating problem which has yet to be given a serious treatment in the SLA literature. The most we can say is that cross-language lexical identifications are possible (the cognate recognition phenomenon, Carroll 1992), as are other possible types of cross-linguistic influence. There is good reason to suspect that prosodic properties of items will facilitate or hinder their extraction from the speech stream and therefore their encoding in
long-term memory (Carroll 1997). Functional items in many languages (certainly this is true of English, French, German, Spanish and Italian) tend to be unstressed, are not the locus for shifts in the fundamental frequency, tend not to bear the tones of intonation contours, are not separated from exponents of the major lexical categories by pauses, and so on. They are, in short, prosodically cliticised onto other lexical items which can be realised as prosodic words. We can therefore predict that the first items to be extracted from the speech continuum and learned will be prosodic words, and that in many languages prosodic words will instantiate tokens of the major lexical classes noun, verb, and adjective. It follows that the first morphosyntactic words learners learn ought to be members of these word classes. This prediction is consistent with widespread observations of L2 learners of languages like Dutch, English, French, German, and Swedish (Klein and Dittmar 1979; Clahsen et al. 1983; Klein and Perdue 1992). It is also consistent with observations from the same literature that these learners fail to express functional categories in the earliest stages of their L2 production. But this merely makes more pressing the question: What properties of the stimulus are learners relying on to fix the left and right edges of extractable units? If we assume that the learner has no relevant phonological knowledge at the initial stage to be able to encode phonological constructs like stressed syllable, then we must assume that learners are extracting prosodic words on the basis of cues from a lower level of acoustic representation and mapping these onto phonological units, which in turn are mapped onto morphosyntactic and semantic units. Suppose that learners are relying on given types of acoustic cues to locate prosodic words. 
Suppose, in particular, that they are relying on shifts in fundamental frequency. Then it ought to be the case that the first words will be precisely those prosodic units which can appear at the beginning or end of an intonational phrase, since these are precisely the places where the fundamental frequency falls, rises, or is reset (Pierrehumbert 1980; Ladd 1986, 1996; Cutler, Dahan and van Donselaar 1997).1 But for this prediction to be true, it must be the case that learners are driven by an a priori tendency to locate such shifts of fundamental frequency on abstract phonological units such as prosodic words, which in turn they are driven to analyse in terms of sequences of syllables and segments. This prediction ought to ring a bell with anyone familiar with the first language acquisition literature. On the one hand, it makes an explicit link between the stimuli that the learner gets and what the learner learns. As is well known, there is a substantial literature examining the properties of caretaker speech. But while a certain literature reveals that some groups of mothers use intonational contours to get and hold the attention of infants (Sachs 1977; Stern, Spieker, and MacKain 1982; Stern, Spieker, Barnett, and MacKain 1983), I know
of no study examining the location of shifts in fundamental frequency and the extraction of first prosodic words.2 On the other hand, my hypothesis is very similar in spirit to one of Slobin’s (1971, 1973, 1985a) claims. It was noted in Chapter 1 that Slobin proposed a number of operating principles to account for how infants and young children discover the grammatical systems of their language. Operating Principle A was “Pay attention to the ends of words.” Slobin wanted to explain with Principle A the relative order of emergence of suffixes before prefixes in child language. Operating Principle A does not explain why children should pay attention to one end or the other of a word. Gleitman and Wanner (1982) proposed, in the context of the Prosodic Bootstrapping Hypothesis (see Chapter 2), that children attend to and represent stress and map from stress domains to lexical items. Thus, it was hypothesised that there is an interaction between the representation of lexical items (stored in LTM) and a more general processing strategy, which is that learners “Pay attention to stressed words” (Peters 1983; Ingram 1989: 68). The basic idea is that first language learners can attend to domains of stress in the stimuli, can extract those domains, and encode them in their mental lexicons. In many languages, stress happens to fall on the first or final syllable of a word (Dresher and Kaye 1990; Kenstowicz 1994: Ch. 10). In such languages, stress offers an ideal cue to the left and right edges of a prosodic word. Even in languages where stress location must be identified more abstractly, e.g. in terms of phonological constructs such as extrametricality, bounded feet, and so on, considerable evidence suggests that young children make use of it (Waterson 1981; Chiat 1983; Echols 1988, 1993; Echols and Newport 1992). 
The question, of course, arises: Since stress itself is defined in terms of syllable prominence within a given prosodic domain, how does the learner locate stressed syllables? The Gleitman and Wanner/Peters/Ingram reinterpretation is really not a complete solution to the problem raised by Slobin. Rather it requires that stress be reinterpreted in terms of acoustic cues. In other words, learners cannot come to the task of extracting word units from the speech continuum with a priori knowledge of what a stressed syllable is, since that knowledge itself emerges from learning. This observation is rendered more trenchant by research which shows that speech to young children is remarkably unintelligible from the point of view of word recognition (Gurman Bard and Anderson 1983), indeed, more unintelligible than speech to adults, a fact which in turn appears to be relatable to the prevalence of utterances expressing old information and the use of highly frequent vocabulary in conversations with children. It would appear that old information leads to a tendency to articulate syllables and segments less clearly. It must be the case therefore that the solution to the initial word extraction problem is to be found in a priori mapping strategies
linking the phonological and phonetic levels of representation, and the nature of acoustic cues to word boundaries. Shifts in fundamental frequency, of course, are one possible acoustic cue to prosodic domains. Pre-final boundary lengthening and the cessation of phonation (pause) are other cues. In FLA, the developmental logic of the correspondence between the phonetics and the phonology, and subsequently the phonology and the morphosyntax, may actually be fairly straightforward, at least for learners of certain kinds of languages. Consider English. Much of the young infant’s input, once it has reached the babbling or proto-word stage, may well consist of one- and two-syllable words (Snow 1977). In such words, stress has to fall on one end of the word or the other. Now suppose that infants first extract “words” on the basis of acoustic-phonetic cues (whatever they might prove to be) and then subsequently re-represent the acoustic representations as phonological structures, including some representation of syllabic prominence. It might appear that the child is relying on an extraction strategy based on stressed syllables because these overlap with the acoustic cues that the child is relying on. However, it might be more appropriate to state that the child is driven to encode stress as one of the basic properties of the phonological level of representation, i.e. that the child is in fact extremely sensitive to acoustic cues which can ultimately be represented phonologically as stress. That these prosodic units are then mapped onto major lexical categories must be derived from another a priori correspondence strategy mapping prosodic words onto morphosyntactic words. 
That children then proceed to map specific semantic or morphosyntactic functions onto suffixes must also be explained by a priori correspondence strategies, which involve mapping specific concepts onto specific types of morphosyntactic categories.3 Aksu-Koç and Slobin (1985) provide considerable evidence for this correspondence strategy by showing that children acquire the inflectional suffixes of Turkish very early on, i.e. before 2 years of age. They demonstrate that Turkish presents a case for simple correspondences between unique forms, unreduced and putatively salient syllables, and unique functions in the morphosyntactic system. One can conclude that, given a predisposition to map unique functions onto unique forms, and a language environment which obliges by providing one-to-one correspondences, children are good at discovering and learning them.4 And what of adult L2 learners? As far as I have been able to determine, no research has examined the nature of the acoustic or phonological cues to word extraction at the initial stage of L2 acquisition. We therefore simply do not know how learners are doing it. There is one interesting observation, however, to emerge from experiments examining the learning of artificial languages. Over the
last 30 years there have been a number of interesting studies of the implicit acquisition of artificial languages by adults which have focused on distributional analysis (see below). This research shares a number of properties with the acquisition of a second language, although many of the studies were undertaken to elucidate questions of primary language acquisition. Valian and Levitt (1996) asked if adult learners would rely more on prosodic cues to syntactic units than on same-level morphosyntactic distributional cues, in an effort to further investigate the hypothesis of Morgan (1986) that prosody guides the learner to morphosyntactic segmentation, thus reducing the amount of distributional learning which must occur. Their findings, however, suggest that learners rely on prosodic cues only in the absence of other types of cues, namely distributional cues or semantic cues. This suggests that adults may not be sensitive at all to prosodic cues to morphosyntax, or may be unable to compute prosodic cues in the presence of other types of cues. The results of Carroll (1999a) also point in this direction. Obviously this is one area of SLA research where much more needs to be done.
1.2 Distributional analysis
Studies by Braine (1963, 1966, 1987) and by Smith (1966, 1969) show that adults are good at learning distributional properties of “words.” In particular, they can learn the position of words with respect to “phrase boundaries.” This means that learners are sensitive to the first position or last position of some constituent. They can also learn the positions of nonce words with respect to an arbitrary marker (Morgan, Meier and Newport 1987; Valian and Coulson 1988; Valian and Levitt 1996), suggesting that they can learn to identify functional categories and their complements on the basis of a purely distributional analysis. 
In short, it has been demonstrated experimentally that adults are sensitive to the distribution of forms in linearly ordered strings, and can readily locate particular forms relative to fixed positions in a string or relative to other forms in a string. These two observations are related: functional categories typically mark either the left edge of a phrasal constituent or the right edge. Functional categories are therefore particularly useful for parsing mechanisms. Since we are assuming here that parsing mechanisms encode grammatical distinctions which are in part acquired, this means that language learning mechanisms must be sensitive to functional categories. This ability can be readily explained in terms of a general sensitivity to contingencies in input (Kelly and Martin 1994), but only once it is granted that the learning mechanisms can encode what a functional category is. That ability arises as a consequence of the principles of Universal Grammar.
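The kind of distributional learning reported in these studies can be illustrated with a minimal sketch. It assumes a toy artificial language in which a hypothetical marker `ka` opens each phrase; the strings, the marker and the function names are invented for illustration and are not taken from the cited experiments. The sketch simply tallies where each form occurs relative to the edges of a string and relative to the marker.

```python
from collections import Counter

# Toy strings in an invented artificial language (hypothetical data).
# "ka" plays the role of an arbitrary marker that opens a phrase.
STRINGS = [
    "ka blik dup ka zel",
    "ka zel dup ka blik",
    "ka mip dup ka blik",
]

def edge_counts(strings):
    """Count how often each form is string-initial or string-final."""
    first, last = Counter(), Counter()
    for s in strings:
        words = s.split()
        first[words[0]] += 1
        last[words[-1]] += 1
    return first, last

def followers(strings, marker):
    """Count the forms that occur immediately after the marker."""
    counts = Counter()
    for s in strings:
        words = s.split()
        for left, right in zip(words, words[1:]):
            if left == marker:
                counts[right] += 1
    return counts

first, last = edge_counts(STRINGS)
after_marker = followers(STRINGS, "ka")
```

On these toy data the marker is always string-initial and `dup` never follows it, which is the sort of contingency a learner could in principle exploit to group `blik`, `zel` and `mip` together as complements of the marker. Nothing in the sketch says what a functional category is; that knowledge has to come from elsewhere.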
1.3 Categorisation
Much of i-learning involves categorisation or re-categorisation. Distributional analysis presupposes categorisation. So does transfer. Consequently, any learning theory worthy of the name should pay special attention to categorisation. It turns out to be true, as I have noted previously, that categories across cognitive faculties exhibit “fuzziness” or gradience with some things clearly belonging to a given category, some things clearly not belonging, and a host of borderline cases (Putnam 1975a: 133; Rosch 1973, 1975; Rosch and Mervis 1975; Mervis 1980; Smith and Medin 1981). Some tokens or members of categories are judged to be better exemplars than others (Mervis, Catlin and Rosch 1976; Smith and Medin 1981). Categories across cognitive faculties also exhibit “family resemblances” (Wittgenstein 1953: 31–2; Rosch and Mervis 1975; Smith and Medin 1981). Natural kind categories behave differently from artefacts (Rosch, Mervis, Gray, Johnson and Boyes-Braem 1976; Markman 1989). Linguistic categories, be they categories of speech perception, of the morphosyntax or of the sociolinguistic system, exhibit exactly the same general properties as categories of the conceptual system (Crystal 1967; Ross 1972, 1973a, 1973b; Taylor 1989, 1994).5 It is this observation which has led me to the conclusion that the categories of language are i-learned using the same metaprocesses discussed above. It is worth emphasising that this perspective is perhaps not widely shared. One might conclude from discussion of linguistic categories in the SLA literature, categories such as noun, complementiser, etc., that the categories themselves are provided by UG. 
I have tried to emphasise in Chapter 3, Section 2.2 that such a view does not follow from anything in generative theory. Rather, we can assume that UG provides basic categorial features which are subsequently organised into a language-specific typology of categories (and are hence i-learned) on the basis of the prosodic, semantic and/or distributional properties of the exponents of the categories. On a view of linguistic cognition where the language faculty consists of sets of autonomous representations and in which only some subparts of linguistic processing of these representations are modular, the shared properties of categories across cognitive domains can be explained. Representational autonomy requires that at least two representational systems be formally incommensurate and that communication between a pair of representational systems be mediated by correspondence rules (Jackendoff 1995: 5). This does not preclude, however, the possibility that the processes which create categories within each representational system share properties. Jackendoff (1983) argues that human cognition instantiates the use of preference rule systems across a broad range of domains and at all levels of computation.
It is the preference rule systems which explain the fuzziness of categories, their family resemblance properties, discrete exceptions, and the general difficulty in delimiting necessary and sufficient features to define them.
We have seen, then, that the characteristics of preference rule systems are found everywhere in psychological processes, all the way from low-level perceptual mechanisms to problems so prominent in our conscious life as to be of social and political concern… Yet the notion of a preference rule system has not been recognised as a unified phenomenon, except perhaps by the Gestalt psychologists. The reason for this, I think, is that the kind of computation a preference rule system performs is quite alien to prevalent ideas of what a formal theory should be like. Formal logic, generative grammar, and computer science all have their roots in the theory of mathematical proof, in which there is no room for graded judgements, and in which conflict between inferences can be resolved only by throwing the derivation out. I see a preference rule as a way to accomplish what psychological systems do well but computers do very badly; deriving a quasi-determinate result from unreliable data. In a preference rule system there are multiple converging sources of evidence for a judgement. In the ideal (stereotypical) case these sources are redundant; but no single one of the sources is essential, and in the worst case the system can make do with one alone. Used as default values, the rules are invaluable in setting a course of action in the face of insufficient evidence. At higher levels of organisation, they are a source of great flexibility and adaptivity in the overall conceptual system. (Jackendoff 1983: 156–7)
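One way to make the logic of the passage just quoted concrete is a small sketch in which several weighted cues converge on a graded judgement. The cue names, the weights and the threshold are hypothetical, and a weighted sum is only one possible formalisation; Jackendoff does not commit to it, and it is offered here purely as an illustration of graded, redundant evidence.

```python
# Hypothetical cues for classifying a form as a noun; the weights are invented.
NOUN_CUES = {
    "follows_determiner": 0.5,  # distributional cue
    "bears_stress": 0.2,        # prosodic cue
    "denotes_object": 0.3,      # semantic cue
}

def graded_membership(observed, cues=NOUN_CUES):
    """Return the summed weight of the cues that are present (0.0 to 1.0).

    No single cue is required; any subset contributes evidence.
    """
    return sum(w for cue, w in cues.items() if observed.get(cue, False))

def classify(observed, threshold=0.4, cues=NOUN_CUES):
    """Quasi-determinate judgement: accept if the graded evidence reaches the threshold."""
    return graded_membership(observed, cues) >= threshold

prototypical = {"follows_determiner": True, "bears_stress": True, "denotes_object": True}
borderline = {"bears_stress": True, "denotes_object": True}
one_cue = {"follows_determiner": True}
weak = {"bears_stress": True}
```

In this toy version the prototypical case has all cues present and redundant, the borderline case is accepted without the heaviest cue, a single strong cue suffices on its own, and a single weak cue does not. Changing an entry in `NOUN_CUES` is the sketch's stand-in for re-weighting a cue.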
We hypothesised that, in acquiring the novel categories of an L2, learners must i-learn new preference rules. New preference rules may also have to be i-learned to carry out the correspondences between novel categories or structures in one representational system and another.6 As previously stated, i-learning new preference rules will entail learning new cues to a classification, re-weighting the relative importance of different cues, and/or suppressing the automatic activation of analytical procedures used in processing the L1. The Autonomous Induction Theory can thus explain in principle the nature of categorisation and re-categorisation in SLA. No other theory can. Let me elaborate what problems lie before the alternatives. Categorisation presents two rather serious conceptual problems for the P&P approach, given its claims that learning is limited to trivial or peripheral aspects of the lexicon. The first problem lies in the observation just discussed: linguistic categories exhibit general properties of categorisation across linguistic domains. On the view that linguistic cognition is both autonomous and modular (Fodor 1983), this result is mysterious.
The second conceptual problem for P&P theory turns on a methodological issue. Many SLA P&P studies rely for their assessments of the interlanguage grammar on acceptability judgement tasks which themselves are categorisation tasks (Birdsong 1989). While the SLA researcher might be interested in the nature of the learner’s morphosyntactic or phonological knowledge, by resorting to acceptability judgement tasks, he or she is using a metalinguistic task. A metalinguistic task is one which involves representation in the conceptual system and cannot reveal anything about the organisation of Fodorean modular linguistic systems. It can reveal only limited aspects about autonomous linguistic systems, namely where correspondences exist between the conceptual system and the others. There are, in fact, good reasons to suppose that responses to acceptability judgement tasks are indeed derived from activity in the conceptual system, and not from processing which accesses a modular or an autonomous grammar. Schütze (1996) provides a detailed review of factors influencing results on acceptability judgement tasks and concludes that such judgements are affected by just about every stimulus and procedure variable one can think of.
Serial order, repeated presentation, deliberate judgement strategies, modality, register, preparation and judgement speed… various types of contextual material, the meaningfulness of the sentence, the perceived frequency of the sentence structure, and idiosyncratic properties of its lexical items… But perhaps the biggest lesson is the importance of instructions we give to subjects. (Schütze 1996: 169)
There is therefore no reason to believe that acceptability judgement tasks inform us directly about the learner’s interlanguage grammar. It seems far more reasonable to assume that these judgements involve an interaction between metalinguistic knowledge (encoded in the conceptual system), the internalised grammar, and the learner’s perceptual systems, especially since the judgements themselves display the “fuzzy” character one expects them to have.7 The Competition Model is designed to explain both the fuzzy nature of categories and the shared properties of linguistic and non-linguistic categories. Moreover, the Competition Model predicts that what is learned will not be a simple reflection of relative frequencies of properties of the speech stream but rather result from complex interactions among cue frequency, cue validity and cue reliability (cue conflict). The theory falls down, however, in pitching explanation at the wrong level. Cues in the Competition Model are properties of the speech stream, and hence correspond to bursts of energy, periods of cessation of phonation, and so on. There is no reason to think that L2 learners have encoded generalisations at this level of representation which will explain such
things as preferred word order strategies. On the contrary, there is good reason to believe that the behaviour of learners exhibits sensitivity to phonological and morphosyntactic constituents, just one type of constituent encoded in abstract autonomous representations.
1.4 Summary
To sum up: Distributional analysis would appear to be a prerequisite for encoding particular types of representations, constituents, hierarchical structures and word order patterns, and adults have the capacity to do it. But before words and word patterns can be used in speech production, they have to be mentally represented. Distributional analysis, as I view it, can only be understood in the long run in interaction with mapping procedures linking acoustic properties of the speech stream to phonological analyses of constituents (form extraction), which in turn have to be put into correspondence with morphosyntactic constituents. We know nothing, apparently, about how adult L2 learners extract prosodic words from the speech stream and isolate morphosyntactic words, except to note that lexical categories are often expressed and controlled in speech production before functional categories (a fact which can be explained phonologically, morphosyntactically or semantically). It is imperative that we begin to conduct specific studies of how learners learn their first words. Some of that research will have to be experimental. An essential part of the story, however, will also involve descriptive studies of the speech learners hear with a focus on such questions as the frequencies of particular types of prosodic constituents and acoustic cues to phonological constructs like stress. 
Finally, I have concluded that we need a theory which combines the best of both the generative and the Competition Model approaches: symbolic representations consistent with the best linguistic descriptions and a learning theory capable of explaining the prototypical nature of categories and the fact that cues to categories can compete with one another. The Autonomous Induction Theory has been designed with both properties in mind.
2. Removing the straw man, or why induction needn’t produce “rogue grammars”
2.1 Induction and the Autonomy Hypotheses
In Chapter 4, I briefly mentioned certain critiques of induction as an explanatory theory of SLA. I reject such critiques for they display the confounding of a
number of issues. For one thing, they confuse the manner in which induction has typically been modelled in formal work on learning, with necessary properties of induction. As we saw in the last chapter, induction has often been modelled as a random search through a search space. Formal (learnability) studies then ask if learning algorithms, under what must be viewed as a “worst case” scenario, will converge on the essential aspects of the target knowledge system. If the answer is “Yes”, that means that the algorithms and models in question might be reasonable models of human cognition. If the answer is “No”, it means that human learning of that knowledge cannot be modelled as a random search through the designated search space.8 In the case of primary language learning, the answer has indeed been “No” for every linguistic phenomenon investigated, on the assumption that the learner is not guided by any a priori knowledge about what linguistic categories or structures should look like. The presence of such a priori knowledge would thus constrain induction in important ways. It is well understood in learning theoretic circles that constraining induction is the central problem of induction theory (Peirce 1931–58, Vol. 2: 474–6; cited in Holland et al. 1986: 4, see also Landau and Gleitman 1985; Markman 1989; Gleitman 1990, for relevant discussion). The lack of appropriate constraints has led researchers in first language acquisition largely to reject induction as a major mechanism accounting for the core properties of grammars. We also saw that the claim that principles of UG, and especially parameter setting, must be operative in SLA because interlanguages do not display “crazy” rules or do not have “rogue grammars” presupposes that induction can only be unconstrained and necessarily leads to crazy rules. 
Since induction is usually considered to be a case of random hypothesis generation and hypothesis testing, the constraints problem arises largely at the point of hypothesis generation. In the absence of UG, it has never been apparent how the generation of hypotheses could be constrained to come up with just the right linguistic constituents and principles, since linguistic structure is not derivable in a straightforward manner from either perceptual properties of the speech stream or conceptual properties of language. In other words, if the learning mechanisms are not constrained to encode language in such terms, why should the learning mechanism induce syllables, or hierarchical morphosyntactic structure? Why should it induce c-command, or syllable structure constraints? No answer has ever been forthcoming from the psychology of learning. Moreover, within models of cognition in which the mind is viewed as an undifferentiated or unspecialised general processor, it has not been obvious why hypothesis testing should not be random. This leaves unanswered how language learning could be so fast and so relatively error-free, and leaves unexplained the well-documented stages of language learning.
The failure to take induction seriously as a solution to the representational problem also extends to SLA. No answer has been proposed to the questions raised by Archibald and others by those in SLA touting general theories of learning beyond the answer: “Transfer does it.” Such a response requires a worked out theory of transfer to be credible, but we know, on the basis of many years of investigation, that not every feature, category or principle transfers. Language acquisition is not deterministic and no acquisition would take place if the L2 could only be encoded in terms of the categories of the L1. Consequently, the challenge of explaining the absence of “rogue grammars” remains; the “Transfer does it” answer is hardly satisfactory. This said, however, it seems to me that the critique of induction mounted by generativists is extremely naive, demonstrates an embarrassing ignorance of current research in the psychology of learning, and ultimately amounts to a non sequitur. It may well be that various proposals in the first and second language acquisition literature have been unconstrained. It does not follow that induction is never and could not be constrained.9 SLA critiques of induction also often confound the question of whether interlanguages conform to UG with the question: Is language encoded in autonomous representational systems? Generativists presuppose the autonomy of linguistic representation, and many apparently believe that a commitment to induction entails as well a commitment to the hypothesis that all cognition is encoded in a single, undifferentiated non-autonomous representational system. It is certainly true that most inductivists have made this assumption because most of the people arguing for induction in language acquisition have happened to be functionalists, and functionalists are people who do not want to believe that there could be a set of language-specific structural representations. 
It is not, however, a logical or conceptual necessity that induction be paired with non-autonomous cognition; the nature of the representational system is a distinct problem from the question of the mechanisms at work in language acquisition. I adopt the Autonomy Hypotheses, shown in (5.1), which are a more explicit working out of ideas stated more generally in previous chapters.
(5.1) The Autonomy Hypotheses
a. Human cognition is encoded in a variety of autonomous representational systems. Each autonomous system can be associated with a distinct domain of knowledge.10 A representational system is autonomous when it consists of (at least) some unique constituents, and principles of organisation. In other words, the “syntax” of autonomous representational systems does not reduce to, and is not necessarily in 1 : 1 correspondence with, other representational systems.
190
INPUT AND EVIDENCE
b.
c.
It follows from this last property that for representations of one sort to influence the organisation of other representations, there must be correspondence rules which allow the translation of constituents and structures from one representational system into another. Where no correspondence rules exist, no cross-system influence can occur. Language is encoded in autonomous representational systems.
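The role of correspondence rules in (5.1b) can be pictured with a small Python sketch. It is an illustration only: the system names, the constituent labels and the `translate` function are hypothetical and are not part of the theory as stated. The one point it is meant to show is that a constituent of one system can influence another system only where a correspondence rule exists.

```python
# Toy illustration of (5.1b); all names below are hypothetical.
# Correspondence rules are stored per ordered pair of representational systems.
CORRESPONDENCE_RULES = {
    ("phonology", "morphosyntax"): {"phonological_word": "syntactic_word"},
    ("morphosyntax", "conceptual"): {"noun_phrase": "thing"},
}


def translate(constituent, source, target, rules=CORRESPONDENCE_RULES):
    """Return the target-system counterpart of a constituent, or None.

    None means that no correspondence rule applies, so the source
    representation cannot influence the target system.
    """
    mapping = rules.get((source, target), {})
    return mapping.get(constituent)


if __name__ == "__main__":
    print(translate("phonological_word", "phonology", "morphosyntax"))
    print(translate("phonological_word", "phonology", "conceptual"))
```

In this sketch there is no rule set for the pair phonology and conceptual system, so the second call returns nothing usable: no cross-system influence is possible along that route.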
I have hypothesised too that the representational systems which encode phonological and morphosyntactic information consist of linguistic constituents as I have defined them, that they exhibit structural dependence, and, in particular, exhibit properties like dominance, sisterhood, etc. I have also assumed that each autonomous level can exhibit unique properties such as the c-command relation in the morphosyntax or the “No Crossing Lines” constraint in the phonology. Now assume that L2 stimuli are encoded in autonomous representational systems as per (5.1). We have seen that in the Autonomous Induction Theory, linguistic stimuli, regardless of the linguistic source (L1 or L2), will be analysed by the same language processors which parse L1 stimuli. There is no reason to assume that second language stimuli are not processed by the same mechanisms which process primary language stimuli; indeed, were we constituted differently, it would be impossible to learn a second language. Let us call this the Uniform Parsers Hypothesis.
(5.2) The Uniform Parsers Hypothesis
Linguistic stimuli are processed by the same parsers regardless of the “origin” of the stimuli. Moreover, initially the same processes are applied. In other words, initially, L1 parsing procedures will be applied systematically and automatically to L2 stimuli. The procedures of the parsers are assumed to be based on structural information encoded in the representational systems of the L1 grammar.11
The Uniform Parsers Hypothesis together with the Autonomy Hypotheses in (5.1) guarantee that interlanguage cognition will display sensitivity to structural relations like c-command, dominance and sisterhood. This is so because (5.2) ties the operations of the parsers to the structural properties of the grammar. (5.1) and (5.2) therefore explain the particular forms of structural dependence of both first and second language acquisition that Flynn (1983, 1987, 1991) and Epstein et al. (1996) have attributed to P&P theory and which models by Bley-Vroman, Clahsen and Muysken, or Schachter do not explain. But I will make bolder claims: A theory of induction incorporating these two assumptions, such as the Autonomous Induction Theory, and only a theory making the same or similar assumptions can go beyond the explanation of these properties of psychogrammars to explain the developmental problem of second language acquisition discussed previously.
Let me emphasise the point since it is critical to interpreting the theory: UG has been invoked to explain universal properties of linguistic representational systems. In particular, UG has been invoked to explain some of the primitives available to the L1 grammar and parsers plus constraints on structure-building. Induction alone will not explain these properties. Induction, however, will explain how grammars are restructured precisely when restructuring involves the acquisition of new categories, the acquisition of new cues to extant L1 categories, the re-weighting of existing cues to extant L1 categories, and locating instances of particular categories in particular locations in an input string (distributional analysis). Explaining how such restructuring occurs is the task of SLA research. The task is a difficult one. Documenting how learners get from an unanalysed speech continuum to phonetic, phonological, morphosyntactic and semantic representations of a given stimulus involves a kind of research which simply has not been done in our field. Despite the seeming pervasiveness of grammar-based research, none of it has been devoted to explaining how a learner could come to encode a particular kind of grammatical representation given a particular kind of stimulus. Developmental research has remained the purview of researchers with no particular commitment to a theory of grammar. Neither approach will lead to the development of an explanatory theory of encoding for reasons spelt out in Chapters 2 and 3. And encoding is only part of the story. Understanding why accurate representations at the various levels do not automatically lead to native-like control will require models of bilingual speech production which are just now being formulated (Pienemann 1998a, b). 
In my view, however, once the relevant research is done, it will show that SLA involves essentially these metaprocesses: classification or encodings of the sort “X is an instance of Y”, reclassification or reorganising the cues for a classification, distributional analysis or classification based on structural properties, and structure-building processes. I-learning will explain essential aspects of each of these types of information encoding. The metaprocesses will, however, take as their starting point certain structural constraints holding of representational systems, which will limit how classification and structure-building occur.
2.2 Induction is not random hypothesis-formation
How does one explain that certain hypotheses and not others are generated? How does one explain that certain hypotheses and not others survive the hypothesis-testing phase of learning? I have shown that considerable progress has been made on these questions in non-language domains of knowledge. One conclusion is already at hand: Induction should not be characterised as a random process.
According to the Holland et al. theory, MMs predispose us to create particular hypotheses which will turn out to be relevant to the resolution of a given problem. In generating new hypotheses, the learning mechanisms create minimally different representations from those currently active in the mind. This constraint will guarantee that i-learning is not random because alterations can be made only to activated representations and these will turn out to be the representations which contain a “problem” for the system in its on-line analysis of intake and/or input. Moreover, this constraint is just one of several which operate in such a way that i-learning is not only non-random but also conservative. Mental models change only a little through i-learning. Sudden, radical restructuring of an entire MM is precluded, providing us with another constraint on i-learning.12
(5.3) Constraints in i-learning (version 1)
a. In generating new hypotheses, the learning mechanisms can only alter those representations currently active.
b. In generating new hypotheses, the learning mechanisms create representations which are only minimally different from those currently active.
I have stated two principled constraints on induction. These are constraints which can be deemed internal to the operations and functional architecture of the mind as proposed by the Induction Theory and its avatar: the Autonomous Induction Theory. Other constraints have already been proposed. Something has to trigger the learning mechanisms. I have adopted the hypothesis here that learning is triggered when parsing of an incoming string fails. In other words, i-learning is crucially tied to the operation of the parsers, which, in turn, given the Uniform Parsers Hypothesis, initially encode specialised properties of the L1. I-learning therefore involves constructing new parsing procedures attuned to the expressions and classes of the L2. The construction of new parsing procedures involves learning new cues for categories or re-weighting particular cues as more (or less) important in the new system. This leads to another important constraint on i-learning; i-learning is tied to “trouble” in the system or to the detection of errors.
(5.4) Constraints in i-learning (version 2)
a. In generating new hypotheses, the learning mechanisms can only alter those representations currently active.
b. The representations currently active when the learning mechanisms go into operation are those relevant for parsing a current string; i-learning begins when a current parse fails.
c. In generating new hypotheses, the learning mechanisms create representations which are only minimally different from those currently active.
Let me now reiterate two important consequences which follow from the hypothesis in (5.4b). (i) I-learning will cease when the parsers can accommodate all relevant stimuli with current parsing procedures, current grammatical representations and current mental models. Parsing may be possible, given the ways in which parsers exploit cues, even when the production system is far from native-like. In the Autonomous Induction Theory, there is therefore an important disjunction between learning for successful parsing and comprehension and learning for successful production. This means that responses to inadequate production in the form of feedback and correction may constitute an important signal for the learner of “trouble” in the system, namely trouble in the formulation of acceptable speech. Feedback will normally not be directed at trouble in the parsing system, and therefore will not normally signal problems essential for restructuring the internalised grammar. This constitutes a considerable limitation on the role that feedback can play in SLA. (ii) Global reorganisations of the sort which may affect the learner’s lexicon, for example, as native-like knowledge and proficiency are attained, will not be accounted for in terms of i-learning. The Autonomous Induction Theory is therefore incomplete in that it has nothing to say about such global reorganisations, should they prove to be part of the SLA story. The generation of hypotheses will also be constrained by the nature of attention and perception since what can be construed about the L2 from the stimuli present in a given situation is clearly restricted by what we can perceive and attend to. This is a well known constraint on induction (Johnson-Laird 1983; Jackendoff 1987), if one whose mechanics are little understood. But constraints here work in two directions: mental models also constrain what we perceive and attend to. 
Similar remarks could be made about the constraints placed on i-learning by working memory and longterm memory. Here therefore is another area to look for constraints on i-learning, namely the ways in which i-learning interacts with the horizontal faculties of attention, perception, and memory (Fodor 1983).
(5.5) Constraints in i-learning (version 3)
a. In generating new hypotheses, the learning mechanisms can only alter those representations currently active.
b. The representations currently active when the learning mechanisms go into operation are those relevant for parsing a current string; i-learning begins when a current parse fails.
c. In generating new hypotheses, the learning mechanisms create representations which are only minimally different from those currently active.
d. In generating new hypotheses, the learning mechanisms will operate with representations currently under attention, and will be limited to what can be processed by working memory.
Constraints on i-learning can arise from the way in which “thinking” occurs. Since ratiocination is rooted in the mental models that we have in some domain, defining the relevant MMs for language (MML) is obviously yet another area where one can look for constraints on i-learning. Recall that Kellerman (1978, 1986, 1987) has argued that learners’ beliefs about the typological relatedness of the L1 and the L2 affect the extent to which they believe that the structure of the L2 lexicon will be like that of the L1 lexicon, which in turn affects the extent to which they believe what they already know will be relevant for learning L2 idioms. This is an example of how a particular MML can affect learner behaviour. I suspect at this point that such considerations may have more to do with how learners formulate speech in production, rather than how they comprehend speech. However, this distinction has not been investigated empirically in connection with Kellerman’s work on transfer and typological relatedness so I must simply set the matter aside.
(5.6) Constraints in i-learning (version 4)
a. In generating new hypotheses, the learning mechanisms can only alter those representations currently active.
b. The representations currently active when the learning mechanisms go into operation are those relevant for parsing a current string; i-learning begins when a current parse fails.
c. In generating new hypotheses, the learning mechanisms create representations which are only minimally different from those currently active.
d. In generating new hypotheses, the learning mechanisms will operate with representations currently under attention, and will be limited to what can be processed by working memory.
e. To the extent that conceptual representations can interact with autonomous representational systems, they can restrict the generation of new hypotheses, given that representations in one system will be brought into conformity with the content of representations in another.
Finally, we are armed with the assumption that adult learners transfer the representational system of the L1 to the L2 learning task, and that i-learning must respect the categories and computational operations of autonomous representational systems (i.e. the Autonomy Hypotheses of (5.1) and the Uniform Parsers Hypothesis of (5.2)). Therefore, there must be constraints on the ways in which the system responsible for generating hypotheses interacts with the autonomous linguistic representational systems, in particular, the grammar. This is a final area to look for constraints on induction.
(5.7) Constraints in i-learning (final version)
a. In generating new hypotheses, the learning mechanisms can only alter those representations currently active.
b. The representations currently active when the learning mechanisms go into operation are those relevant for parsing a current string; i-learning begins when a current parse fails.
c. In generating new hypotheses, the learning mechanisms create representations which are only minimally different from those currently active.
d. In generating new hypotheses, the learning mechanisms will operate with representations currently under attention, and will be limited to what can be processed by working memory.
e. To the extent that conceptual representations can interact with autonomous representational systems, they can restrict the generation of new hypotheses, given that representations in one system will be brought into conformity with the content of representations in another.
f. I-learning respects the autonomy of grammatical representational systems, which means that it can neither create new primitives nor new operations.
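How the first three constraints in (5.7) interact can be made concrete with a deliberately crude Python sketch. Everything in it is an assumption made for illustration: the “grammar” is reduced to a table of cues mapping word forms to categories, a parse “fails” when a form has no cue, and the candidate category is simply handed to the learner. Constraints (5.7d–f) are not modelled.

```python
# Toy illustration of (5.7a-c); every name here is hypothetical.

def parse(tokens, cues):
    """Categorise tokens left to right using the current cues.

    Returns (categories, failed_token). failed_token is None when every
    token could be categorised, i.e. when the parse succeeds.
    """
    categories = []
    for token in tokens:
        if token not in cues:
            return categories, token
        categories.append(cues[token])
    return categories, None


def i_learn(tokens, cues, candidate_category):
    """Return a revised copy of cues, or cues itself if nothing is learned."""
    _, failed_token = parse(tokens, cues)
    if failed_token is None:
        # (5.7b): no parse failure, so the learning mechanisms are not triggered.
        return cues
    # (5.7a, c): only the item active at the point of failure is altered,
    # and the revision differs from the current state by a single entry.
    revised = dict(cues)
    revised[failed_token] = candidate_category
    return revised


if __name__ == "__main__":
    cues = {"le": "Det", "chat": "N"}
    print(i_learn(["le", "chat"], cues, "V"))          # parse succeeds: unchanged
    print(i_learn(["le", "chat", "dort"], cues, "V"))  # parse fails at "dort"
```

The sketch is conservative in the sense intended above: a successful parse leaves the cue table untouched, and a failed parse changes exactly one entry rather than restructuring the whole table.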
The proposals in (5.7) constitute a significant set of constraints on i-learning and therefore take us one step further towards explaining why SLA does not lead to the development of “rogue grammars.” They do so without trivialising either the contribution of learning to SLA or the contribution of universals to linguistic cognition.
2.3 Jettisoning the problem-solving metaphor
Induction is almost always associated with problem-solving, and talk of problem-solving usually means that induction is associated with “goal-oriented” activities. This in turn means that it is construed as intentional, and therefore is a consequence of conscious awareness. I intend to abandon this metaphor because it has proved to be an obstacle in the development of an integrated theory of language learning including both autonomous representational systems and i-learning rooted in the conceptual system. It has been pointed out before that construing language acquisition as a problem to be solved is completely unsatisfactory. It is easy to understand the agent’s role in conscious actions, hence the continued appeal of the problem-solving metaphor. It is more difficult but still possible to characterise an agent’s role in subconscious actions, as Freud has shown, but the extension of the goal-setting metaphor to the characterisation of the internal regulation of cognitive states is extremely problematic. Much of language acquisition is not conscious nor is it available to awareness. As previously pointed out, learners do not “hear” the subsegmental features of consonants or vowels in the sense that they are representable independently of the syllables, or prosodic domains in which they occur, and can be questioned and discussed as features.13 Nonetheless, acquiring an accurate L2 accent requires that learners perceive these features and adjust their representations of them. This particular illustration is a good one of the inappropriateness of the problem-solving and goal-setting metaphor because learners actually often have as an explicit conscious goal the acquisition of a native-like accent. They want to sound like native speakers. Their (un)conscious sub-goals, however, do not include adjusting their Voice Onset Time values when perceiving and articulating syllables. Illustrations could be found from other areas of linguistic knowledge. The problem-solving metaphor demands that explanation of language acquisition take place in the phenomenological mind, but this demand cannot be met. One must permit explanation at the level of processing and structure. One must distinguish between the ego and the cognitive system as a whole. I can have certain intentions with respect to my environment, but my brain does not. If we are to avoid the mind/body problem, and, in particular, if we are to remain realists, then it must be granted that explaining cognition cannot require intentionality (Jackendoff 1987). This is particularly clear when we examine the processes involved in parsing speech stimuli. The parsing process occurs automatically and outside of the realm of our awareness. 
We could not suppress it or alter it, nor can we have conscious access to it, even if we wanted to. If the problem-solving metaphor has been so appealing, it is perhaps due to an unfortunate over-emphasis on overt behaviour and skills learning as opposed to perceptual learning and other forms of learning which involve representation and understanding rather than overt behaviour. In language production, in contrast to language perception, intentional activity just seems obviously explanatory.14 A perusal of the literature on slips of the tongue (Fromkin 1971, 1980; Cutler 1982a, b) will quickly show, however, that most of the action there too is taking place outside of consciousness or awareness. Let us grant that it is impossible to know if the learner’s explicit goals (assuming they have some) have any direct consequences for the ways in which they process input, or store information. In any event, in practice, theories of induction do not normally limit themselves to instances of intentional learning. The Induction Theory is no exception.
In short, it is not limited to intentional actions on the part of some agent. On the contrary, its creators move quickly to the hypothesis that the cognitive system is performing certain cognitive “actions” independently of the agent’s intentions. I believe, for all of these reasons, that it would be preferable to eliminate the goal-setting metaphor from the Autonomous Induction Theory.
2.4 The Coding Constraint
A second property of induction mentioned by Holland et al. is that it takes place in non-specialised sub-systems. If induction is really a type of inferencing, and if it involves combining bits of new information either from the output of the perceptual processors, or from activated representations of stored information (perceptual representations of the environment, representations of the learner’s own internal states, currently activated information drawn from longterm memory, etc.), then it follows that induction must take place in some processor which has access to these different kinds of information. This means that information of various sorts is put to use by some non-specialised system. This is often referred to as the central processor. In the theory I am adopting, we have seen that there is no such thing. It is not necessary to induction that the functional architecture include a central executive and there are arguments against the idea (Jackendoff 1987). The hypothesis that induction takes place in a non-specialised system is the standard assumption in psychology. This would appear to require that one assume that cognition is largely non-autonomous, or that i-learning does not take place in the autonomous systems. I have insisted, contra most inductivist stances, on the autonomy of representational systems, in particular, those responsible for linguistic cognition. Holland et al. make the mistake of believing that they can rely on low-level perceptual processes, namely covariation detection, to make the system work. 
Covariation detection is supposed to provide the system with the basis for discovering errors in the system with respect to current stimuli, i.e. that everything is not “business as usual.” The fact that humans have language, and apes do not, is reduced to differences in “sensitivity” across species and individuals in a species, which Holland et al. (1989: 174) attribute to motivation, prior experience with the types of rules to be learned and the codability of events and/or event relations to be incorporated into rules. They concede, however, that associations are more likely to be learned if they involve properties that are relevant to the organism, where relevance must be construed as relevant to the internal cognitive organisation of the system.
Associations, however, are not an amorphous, undifferentiated thing; they can be defined as arbitrary and non-arbitrary (cf. Seligman 1970, cited in Amsel 1989). Arbitrary associations are those for which the organism is neither “prepared” nor “counterprepared” genetically. These have typically been the object of behaviourist learning experiments. But it has been well understood that non-arbitrary associations might be learned on a few trials or even a single trial if the organism is prepared for it. It has also been understood that association might never occur if the organism is counterprepared for it. So the potential role of genetic memory has always been acknowledged even by theorists who would reject the invocation of UG.
Evidence […] indicates that human co-variation detection is also heavily influenced by the degree of preparedness to see certain associations. (Holland et al. 1989: 175)
Stripping this talk of its associative foundations, we can say that we are all agreed that humans might be prepared a great deal to discover certain patterns in the stimuli. Since Holland et al. are not linguists, or psycholinguists, have not developed their model to account for language, and presumably did not intend to develop their model for language, we may assume that if they knew as much about it as we do, they would grant that in fact humans are mightily prepared for learning language. And this preparation includes possessing autonomous representational systems which encode structure.
To sum up: I have rejected the assumption that cognition must be nonautonomous. The empirical evidence for autonomy is now considerable and covers a number of domains of knowledge, not just language. Since I want to ensure that induction respects the autonomy of the subsystems it co-exists with, I shall introduce the Coding Constraint on induction in (5.8).
(5.8) The Coding Constraint on induction
Induction respects the basic properties of specialised autonomous representational systems. In particular, it can augment neither the set of basic primitives nor the set of basic operations.
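One way to read the part of (5.8) that concerns primitives is as a membership check on candidate categories. The Python sketch below is an illustration under strong simplifying assumptions: categories are treated as bare sets of features, the feature labels are hypothetical, and the basic operations mentioned in (5.8) are not modelled at all.

```python
# Toy illustration of (5.8); the feature labels are hypothetical.
PRIMITIVES = frozenset({"voiced", "nasal", "labial", "coronal"})


def is_encodable(category, primitives=PRIMITIVES):
    """True if a category (a set of features) uses only existing primitives."""
    return set(category) <= set(primitives)


def induce_category(existing, new_category, primitives=PRIMITIVES):
    """Add new_category to the inventory only if it is encodable.

    The primitive set itself is never extended, mirroring the claim that
    induction cannot augment the basic primitives.
    """
    if not is_encodable(new_category, primitives):
        return existing
    return existing | {frozenset(new_category)}


if __name__ == "__main__":
    inventory = frozenset({frozenset({"voiced", "labial"})})
    print(len(induce_category(inventory, {"nasal", "coronal"})))    # grows
    print(len(induce_category(inventory, {"nasal", "retroflex"})))  # unchanged
```

Novel complex categories can be added in this sketch, but only as new combinations of features that the system already has; a category requiring an unavailable feature is simply never induced.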
The Coding Constraint does not have to be stipulated. It follows from the Autonomy Hypotheses, but it is useful to spell it out explicitly. Notice that the Coding Constraint does not require that induction in adults be constrained by the properties of the autonomous systems as they are prior to experience. I am not assuming that the autonomous representational systems are a direct mirror of the initial state. Consequently, the Coding Constraint does not guarantee that i-learning respects the categories and computational principles of UG, understanding UG to be that initial state. Rather, it ensures that i-learning will directly reflect properties of the representational state at the moment induction occurs. It should be assumed, therefore, that induction will respect the primitives and operations present in the L1 grammar. The Coding Constraint nonetheless places yet another very powerful constraint on induction. Categories and properties of language which are not encodable in the adult’s mature representational systems will not be i-learnable unless they are derivable from the primitives and operations of the linguistic system via the basic inductive operations. Should it turn out to be the case that the primitives and operations of the L1 grammar and parsers are quite different from UG, then there will be profound differences between what the Autonomous Induction Theory will permit and what “access to UG” will permit. This allows us to draw a clear distinction between P&P theory and the Autonomous Induction Theory: Any primitive attributed to the initial state UG which is not represented in the L1 grammar ought to be “accessible” to the adult L2 learner if UG itself is “accessible.” Such primitives are not learnable in the Autonomous Induction Theory. I have already discussed studies of the perception of phonetic categories which show that adults do not bring “the initial state” of phonetic features to the acquisition of L2 sound systems. The evidence suggests that these features are no longer accessible to adult learners. They appear not to be i-learnable either, which supports the Autonomous Induction Theory. I discussed above the L2 learning of French gender and suggested there that no relevant category can be induced from conceptual categories. These examples are merely illustrative; the issue merits much more research. The Coding Constraint is designed to guarantee that new information to a cognitive subsystem is in the “right” format. 
Having postulated it, several questions arise: (i) What are the representational systems in question? (ii) What are the origins of new categories in L2 interlanguages? (iii) What are the interactions between the conceptual system and the various linguistic representational systems? Are they unrestricted and fully interactive, or are they modular? No precise answer can be given to the first question at present. The general answer required is that there must be enough formats to express all of the relevant distinctions observed in linguistic cognition. Finding a satisfactory answer will involve for the researcher at the very least figuring out how many levels of processing are needed to explain both language acquisition and the efficient processing of speech. Processing theories will be responsible to the distinctions discovered in competence theories. Grammatical theories of appropriate richness and generality may permit the elimination of redundant levels. Only ongoing research in psycholinguistics and linguistics will give us the right number and the right type.
To the second question: What are the origins of new categories in the interlanguage?, two answers have been given. One is the traditional answer we in SLA all resort to because we must, namely that new categories come from the existing cognitive systems. We transfer old categories to new stimuli, i.e. we project old categories onto new stimuli, and re-categorise over time to approximate the categories that native speakers cognise. In Piaget’s terminology, we assimilate new stimuli to old categories, and ultimately adapt old categories to the stimuli. I will invoke the Coding Constraint to explain an obvious fact about transfer which, to my knowledge, remains unexplained. Weinreich (1953), Haugen (1953), and Kellerman (1978, 1983) have rightly insisted that transfer presupposes the identification of categories, with stimuli or input of some sort being treated as if it were “the same thing” as some category of the L1. What the best theory of transfer must explain is why the identifications always seem to occur at the “right” level: acoustic phonetic stimuli are identified with the central category of phones, as Flege’s research has shown, syllable structures and higher order phonological representations are identified with phonological words and can activate L1 lexical entries, as the cognate literature reveals, and syntactic environments involving semantically selected elements (semantic roles) are identified precisely with the predicate-argument structures of verbs and adjectives (Harley 1989b; Harley and King 1989). Transfer thus operates in a highly constrained fashion in that only categories and computational principles relevant to a given level of representation and processing can be transferred, and they must be transferred to that level.15 It follows from the Coding Constraint that transfer must involve identification at the right level. It is simply not possible for a learner to identify a category of the syntax, e.g. the syntactic phrase, as if it were the same thing as an instance of segmental encoding. This will seem to many readers to be self-evident, which may prove only that many readers have unstated and unquestioned assumptions about the nature of cognitive organisation. These assumptions should be brought out into the light. The Induction Theory cannot explain this fact about transfer. Neither can classical connectionist theories. No theory which relies solely or largely on associationism can explain it. Only theories which invoke structural representations can. The Autonomous Induction Theory does.
New categories are also created from the basic primitives of the representational systems or from other complex categories. How does the system come to have these primitives? The traditional response has been to say “They’re induced from the categories of perception.” Holland et al. realise this response is inadequate but do not really come to grips with it. I assume here that the primitives of the autonomous representational systems cannot be induced. It follows therefore that they must be given a priori. We could say that they are “innate” but given the difficulties surrounding this term discussed in Chapter 3, I prefer to avoid it. I-learning plays a role in the creation of novel complex categories, using the primitives of the autonomous representational systems.
Let us return to the issue of categories as prototypes. Flege (1991), for example, reviews a number of his studies showing that segments or phones in the grammars of both native speakers and L2 learners are represented as central representations. This means that there are no necessary or sufficient conditions for being heard as a [p], but rather that articulations that correspond to a certain range of acoustic features will be assimilated to the phone. He also shows that second language learners adjust their central representations so that representations of phones fall somewhere “in between” those of groups of monolinguals. Wode (1992) has also pointed out that the perception of segments exhibits properties of prototypes. But the situation is actually more complicated in that the perception of categories is dependent upon global properties of the cognitive system. For example, the perception of segments is normally not faithful to the acoustic reality, but rather interacts with lexical knowledge in quite precise ways. Labov (1994) discusses vowel shifts typical of dialects of the northern United States which involve the raising of low front vowels and the lowering and backing of high front vowels. In these dialects words with historical /aeh/ vowels are actually articulated with the vowels in the upper front area. Words with historical /o/ are pronounced with the vowels in a fronted area, and words with historical /iy/ and /ey/ are pronounced with the vowels in a central mid area. 
Labov (1994: 197–9) discusses a number of misperceptions by listeners from other dialect areas which are directly related to the existence of lexical items in the listener’s lexicon which could correspond to the actual segments uttered. In other words, if I hear someone from Buffalo saying Jan when they are in fact intending to say John, but hear them say Canada when they are attempting to say Canada (articulated as more like [Ákh7n6n6]), it is a direct consequence of the fact that I can project Jan onto the on-line representation of the acoustic stimulus but there is no closer relevant word than Canada to project onto the parse of that stimulus. Note that what counts as the “closest relevant word” in parsing depends, however, not merely on the properties of the stimuli, but also on the density of the lexical neighbourhood in the perceiver’s psychogrammar, the properties of the to-that-point-constructed morphosyntactic representation, as well as properties of the ongoing mental model of the discourse.16 None of these observations about word recognition can be explained in terms of co-variation detection and association alone. We must have recourse to constituents and structures of various sorts. If linguistic cognition reduces to varying degrees of
activation among associated elements in which the elements correspond only to low-level features of the acoustic stimuli, then these observations about word recognition cannot be stated let alone explained. The Coding Constraint requires the learning processes to be framed in terms of the relevant representational types and the transfer of specific representations will mean that learning will require adjusting the boundaries of extant categories.
2.5 The role of feedback
The third property of induction is that the organism is constantly being fed information about the accuracy of its representations of the environment. There are two respects in which feedback could play a role. In the first instance, internal monitoring mechanisms could be examining outputs of various processors to determine if they are in the “right” format for some task. These might be entirely self-regulating and independent of the operations of the conceptual system. Self-regulating, mechanistic, monitoring could occur within a completely modular system of linguistic cognition. I assume that all models of cognition will include self-regulatory mechanisms. But these will not play a special role in the Autonomous Induction Theory distinguishing it from other theories. Feedback, I have said, has a causal role to play in the restructuring of knowledge only if the contents of the learner’s conceptual representations can be shown to be the essential element initiating change. This leads to the Feedback Constraint.
(5.9) The Feedback Constraint
Changes to representations which are initiated by feedback must be assumed to be caused by the contents of the feedback, which, if given verbally, will be encoded in conceptual structures, after the speech expressing the feedback has been parsed in the usual fashion. The provision of metalinguistic feedback therefore presupposes that the organism has the capacity to encode linguistic categories and concepts in conceptual structures.
In this book feedback is understood to involve the intervention of some external agent. In the typical case of language acquisition, it will involve a language teacher or some native speaker, who provides the learner with independent information about the accuracy of her representations of the language. The feedback will be instantiated in speech. Deploying feedback will therefore involve first decoding the speech, that is parsing the speech and deriving an accurate conceptual representation of the feedback provider’s (henceforth the
corrector’s) message. The parsing and interpretation processes create opportunities for much of the information content in the correction to “go astray”. Consequently, it is not the provision of feedback per se which is important for a theory of language acquisition but the ways in which learners interpret and deploy it. Since the feedback is encoded in conceptual structures, but the grammar is encoded in a variety of other autonomous systems, there must be points of interaction where the information in the conceptual structures can be reencoded in the proper format. Adopting the Autonomy Hypotheses and the Coding Constraint has forced on us the adoption of correspondence rules which will map categories of the conceptual structures onto categories of the linguistic system. Where no correspondences are possible, feedback will have no effect.
3. Summary
In the previous chapter, I introduced a modified version of the Induction Theory of Holland et al. (1986). In this chapter, I have focused on a number of constraints which will limit how induction operations can create new representations. The induction functions are assumed to perform only one operation at a time, in such a way as to respect the Autonomy Hypotheses and the Coding Constraint, meaning that induction cannot directly create primitives or introduce them into the autonomous representational systems. The Coding Constraint forces the implementation of correspondence rules which mediate between the mental models related to language and the autonomous representational systems of the interlanguage grammar. Changes to the system are created when errors are “detected” and this happens when the parsers cannot assign an analysis to incoming material with extant procedures. It should be kept in mind that analysis must occur in both the correspondence and the integrative processors. These innovations constitute major constraints on the original Induction Theory. The Autonomous Induction Theory was also shown to be constrained in a variety of other ways, by inheritance of constraints which also hold of the Induction Theory. Change can occur only in representations currently activated. Induction therefore always occurs on-line. Operations are limited to creating minimally different representations from “parent” representations. These then compete with the parents within the system, the best representation winning out. Rules get activated in clusters, and the clusters can lead to the development of default hierarchies. When problems occur in a cluster of representations, the Bucket Brigade Algorithm works back locally from the point of the identification of the error to locate the source of error.
Taken together, these various sets of constraints provide a constrained theory of induction which I believe can account for many of the major properties of interlanguages. It must now be admitted that these are areas to look for constraints on induction. There has been little attempt to elaborate a model of induction in SLA so to date we know very little about how i-learning actually operates in this cognitive domain. This discussion, and the questions in (5.10), thus elaborate what amounts to a research program; at the moment there are few facts of the matter to report.
(5.10) a. How do the processes which construct novel representations of the L2 actually work? And how are they constrained by the functional architecture of the mind?
b. How do perception and attention affect the construction of MMs relevant for the L2? And vice versa, how do MMs affect the perception of L2 stimuli and attention?
c. What are the mental models relevant for second language acquisition?
d. How are MMs actually derived? What are the correspondences between MMs and the grammatical representations? In particular, what are the restrictions on these correspondences? And specifically, what role is played by feedback and correction in the construction of MMs?
By attempting to answer the questions in (5.10) for a variety of different kinds of linguistic knowledge, we can expect to develop constrained models of i-learning, models which, in interaction with detailed proposals about the representational systems in question, really will provide explanations of L2 development.
Notes 1. I have no empirical evidence that this is the case but am relying on certain observations emanating from the typological literature. First of all, fundamental frequency is instantiated in all languages, and is realised either as tone or as pitch. In tone languages, tone is expressed over a lexical domain and indeed typically appears as a contrastive feature of the language, discriminating among lexical items which are otherwise segmentally identical. In pitch-accent languages (such as Japanese), fundamental frequency is also expressed over a lexical domain. In intonational languages, fundamental frequency can be realised as pitch associated with stress marking or prominence. Stress, in a language like English, is expressed over lexical domains. Thus, even in English, where intonational tunes may be realised over domains much larger than the word, these larger domains are still linked to lexical domains via accent. The final observation is that a language has one and only one of these ways of realising shifts in fundamental frequency, suggesting that its realisation over lexical domains may be a truly
universal property of linguistic cognition, something that an L2 learner would necessarily bring to the problem of extracting words from the speech stream. How this might interact with the specific prosodic properties which a learner might transfer is not clear to me at this point. 2. It should be noted that at the ends of intonational phrases, shifts in fundamental frequency may well coincide with other cues such as pre-boundary lengthening and the occurrence of a pause at the end of the intonational phrase. Bernstein-Ratner (1986) has shown that pre-boundary lengthening is exaggerated at certain phases of parent-child exchanges but is eliminated as the child grows older. The same can be said for the exaggerated contours which some mothers use. Consequently we have no reason to assume that exaggerated features would be part of the language adult L2 learners are exposed to. 3. Ingram suggests (1989: 68) that the fact that children acquire suffixes before they acquire prefixes could also be attributed to a recency effect in auditory processing but that alone will not explain why they derive suffixes and prefixes from their analysis and not merely syllables. 4. The specific empirical claim that inflectional suffixes will be acquired before prefixes has been questioned. See Weist and Konieczna (1988). 5. This observation appears to be one of the motivations behind the development of Cognitive Linguistics. See Taylor (1989) for discussion. 6. These preference rules will do much of the work of the connectionist activations linking the input cues and the output knowledge of the Competition Model. It is essential, to make sense of my claims, to keep in mind that preference rules, in creating categories are creating complex constructs. Categories are not primitives. Representational primitives are given along with the autonomous representational systems. They are not induced. 7. 
It is precisely for this reason that I am disinclined to regard data from acceptability judgement tasks as evidence that interlanguage grammars violate principles of UG. 8. Even parameter setting models tend to adopt this approach (Dresher and Kaye 1990; Gibson and Wexler 1994). 9. Gregg (p.c. 1995) claims that the problem for induction is its “fallibility”, by which he means (I assume) that the learner can fail to arrive at a correct solution. But learners typically do fail to arrive at a unique solution to a wide range of language acquisition problems. This is why we have linguistic variation. The real point is that learners arrive at unique solutions for a limited range of phenomena; we need recourse to UG for these and for none others. A theory which invokes only UG to provide an account of phenomena like, e.g. the class of verbs corresponding to double object verbs, the class of words which can take -hood as a head suffix, the location of stress on a variety of learnèd words, or the ability of reflexive pronouns to substitute for simple pronouns in sentences like John saw a picture of himself in the gallery, will not be able to explain why native speakers regularly disagree as to the class of acceptable and unacceptable cases. 10. However the converse — If X is a distinct domain of knowledge, then X is an autonomous system — is not true; every domain of knowledge need not be instantiated by a distinct autonomous representational system. The exact number, and the formats they assume, must be determined by empirical investigation. 11. It is a corollary of this hypothesis that learning the L2 involves the development of L2 specific parsing procedures wherever the L1 procedures fail. 12. Similar constraints have made their way into models of parameter setting. Gibson and Wexler (1994: 410–2) propose that the algorithm which sets parameters, called the “Triggering learning algorithm” (TLA) only operates when a sentence currently being parsed fails to be parsed, that
a parameter value can be altered by the TLA only on parsing failure, and that the learner somehow cognises that certain parameters may be more relevant to the parsing failure than others. They do not spell out, however, how the learner cognises these things. Dresher and Kaye (1990) attack this problem head on by supposing that there is a unique relation between triggers and parameters. These conditions, it should be noted, are part of a learning theory, and not part of UG. It is therefore grossly misleading to assert, as some like to do, that UG just is a theory of language acquisition.
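The error-driven character of the procedure described in this note can be made concrete with a small sketch. It is only an illustration under stated assumptions: the function names, the two-parameter grammar and the encoding of a sentence as the parameter values it requires are all hypothetical, and the step in which one randomly chosen parameter is flipped and kept only if the sentence then parses follows the standard presentation of the TLA rather than anything spelled out here.

```python
import random

def can_parse(params, sentence):
    # Toy stand-in for a parser (an assumption): a "sentence" is a dict
    # mapping a parameter index to the value it requires.
    return all(params[i] == value for i, value in sentence.items())

def tla_step(params, sentence, rng):
    # Error-driven: nothing changes unless the current grammar fails.
    if can_parse(params, sentence):
        return params
    # Flip exactly one randomly chosen parameter.
    i = rng.randrange(len(params))
    candidate = params[:i] + (not params[i],) + params[i + 1:]
    # Keep the new setting only if it rescues the current sentence.
    return candidate if can_parse(candidate, sentence) else params

def run_tla(initial, text, rng):
    # Present the sentences one at a time, updating on each failure.
    params = initial
    for sentence in text:
        params = tla_step(params, sentence, rng)
    return params
```

Calling `run_tla((False, False), [{0: True}, {1: True}] * 100, random.Random(0))` walks the toy learner toward the setting that parses both sentence types. The sketch deliberately leaves out what the note identifies as unexplained, namely how a learner could know which parameters are relevant to a given failure: here the choice is blind.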
13. It should be emphasised that metalinguistic instruction plays no role here. Phoneticians cannot have the relevant sub-goal any more than the naive L2 learner. The perception and representation of sounds is just like that: automatic and unconscious. 14. This is an illusion, which will become obvious as the discussion proceeds. It should also be pointed out that intentionality is distinct from “directed cognitive processing” resulting from interactions of attention, the conceptual system and perception. Holland et al. (1989: 93–4) discuss focused sampling. Focused sampling takes place when an organism is actively searching for a particular type of stimulus. For example, it is known that subjects will hear degraded speech samples more accurately if they have been given prior information about the nature of the stimulus. This and other types of information about speech perception suggest that it does not involve matching stored representations to acoustic phonetic properties as they are perceived. Rather, it involves creatively projecting likely representations onto a partially specified representation in construction at the current point in the parse. This is a general property of cognition; while I might intend to look for a given door in my Weihnachtskalender, I don’t intend to see it. 15. One can argue that transfer exhibits these effects only because the analyses have been expressed in particular ways, in other words that the effects are a consequence of the analysis and not of linguistic cognition. The critique, however, misses an obvious point, which is that scholars investigating transfer, in particular, the scholars cited, have been seeking to explain obvious generalisations and the generalisations are only statable as, e.g. “phonological” or “syntactic” or “semantic.” 16. 
Almost all of the errors of perception cited by Labov respect the syntactic category of the stimulus, thus nouns are misperceived as other nouns, and verbs are misperceived as other verbs. These errors of perception illustrate that lexical selection is not a bottom-up process only. See also Bond (1981), and Warren, Bashford, and Gardner (1990). In addition, Labov’s collection of misperceptions also tend to respect the basic logical categories of the input, with names being substituted for other names. This might be an accidental property of Labov’s example, which was chosen merely to illustrate how lexical closeness affects perception across dialects differing in the presence/absence of the Northern Cities Vowel shift. Systematic studies of “slips of the ear”, however, show that errors of lexical perception generally respect the syntactic category of the input, and often respect the basic logical category showing how the projection of non-phonetic information structures lexical selection. See Garnes and Bond (1975, 1976), Cole and Jakimik (1978), Bond and Robey (1983).
Chapter 6
The logical problem of (second) language acquisition revisited
1. Introduction
In this chapter and the next, I wish to return to the metatheoretical debates introduced in Chapters 2 and 3. More particularly, I shall take up the two principal arguments made which lead to the assertion that UG is necessary as an acquisition mechanism to explain SLA. If these arguments can be shown to be without basis, then the case for the Autonomous Induction Theory (or some other, better theory of i-learning) is stronger. The first argument, to be addressed here is that there is a logical problem of second language acquisition which Universal Grammar but not induction can solve (White 1985a, 1989: Ch. 2; Birdsong 1989: 89; Bley-Vroman 1990). The second argument, to be addressed in Chapter 7, is Schwartz’ (1987) hypothesis that the language faculty is modular in such a way that language acquisition can only take place bottom-up, on the basis of stimuli from the speech stream. The major claim in this chapter will be that there is no logical problem of second language acquisition, indeed that the logical problem of language acquisition has been misrepresented in the literature mentioned, reducing it, incorrectly, to the claim that the stimulus is impoverished with respect to ultimate attainment. While the Poverty of the Stimulus Hypothesis is an essential component of the logical problem of language acquisition, the two are conceptually distinct.1 In order to make my point, I review the various factors involved in resolving the logical problem of language acquisition. It will quickly become apparent that resolution involves committing oneself to properties of the learner, the learning situation, the what to be learned, and to a definition of successful learning. Making such commitments really means converting the logical problem into an empirical problem of language acquisition. 
I argue that the logical problem of language acquisition can be resolved in principle in a variety of ways, but that the resolution of the empirical problem rests on the facts of human linguistic cognition which differ significantly in the case of first and second language acquisition.
2. There is no logical problem of second language acquisition
2.1 The form of the argument
UG, I have argued, is necessary to explain how a learner comes to have a representational system capable of encoding phonological and morphosyntactic information. Such a system cannot be i-learned from properties of stimuli, nor is it derivable from conceptual representations. Now that we have explored what an induction theory has to look like, in particular the relationship between inductive reasoning and mental models on the one hand, and the autonomous nature of the grammatical representational systems on the other, this argument is more motivated. It becomes compelling on the assumption that the pre-linguistic child has a comparatively primitive conceptual system and limited i-learning abilities.2 This comparatively primitive conceptual system will not support the induction of the representational primitives and combinatorial algorithms of the phonological and morphosyntactic representations (Fodor 1980; Felix 1981). The case for UG, in some form or other, in primary language development is therefore a virtual necessity. The case goes like this: At the same time that they are acquiring a grammar, children are also acquiring complex conceptual systems. This means that they cannot map from a complex conceptual representation to a simple phonological or morphosyntactic representation. Since they lack other compensatory cognitive mechanisms which could overcome the inadequacies of the stimuli in the speech stream (those bursts of energy and bits of silence), there is no other mechanism for inducing the basic properties of these representational systems. The only compensatory mechanism could be UG. One is therefore forced to assume that UG is part of the language acquisition device (LAD).
Indeed, I have assumed that UG is part of the LAD in precisely the sense that it provides the basic representational system for the pre-linguistic child who neither induces the primitives of a grammatical representational system from linguistic stimuli nor maps them in a one-to-one correspondence from an initial innate conceptual system. The same argument could be made using the child’s parsing systems. At the same time that the child is acquiring a grammar, he is also acquiring language-specific parsing procedures. Therefore, the parsers cannot be used as the bootstrapping mechanism for the development of the specific properties of the representational system. Developmental psychology has often assumed that language specific universals could be dispensed with and that grammar, to the extent that it exists, can be derived from either perceptual primitives or conceptual ones. According to this view, the traffic goes in one direction only, from the properties of the
initial conceptual and perceptual systems to the categories and features of the morphosyntactic system, so that the child’s initial knowledge of the structural categories and organisation of language can be kept to a minimum. Occam’s razor can then be invoked and arguments raised that UG ought not to exist. For example, the young child appears to discover the categories of the morphosyntax by treating referential expressions as nouns, and forms expressing actions as verbs. This correspondence then goes from the non-linguistic conceptual system to the grammar. Why shouldn’t such relations define linguistic competence in general? The reason is, as Landau and Gleitman (1985), Markman (1989: 37), and Gleitman (1990) have rightly insisted, that linkings work in the other direction too. Children also use the formal properties of morphosyntactic categories to discover the sub-categories in a given semantic field. This means that those formal properties must be independently available. To rephrase, i-learning a given language requires some initial representational system(s) in which the various acoustic/phonetic, phonological, morphosyntactic and semantic properties of language are encoded. In the case of primary language acquisition, this set of initial representational systems includes at least those representational primitives made available by UG, a priori acoustic and visual perceptual representation systems (which, as it turns out, are quite rich and complex) and the initial conceptual system. In the case of second language acquisition, the relevant set consists of every representational system the learner has in place at the point in time where L2 acquisition occurs, including a mature and rich culture-specific conceptual system, the specific grammar of the L1, the specific parsing and production systems associated with the L1, a mature acoustic perceptual system, a mature visual perceptual system, a mature set of domain-specific problem-solving systems, and so on.
All the evidence available provides support for the assumption that adults come to the acquisition task capable of deploying their mature representational systems to the purpose of acquiring the L2. Unlike the child’s initial representational system, however, these mature systems are a product of both “innate” capacities and acquired information. In other words, the initial knowledge states of the adult learner (at the point of analysing L2 data) are such that she can encode stimuli in various representation types. The relevant literature suggests that she can encode concepts in terms of propositions consisting of predicates and referential categories, quantified expressions, bound variables, etc. It also suggests that an adult learner can encode propositions as sequences of sentences, and sentences as sequences of structured morphosyntactic categories (both referential and non-referential). It suggests that the abilities to encode the L2 speech stream in these ways are present in L2 acquisition from the very start.
It is true that a theory of SLA must explain properties of the initial state, but one is free to hypothesise that adults are transferring knowledge which has been arrived at in any number of ways. It could be a priori knowledge, acquired through selective learning or canalisation, or it could be i-learned. Consequently, one cannot argue that “access” to UG is logically necessary in SLA in order to explain the fact that adults are capable of representing language in terms of linguistic structures. There is no separate logical problem of language acquisition for each particular language that an individual might acquire. Consequently, there is no logical problem of second language acquisition. To summarise, I claim that “access” to UG is not a necessary element for explaining the fact that adults can represent linguistic stimuli in structural terms when they acquire a second language. However, it is a necessary part of the story of how they happen to have the representational systems in the first place. That they do is explained by the very fact that they have an L1 grammar, L1 parsing and speech producing systems, and a complex and mature conceptual system. UG is thus implicated in SLA but only in a very indirect way. It provides the pre-linguistic child with the basis for acquiring an L1 psychogrammar, which consists of rich, language-specific information encoded in various representational systems. Secondly, UG is not sufficient to explain the properties of acquired languages, either the developing linguistic systems of children or the interlanguages of adults. This is so whether one construes UG as the initial state of acquisition, or as a mechanism responsible for cross-linguistic variation. 
In the rest of this chapter, I will present the logical problem of language acquisition and develop the argument just given in brief above that there is no special logical problem for second language acquisition and claims to the contrary amount to a serious confusion of the representational and developmental problems of acquisition.
2.2 What is the logical problem of language acquisition?
2.2.1 Three basic assumptions
The logical problem of language acquisition is a problem for a theory of i-learning. It asks the question: Are natural languages i-learnable in principle? Providing an answer depends on at least five factors: (i) the formal properties of the class of languages to be learned, (ii) the time allotted for learning, (iii) the criterion used to define success, (iv) the nature of the input, and (v) the computational nature of the learner (Gold 1967; Pinker 1979; Osherson, Stob and Weinstein 1984). Defining these factors involves introducing other considerations into learnability theory. The more these considerations actually reflect psychological
properties of learners and the learning situation, the more such studies become relevant for SLA. In other words, learnability studies can be done in such a way as to move us away from the logical properties of learning to the empirical properties. By referring to the empirical problem of language acquisition, I mean the problem of how real learners could in principle induce a language. Let us consider the first factor: What is being acquired? There are various kinds of responses to this question which are possible. Mathematically formalised studies of learning define languages in two ways: extensionally, and through a generative grammar.3 A language like English can be defined extensionally by simply listing all of its sentences. We could model i-learning extensionally by hypothesising that learners hear and represent sentences one-by-one. We would therefore define learning English as learning the set of all English sentences. This approach is, however, psychologically implausible. Life is too short to learn one-by-one an infinite set of sentences. Moreover, learners don’t learn sets of strings, they learn form-function pairs. They also, so I have claimed, learn all sorts of abstract types of knowledge associated with such form-meaning pairs. This abstract knowledge can be used to encode and understand sentences that one has never heard before. It also is deployed in recognising that languages are recursive, and that sentences could be infinitely long (Langendoen and Postal 1985). We must therefore assume that the empirical problem of language asks how learners could induce the properties of a grammar capable of describing the sentences of a language.4 What the grammar looks like is, however, very much a matter of debate, and the answers provided to the question: Are natural languages learnable in principle? will clearly depend on what assumptions one makes about the grammars in question.
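The contrast between the two ways of defining a language can be illustrated with a minimal sketch. The three-rule grammar and all names in it are assumptions made purely for illustration; nothing here is a claim about English or about any particular theory of grammar.

```python
# Extensional definition: a language given as an explicit list of sentences.
# Any list we can actually write down is finite.
EXTENSIONAL_SAMPLE = {
    "John left",
    "Mary thinks John left",
}

NOUN_PHRASES = ["John", "Mary"]

def generate(max_depth):
    """Yield the sentences of a toy recursive grammar (hypothetical):
         S  -> NP VP
         VP -> 'left' | 'thinks' S
         NP -> 'John' | 'Mary'
       max_depth bounds how many times S may be embedded inside VP.
    """
    for np in NOUN_PHRASES:
        yield f"{np} left"
        if max_depth > 0:
            # Recursion: a whole sentence embedded under 'thinks'.
            for embedded in generate(max_depth - 1):
                yield f"{np} thinks {embedded}"

def count_sentences(max_depth):
    # Number of distinct sentences available up to a given embedding depth.
    return sum(1 for _ in generate(max_depth))
```

Every sentence in `EXTENSIONAL_SAMPLE` is produced by `generate(1)`, but the converse does not hold: the three rules cover sentences the list never mentions, and `count_sentences` keeps growing as the depth bound is raised. A finite rule system thus stands in for an unbounded set, which is the sense in which the learner is taken to induce a grammar and not a list.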
Explaining the empirical problem of language acquisition thus demands, as we have seen, a commitment to a theory of grammar. Learnability researchers are content to assume that the time allotted for learning must be finite. In other words, learning algorithms must converge on a grammar in finite time. This is obviously a very generous definition. Language acquisition in real learners takes place not merely in finite time, it occurs over relatively short periods of time.5 The criterion used to define success could include a variety of things. Learnability studies often assume that learning is successful if the learning algorithm decides-in-the-limit on a unique grammar generating all and only the set of acceptable strings of a language, leaving it up to grammarians or psycholinguists to spell out what actually fits in the relevant set. This criterion has little psychological plausibility since it will be both too strong and too weak. It is unlikely that my psychogrammar is actually capable of describing all of the
acceptable strings of English (there are lots of words and even constructions that I don’t know or don’t use), and my speech also contains utterances which are not acceptable to others. In the learnability literature, input to the learning mechanisms is defined as potentially the set of acceptable sentences of the language-to-be-learned, if the input is to exclude negative evidence, or as sets of acceptable and unacceptable sentences, if the input includes it. Obviously, this is also psycholinguistically implausible since the stimuli learners hear are not sentences but utterances, a much larger and less well-defined set. Real learners also have to make do with subsets of sentences of the language, and these may turn out to be quite small.6 They may have other forms of input as well. The matter of the type of input available for language learning turns out to be crucial. Gold (1967) demonstrated that a general induction procedure could not induce the class of primitive recursive languages (which is usually assumed to include the class of natural languages) on the basis of finite input which consisted only of sentences which belong to the language-to-be-learned.7 In the absence of negative evidence, therefore, the class of natural languages is not learnable-in-the-limit by inductive mechanisms of the sort Gold adopted. The last factor, the computational nature of the learner, is obviously of critical importance. Within learnability studies, one adopts the weakest set of assumptions about the learner’s computational capacity. A learning procedure is defined simply as an effective function from the set of sentences serving as input to the set of grammars in some class (Wexler and Culicover 1980: 48). 
Therefore, by saying that the class of primitive recursive languages could not be induced on the basis of sentences of the language-to-be-learned in the previous paragraph, I mean that there is no effective function from the data set to the relevant class of languages.8 The learnability literature has thus already provided a kind of demonstration that there are three assumptions which cannot be simultaneously held, if both languages and/or the representational systems which encode them are to be induced:
(6.1) a. the assumption that the learner is linguistically uninformed;
b. the assumption that the learner is cognitively uninformed;
c. the assumption that learners successfully i-learn their language from linguistic data which is impoverished in some sense.
Assumption (6.1c) has been discussed enough to merit its own name: The Poverty of the Stimulus Hypothesis. Since it is my contention that the other assumptions are equally important I shall give them names too. Let us refer to
(6.1a) as the Hypothesis of Linguistic Innocence. Let us refer to (6.1b) as the Hypothesis of Cognitive Innocence. These three assumptions together are the true basis for the conclusion that UG is necessary to solve the logical problem of language acquisition. Together they are also the basis for the conclusion that UG is necessary to solve the empirical problem of language acquisition. This problem has been much discussed in the first and second language acquisition literature (Bley-Vroman 1990; Hyams 1991; Pinker 1989: Ch. 1; Saleemi 1992; White 1989b: Ch. 2; inter alia). It may seem to many, therefore, that there is no pressing need to revisit it. It seems to me, however, that the presentation of the logical problem of language acquisition in most of the works cited is misleading. In particular, several of these works suggest that it arises solely because of the poverty of the stimulus.9 Because this suggestion is misleading, the conclusions which are supposed to follow for SLA from the logical problem (that any theory of adult second language acquisition must posit Universal Grammar as a mechanism of language acquisition) fail to convince.
2.2.2 The linguistically innocent learner
Assumption (6.1a) corresponds to the hypothesis that learners bring no a priori knowledge about the nature and structure of language to the i-learning task. In other words, i-learning of a given language grammar, or of a grammatical representational system, must take place unaided by any prior information about what languages or language grammars or the set of relevant sentences of the target language might be like. Since we know that natural languages are not learnable in principle assuming (6.1), we must ask: Which assumption can be abandoned? In moving from the logical to the empirical problem of language acquisition, we might make some assumptions about whether the learner is linguistically innocent, or how innocent he is.
In the case of primary language acquisition, this means making a commitment as to whether the learner has knowledge of the basic properties of the representational systems, by hypothesis, phonetic and/or phonological systems, a morphosyntax, and a semantics. Of course, one must also decide if the learner also has knowledge of the specific properties of the psychogrammar to be induced from the stimuli in the learner’s environment. In the case of the psychogrammar, it is assumed that the learner could have no prior linguistic experience since regardless of one’s position on innateness, no one assumes that children are born knowing the idiosyncratic properties of specific languages like English. So in the initial state of language acquisition, neonates are linguistically innocent with respect to the psychogrammar, and may, or may not, have prior knowledge of a universal grammar, a universal phonetic representational system or a universal conceptual system.
Adopting UG is, however, supposed to be the conclusion that one arrives at after consideration of the logical problem of language acquisition. It is precisely the correctness of the Hypothesis of Linguistic Innocence, as applied to the question of how children come to have a grammatical representational system, that generativists and non-nativists choose to fight about most, when they turn their attention to matters of language acquisition. Generativists argue that one adopts UG because the psychogrammar cannot be learned without it. Non-nativists have tended to argue that children are much more linguistically innocent than generativists assert. The most extreme version of (6.1a), namely that the child initially knows nothing about language or grammars, follows directly from an extreme theory of mind: the mind of the child as a tabula rasa with no or only rudimentary a priori knowledge about anything. According to the tabula rasa view, all adult knowledge is the result of induction. While this extreme version of empiricism is a respectable philosophical stance (see Locke 1690/1964), and is essential to the learnability demonstrations, it is presupposed only in certain behaviourist and folk theories of child cognitive development which are no longer fashionable. It is no longer, in other words, an acceptable psychological stance. Just about everyone in the business of describing and explaining what children know and do, now takes the view that children bring to the task of learning everything, including language, some sort of a priori knowledge which manifests itself through specific, largely invariant behavioural tendencies. They adopt some version of nativism, either UG, or general nativism, to use the terminology of O’Grady (1987), meaning innate systems which transcend linguistic cognition. Nativists and non-nativists then agree when they assert that the fundamental question is: Just what does this innate knowledge look like? 
If UG constrains i-learning then children do not have to learn to represent stimuli in terms of morphosyntactic or phonological constructs because UG forces them to do this. UG forces particular types of analysis by making available autonomous grammatical representational systems. Moreover, it is often argued that autonomous domains of cognition are modular — they operate in relative isolation from the influence of stored information, and induction and inferencing mechanisms. Consequently, generativists arrive at the conclusion not merely that the linguistic system happens to be innate, but that it must be innate. Similarly, they have tended to assert that the organisation of linguistic cognition is such that it is not only modular but that it must be modular. Non-nativists, in contrast, tend to assert that the innate knowledge corresponds neither to a system of grammatical representations nor to principles and autonomous processes which would derive such representations. In other words, they have tended to reject both claims about the autonomy and modularity of
linguistic cognition. Instead, they have argued for domain-general principles of induction, part and parcel of a single, all-embracing general theory of learning. They thus make very different claims about the functional architecture of linguistic cognition. With respect to specific claims about language, they assume either that one can acquire the psychogrammar without acquiring the abstract grammatical properties attributed to UG (in other words they simply deny that linguistic theory is about anything of relevance to psychology), or, they assume that one can induce the properties of both the psychogrammar and the grammatical encoding system. It is often assumed that the properties of the sound system can be induced from a general theory of perception, while the properties of the morphosyntax, to the extent they are independently motivated, are learnable from properties of the conceptual system. To summarise: the real disputes about the development of cognition have traditionally revolved not around the necessity of epistemological innateness, but around the contents of that innate knowledge. In other words, proponents of UG are prepared to abandon (6.1a) while non-nativists and general nativists prefer to abandon (6.1b). At the same time, disputes have arisen (i) around the question of whether there is something we can call general learning processes which cause the final state of linguistic knowledge in the same way they are assumed to cause the acquisition of other cognitive sub-systems (the autonomy question); (ii) the extent to which grammatical development is dependent upon perceptual and conceptual development and how that dependence is to be characterised (the functional architecture question); and, (iii) whether cognitive domains which are autonomously represented must also be modular in their processing (the modularity question). 
2.2.3 The cognitively innocent learner
Assumption (6.1b) corresponds to the hypothesis that learners have un- or underdeveloped cognitive systems, including perceptual, inferential or logical and conceptual systems. The most extreme version of (6.1b) also follows from the tabula rasa theory of mind. It can be contrasted with the position that learners bring to the learning task mature and fully developed perceptual, conceptual and logical systems which are appropriate and attuned to the properties of the nonlinguistic environment. A third and intermediate position is that children bring to the task of acquisition conceptual systems which are merely immature. The extreme version of (6.1b), like the extreme version of (6.1a), is no longer very popular, for good empirical reasons. The developmental literature on category and concept-learning provides considerable evidence that the perceptual and cognitive systems of infants and small children are far from being tabula rasa.
Lasky and Gogol (1978), for example, have examined the child’s ability to perceive relations among stimuli, in particular, the relative motion of dots moving horizontally and vertically. The thinking behind this work is that stimulation from the visual field is infinitely variable and must be grouped into figures and objects to be experienced as THINGS. Perceptual grouping in turn requires that we respond in relative ways (rather than in absolute ways) to cues from the environment, and to the interrelations among stimuli, since the boundaries of figures and objects are not reflected in a straightforward way in the structure of light to the eye. Lasky and Gogol found that by five months of age infants are sensitive to relative-motion cues and appear to be able to group stimuli perceptually. Perception of motion appears to play a critical role in the development of object perception. Spelke (1982) discusses habituation studies in which infants are required to perceive objects in cluttered, changing arrays. These reveal that infants perceive objects by the spatial arrangement of surfaces in a scene and are not “distracted” by the colours or textures of the objects. She proposes moreover that objects are perceived on the basis of two a priori principles: the Connected Surface Principle and the Common Movement Principle. The former says that two surfaces pertain to the same object if they touch each other directly (Spelke 1982: 426). The latter says that two surfaces pertain to the same object if a movement carries both from one place to another without disturbing the connection between them (Spelke 1982: 426). These are apparently the only principles that infants follow in perceiving objects visually. Thus, their cognitive systems are different from those of adults in that they do not follow a Similarity Principle (grouping in a single object surfaces identical in colour, texture, or shape). 
They also do not group together surfaces so as to create objects with simple shapes and smooth edges, which means they do not follow any principles of form (Spelke 1982: 427). Rather, the static gestalt principles do not emerge until infants are about six months old. Spelke (1985) reports on habituation experiments involving stationary objects adjacent or separated in depth. Infants were presented with displays of one or two objects (solid, rectangular blocks of various sizes and shapes) and then looked at displays of one block, of two blocks separated in the frontal plane, of two adjacent blocks, and of two blocks separated in depth. Spelke reports (1985: 95) that only a minority of infants habituated to a change in the number of objects. Those that did looked longer at displays of two objects separated in depth. Spelke argues that infants begin life able to perceive certain objects as unitary and bounded, a capacity which enables them to see objects as persisting even when the objects move in and out of the visual field. Baillargéon (1986) has studied infants’ abilities to make inferences about
objects hidden from view. Five-month old infants saw displays involving a bright yellow block sitting on a table. In front of the block, lying flat on the table, was a screen. The screen was able to rotate on its (from the spectator’s perspective) far edge so that it could move up and towards the block. At a 70° rotation, the block was fully occluded for the spectator. If the screen continued moving towards the block, it would make contact at 120° rotation. On half of the trials viewed by the infants, the screen did make contact with the block. On the other half, however, it continued its rotation into the space which should have been occupied by the block. Infants were habituated to displays in which the screen rotated in the absence of the block. They also saw displays in which the block sat next to the screen, out of the way of its rotation path. Infants who saw the block looked longer at displays in which the screen rotated to a full 180°. This suggests that they were surprised by this display, and that in turn licenses the inference that this movement was inconsistent with their mental model of object motion and the persistence of objects. The conclusion to be drawn is that they perceive an object to persist even when it moves behind another object. Spelke (1985: 103–4), in discussing this conclusion, points out, however, that there is no reason to assume that objects always persist when they come to be occluded by other objects. She notes that objects which enter mouths and incinerators do not normally emerge from these places preserved intact. Our ability to make decisions about the persistence of objects in various situations therefore depends on our mental models about those objects in particular kinds of relations. See also Spelke (1990). 
Spelke (1988: 198) states: In our studies of human infants, the organisation of the perceptual world does not appear to mirror the sensory properties of objects, it does not appear to follow from gestalt principles of organisation, and it does not appear to depend either on invariant-detectors or modality-specific modules. Our research suggests that these views are wrong for a common reason. All assume that objects are perceived; that humans come to know about an object’s unity, boundaries, and persistence in ways like those by which we come to know about its brightness, colour, or distance. I suggest, in contrast, that objects are conceived: Humans come to know about an object’s unity, boundaries, and persistence in ways like those by which we come to know about its material composition or its market value. That is, the ability to apprehend physical objects appears to be inextricably tied to the ability to reason about the world. [Emphasis in the original, SEC]
As noted, Baillargéon (1986) has shown that infants as young as 6 months exhibit object permanence. Babies can recognise objects as specific individuals. They can recognise the basic object level of categorisation. They can identify
certain thematic relations among objects in events — causality, space, and time. Massey and Gelman (1988) show that three and four-year-old children have learned to use the complex surface properties of objects in order to distinguish those that do and those that do not have a capacity for self-generated motion. They know, for example, that animals can move even if they cannot see the animal’s feet or wings. They know that statues do not move, even if the statues look like animals. Premack (1990) has argued that children have an a priori notion of self-propulsion which is the basis for their developing theories of animacy. Golinkoff, Harding, Carlson, and Sexton (1984) have shown that infants sometime around year two exhibit behaviour suggesting that they expect animate and inanimate objects to behave differently in causal events, the latter requiring physical contact and possibly force to make an object move, the former not. Golinkoff (1975), however, has results which are inconsistent with the claim that toddlers represent the same animacy concepts as adults. She had two groups of male infants (14–18 months, and 20–24 months) watch films involving a table and a man or a man and a woman. The children were habituated to events in which the man pushed the table from left to right and to a second film in which the man pushed the woman from left to right. The experimental films involved events where the action roles were preserved but the direction of action was not (position-direction reversals). In this case, the man pushed the table or the woman from right to left. In the third event type, the direction of action was preserved but the action role was not, in which case the table was seen as pushing the man, or the woman was seen as pushing the man (action role reversals by position). In the fourth event type, both action and direction changed (action role reversals by direction). 
Changes in action role reversals by either position or direction in the film involving the man and the table created anomalous events. The children watched the events involving the table and the man longer than those involving the man and the woman. However, action role reversals by direction were watched significantly longer than position-direction reversals, suggesting that the children were not simply mapping notions of self-propulsion onto notions of agency. Rather, shifts in the direction were cueing them to shifts in action. I have stressed the fact that humans encode contingencies as part of i-learning. Gelman (1990a: 90) also argues that learning the predictive validity of cues in a domain is a general process exhibited by humans and other animals. Watson (1984), however, shows that simple contingency of events cannot be the basis for inferring causality; rather, infants appear to enter the world “prepared to analyse its dynamic causal fabric” (Golinkoff et al. 1984: 151). This in turn is determined by concepts of what can act on what. Moreover, the fact that a cue
is valid does not entail that it is a defining property of an object. Gelman asks what determines category membership and replies: I propose that implicit domain-defining principles specify the core of many of the concepts and categories with which young children learn to sort the world. Domain-specific principles function to direct attention to the objects, events, or attributes that are relevant exemplars. The exemplars in turn feed general processing abilities, including the ability to extract from stored information the predictive validity of the characteristics of the items assimilated to a domain. (Gelman 1990a: 90)
Keil (1979, 1989) has developed a theory of natural kind and artefactual concepts as relational entities which are structured, and which instantiate systematic sets of beliefs.10 These sets of beliefs are like “theories” and mental models in that they are causal in nature. He too specifically argues that even young children do not rely solely on counting attributes and correlations in constructing classes of things. For example, children of kindergarten age do not judge that temporary alterations in the appearance of an animal, nor its wearing a costume change the category of the object. In other words, a cat that temporarily looks like a skunk is still a cat.11 Carey (1985) has produced similar results showing the complexity of children’s “theories” about biological kinds. Markman (1989) has examined the nature of basic classificatory abilities in a large number of studies and has shown that children are biased right from the start to assume that nouns refer to whole objects rather than to object parts (the Whole Object Constraint). They also assume that nouns refer to objects of the same type, namely objects at the basic level of categorisation (the Basic Level of Categorisation Constraint). In other words, they cognise, by about 18 months, that objects are members of a class, and they also cognise that names refer to the class rather than to the individual members or to supersets (see also Katz, Baker, and Macnamara 1974; Macnamara 1982). They thus cognise that references to a cat are different in kind from references to Fluffy, although they are clearly hearing both kinds of nouns. Young children notice and represent both thematic relations and taxonomic relations but over time develop a preference for the latter (the Taxonomic Constraint). They do not assume that thematic relations are good candidates for noun meanings. 
In particular, they assume that term labels refer to taxonomic categories and they are biased to assume that novel names refer to whole objects (Markman 1989: 27). They are biased to assume that objects with different names belong to mutually exclusive categories. By the age of 4, they know that membership in natural kind categories will support rich inductive inferences about novel objects. If some thing is identified as a cat, the
child (and the rest of us too!) can infer that it will be active at night, hunt mice, like catnip, have fur, meow, and so on, once these properties of cats are mentally represented. Markman argues moreover that children manifest several types of classification strategies from a very early age — not only exemplar-based, but also feature-based (Markman 1989: Ch. 1). See Gelman (1990a) as well. Gelman and Gallistel (1979), Starkey and Cooper (1980), Starkey, Gelman, and Spelke (1983), Starkey, Spelke and Gelman (1990), and Gelman (1990a, b) have examined the origins of numerical cognition. They show that infants as young as 22 weeks old can discriminate the exact number of items in stimulus arrays consisting of not more than four items, a property we apparently share with certain primates. This ability relies on, but is distinct from, a rapid perceptual enumeration process called “subitizing”. Gelman (1990a, b) has argued for structural constraints on cognitive development which direct the child’s attention to relevant stimuli and direct learning of number concepts. Johnson and Morton (1991) looked at the development of face recognition and showed that neonates preferentially attend to static stimuli that have face-like arrangements of features. Johnson (1990a) has hypothesised that there are specific subcortical mechanisms (an orienting mechanism, and a behavioural mechanism, mediated by the subcortical visual pathway) which guarantee that faces and face-like objects are attended to at birth. Johnson (1990b) observes that preferential tracking for static face-like patterns, however, disappears between 4 to 6 weeks as developing cortical circuits inhibit the activity of the subcortical system. At this stage, a second system, which depends on the cortical system, and requiring exposure to faces, takes over. 
This second system interacts with i-learning since increased exposure to face stimuli increases the specificity of the input needed to activate it (Johnson 1990b: 152). Johnson (1990b) argues for four different visual pathways, each one operating according to maturational development. The relevant neurological structures quickly become specialised for faces.12 The face-recognition mechanisms become modularised as well. Schematic faces are not preferentially attended to by 5-month olds. However, if internal movement is given to face-like configurations, in other words, if the stimuli become more specific, the effect is restored. Johnson (1990b: 159) remarks: in the first few weeks of life developing cortical circuits in the infants [sic] brain are preferential [sic] exposed to classes of stimuli for which they will subsequently become specialised. This preferential input is ensured by primitive sensory motor systems such as the tracking of face-like patterns.13
See Karmiloff-Smith (1992: Ch. 5), and the papers in Mehler and Fox (1985) for more discussion of the development of face recognition.
To sum up, what this rich and varied literature shows is that in nonlinguistic domains of cognition, children reveal invariant tendencies to attend to specific sorts of stimuli in the environment, which means specifically that they are not processing stimuli in a random fashion. They organise perceptual information in complex ways, and quickly develop highly specialised knowledge. This ability relies on both a priori perceptual processes which are domain-independent and a priori representational primitives which are domain-specific. These directed searches may be guided by the maturational patterns of the visual pathways. Young children also have tendencies to relate language (nouns) to objects in quite specific ways (the Taxonomic Constraint, the Whole Object Constraint, and the Basic Level of Categorisation Constraint) which determine how individual and class names are mapped to concepts. Infants therefore have certain types of basic knowledge about the nature of objects, knowledge which develops over time into richly structured beliefs about categories, the relations among the members of categories, and how we typically talk about individuals, classes and the properties they exhibit. These structured belief sets allow the young child to make inferences about the nature of categories and their members in a given cognitive domain. These structured beliefs in turn can direct attention to stimuli in the environment. This research thus shows a complex interaction between a priori representational systems which are far richer than has been traditionally believed, and perception. Certainly it demonstrates that hypothesis (6.1b) is not tenable and must be abandoned. However, as noted above, the same literature also reveals that small children do not carry out perceptual, categorisation and inferencing tasks in the same way as older adolescents and adults. 
This is because categorisation and inferencing depend on mental models, and the mental models of infants and very young children are impoverished in comparison to those of adults, and, in some respects, just plain different. Infants are therefore comparatively underdeveloped or cognitively immature. Let us refer to this modified version of (6.1b) as the Hypothesis of Cognitive Immaturity.
(6.2) The Hypothesis of Cognitive Immaturity
Children’s non-linguistic cognitive systems are immature in comparison to adult systems, meaning that they are not as rich in content and structure, and are deployed differently from the mature systems.
As a weakened version of the Hypothesis of Cognitive Innocence, it might appear as if (6.2) can be adopted both by those who believe in Universal Grammar in the domain of language acquisition, and those who do not. One either assumes that linguistic cognition and non-linguistic cognition exhibit parallels, both deriving from skeletal domain-specific primitives, or, one argues
that language is still “different” and requires a much richer set of a priori organisational primitives. But observe that (6.2) plays an essential role in the argumentation for UG in linguistic cognition. The argument goes this way. If, by hypothesis, children have no prior knowledge of what grammars ought to be like, then they will need comparatively well-developed and powerful cognitive systems to compensate them for any deficiencies in the stimuli. But if they have comparatively immature perceptual and representational systems, this increases the likelihood that they will not attend to, encode, or store in memory relevant stimuli. Moreover, if they have limited knowledge of what sentences and words might mean because they have limited theories about the organisation of the world, this decreases the likelihood that they can make correct inferences about sentence structure and grammatical form on the basis of an understanding of sentence and word meaning. If infants and young children have limited inferencing capacities, they are less likely to induce the correct conclusions from appropriate input even when it is correctly encoded. Finally, if they have limited metalinguistic capacities, then they will not understand and will not be able to use explicit instruction about language or metalinguistic feedback and correction. In summary, even if one adopts the Hypothesis of Cognitive Immaturity, which attributes much greater a priori knowledge to the child in non-linguistic domains of cognition, then one still cannot, in principle, explain how children could ever induce the linguistic representational systems (the phonology and the morphosyntax) given impoverished stimuli.
2.2.4 The Poverty of the Stimulus Hypothesis
Hypothesis (6.1c) is the final piece in this puzzle. In its present form, it expresses two separate claims. 
The first is that the information relevant for inducing a grammar comes from the environment, in the form of utterances produced in meaningful interactions with other speakers. This formulation is innocent and uncontroversial and does nothing more than reflect the fact that language acquisition is based on exposure to stimuli. Hypothesis (6.1c), can be, and often is, however, construed as a claim about the functional architecture of the mind. Thus, from the separate claims that learners acquire a psychogrammar from input, and input equals stimuli, one can be led, not quite so innocently and uncontroversially, to the claim that acquiring a grammar necessitates a version of modularity. If grammatical acquisition is dependent on sentence processing in a bottom-up fashion, then one would conclude that grammatical acquisition cannot be influenced by conceptual information. This follows if one understands (6.1c) to mean that the learner acquires her language only on the basis of input in this restricted sense. Modularity then would lead to the conclusion that psychogrammars cannot
be induced. I shall set the Modularity Hypothesis aside until Chapter 7. For present purposes, (6.1c) does not entail modularity. It says, merely, that stimuli are impoverished with respect to the grammatical knowledge that speaker/hearers manifest. There are three significant ways in which linguistic stimuli are impoverished: They are uninformative about critical properties of grammatical representations; they are incomplete; they are deviant but their deviancy is not signalled. Let us take these claims in order. We can understand (6.1c) to say that the speech stream is uninformative about the grammatical categories, relations and constraints that speaker/hearers ultimately cognise. Therefore, the linguistic stimuli are uninformative about the nature of the morphosyntactic and phonological representational systems which are assumed to characterise the adult grammatical knowledge systems. They are therefore also uninformative about the grammar needed to generate the strings of a particular language. A consideration of even some simple properties of grammars will show the logic of the argument. There is no information in the speech stream about the categories of words comprising a sentence. Assume that it is correct to characterise the grammatical knowledge of English speakers in these terms: They cognise that the sentence The birdfeeder is empty consists of a set of four words corresponding to the abstract syntactic categories det(erminer), noun, verb and adj(ective). Now observe that there is nothing in the acoustic-phonetic properties of the sentence which indicates that the first word is a det, the second a noun, or the third a verb. Indeed, the problem is more serious because there need not be, under normal pronunciation of the same sentence, any indication that it consists of words, or even of where the word boundaries occur. This problem has been extensively studied by structuralists under the name of juncture (see Goldsmith 1990, for discussion). 
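The word-boundary problem can be illustrated with a rough sketch. It assumes, purely for illustration, that an unbroken string of letters stands in for the continuous speech signal and that the hearer already has a small lexicon; the function name `segmentations` and the toy lexicon are hypothetical. Even with that generous head start the string does not fix a unique division into words, and a linguistically innocent learner would not have the lexicon at all.

```python
from functools import lru_cache
from typing import FrozenSet, List


def segmentations(stream: str, lexicon: FrozenSet[str]) -> List[List[str]]:
    """Return every way of dividing `stream` into words drawn from `lexicon`."""

    @lru_cache(maxsize=None)
    def split_from(start: int) -> List[List[str]]:
        # Reaching the end of the stream yields one (empty) continuation.
        if start == len(stream):
            return [[]]
        results: List[List[str]] = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in lexicon:
                # Prepend the word to each segmentation of the remainder.
                results.extend([word] + rest for rest in split_from(end))
        return results

    return split_from(0)


# Toy lexicon (an assumption): letters approximate an unsegmented utterance.
toy_lexicon = frozenset({"the", "bird", "feeder", "birdfeeder", "is", "empty"})
analyses = segmentations("thebirdfeederisempty", toy_lexicon)

for words in analyses:
    print(" | ".join(words))
```

With this lexicon the string receives two analyses, one with birdfeeder as a single word and one with bird and feeder as separate words; nothing in the string itself chooses between them.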
The larger morphosyntactic units of a sentence are not isomorphic to the acoustic-phonetic ones either (Selkirk 1984; Nespor and Vogel 1986). Indeed, explaining how hearers extract phonological units from the speech continuum remains one of the great puzzles of linguistic cognition (Pisoni and Luce 1987; Cutler, Dahan, and van Donselaar 1997).14 Exactly how such phonological units map onto morphosyntactic ones or conceptual ones has yet to be determined, although the problem is currently getting a certain amount of attention in FLA research (Morgan and Demuth 1996). Comparable arguments can be constructed looking at the correspondence issue from the conceptual-syntactic side too. It is well-known that the correspondence between the meaning of a sentence and its morphosyntactic structure is also not one-to-one but rather many-to-many.15 Therefore, a learner inferring an interpretation of some string from the context would not be able to map the
meaning of the sentence onto a unique structural analysis of the stimulus simply on the basis of his conceptual representation of it. Strings are, moreover, often multiply ambiguous. This can be illustrated using Chomsky’s famous example Flying planes can be dangerous. It can attribute the property of dangerousness to objects or to an event. It is easy to imagine many contexts (standing on an airfield looking at planes overhead, for example) where the context simply would not permit us to choose one interpretation over the other and therefore to choose one syntactic analysis over the other. The ambiguity problem of the input is not insignificant. We cannot allow language acquisition to be sensitive to, or derailed by, such instances. Attempts to counter the claim that the linguistic stimuli are uninformative about the abstract properties of syntactic knowledge often start from an undeclared and undefended presupposition that the input to language acquisition involves some already analysed representation. In other words, the presupposition is that when learners learn language, they already know what counts as a sentence, or a word of their grammar. Slobin’s operating principles, for example, appear to be based on this assumption. Of course, if it were true that learners had some pre-analysed representations incorporating words and sentences, this would amount to the rejection of the claim that they are linguistically innocent. In the past, the presupposition has usually been born out of sheer ignorance about the true nature of linguistic stimuli. What learners initially confront is a wall of sound. Getting over that wall to some minimal morphosyntactic and semantic representations of the speech signal amounts to a significant feat of acquisition. 
Notice, however, that even if we assume that the learner can somehow represent sentences as strings of words, it still turns out to be true that intake or minimally analysed types of input are unrevealing about many properties of grammars. Strings of words say nothing, for example, about the structure dependency of grammatical processes. It is this property of grammars which explains why (6.3b) is a well-formed global or yes/no question version of (6.3a) but (6.3c) is not.
(6.3) a. The rose which is blooming in the garden is an Austin rose.
      b. Is the rose which is blooming in the garden an Austin rose?
      c. *Is the rose which blooming in the garden is an Austin rose?
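The contrast in (6.3) can be made concrete with a small sketch. It is an illustration only, not a model of any learner or parser: the function names and the hand-supplied division of (6.3a) into subject NP, copula and predicate are assumptions introduced here. The first function applies a purely linear rule ("front the first is in the string"); the second applies a rule that refers to constituent structure ("front the copula that follows the whole subject NP").

```python
# Toy illustration: the constituent bracketing is supplied by hand, not parsed.

def front_first_aux(words, aux="is"):
    """Linear rule: move the first occurrence of aux in the string to the front."""
    i = words.index(aux)
    return [aux.capitalize()] + words[:i] + words[i + 1:]

def front_main_aux(subject_np, aux, predicate):
    """Structure-dependent rule: front the aux that follows the whole subject NP."""
    return [aux.capitalize()] + subject_np + predicate

# (6.3a), split into its three top-level parts (hypothetical bracketing).
subject_np = ["the", "rose", "which", "is", "blooming", "in", "the", "garden"]
aux = "is"
predicate = ["an", "Austin", "rose"]
declarative = subject_np + [aux] + predicate

linear = " ".join(front_first_aux(declarative)) + "?"
structural = " ".join(front_main_aux(subject_np, aux, predicate)) + "?"

print(linear)      # the ill-formed string in (6.3c)
print(structural)  # the well-formed question in (6.3b)
```

Both rules give the same result on a sentence with a single is, which is why strings of simple sentences alone would not tell a learner which rule is the right one.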
This often-used illustration of structure dependency hinges on the fact that yes/no question formation in English requires the main copula verb is in (6.3a) to appear to the left of the complex noun phrase (NP) subject the rose which is blooming in the garden. In particular, it is not possible for just any verb to appear at the front of the question. More generally, it is the case that syntactic operations respect the
constituent structure of the sentence. It is this constraint which goes under the name of structure dependency, discussed in previous chapters. However, knowledge of the well-formedness of (6.3a) does not entail that the corresponding question must conform to (6.3b), or more generally that question formation as a grammatical process conforms to structure dependency. Since all natural languages appear to manifest structure dependency, the question then naturally arises: How does the acquirer come to know this? How could an acquirer induce this property of grammars? There are no obvious answers based solely on the properties of the stimuli. Even if one just waves one’s hand at the problem and responds that learners just do infer the property, the next question becomes: What requires the inference? Again, there are no obvious answers based solely on general properties of induction, and the nature of the stimuli. The only plausible answer, one made explicitly by Slobin (1991), is that structure dependency is a general property of all cognition. That language manifests it is therefore not surprising, or so Slobin surmises. This response, however, overlooks one significant fact: Different domains exhibit sensitivity to different units of structure. The examples in (6.3) illustrate more than just the fact that language manifests structure dependency, they illustrate the nature of that structure-dependency. The point about (6.3) is that movement operations are sensitive to structures unique to language: phrasal categories, and tense. This is surely a surprising result if grammars are induced from properties of conceptual representations or from acoustic-phonetic stimuli. There are other problems with the traditional idea that people induce basic grammatical categories or primitives and grammatical constraints from linguistic stimuli. 
One problem raises difficulties both for the claim that learners could induce a morphosyntactic representational system, and for the claim that they could induce the psychogrammar of the L1 without an a priori morphosyntactic representational system to guide them. Input is incomplete. One of the assumptions we must make in moving from the logical problem of language acquisition to the empirical one, is that real learners get exposure to only finite amounts of the language. No learner of English hears all possible sentences of English. As we noted above, no learner could therefore learn English through simple enumeration of the strings of the language. At best exposure, the learner’s input is merely representative. A related problem is what I call the sampling problem; the learner at any one time has been exposed to only a finite subset of the sentences of the language but has linguistic knowledge which goes far beyond the properties of the sample. This fact leads to a related problem: the uniformity problem. We must assume that learners hear linguistic stimuli which are variable in any number of ways. Learners might attend closely to different details of the stimuli
and end up extrapolating quite different sorts of generalisations about the language they are learning. If the progress that learners manifested was also highly variable, that might suggest the influence of the stimuli and environmental variability. However, L1 learners show remarkable uniformity in their development in this respect: for certain grammatical phenomena (see Brown 1973; Ingram 1989; Meisel 1986, 1990a, b, 1992, 1994a, 1994c, for numerous examples), children go through the same stages in the same order, although not necessarily at the same age or speed. Since i-learning is experience-dependent and children’s experiences can vary in infinite ways, how can this be explained within a general theory of induction? This question becomes particularly pressing when one observes that children’s linguistic development manifests variability with respect to the acquisition of other phenomena (Vihman, Ferguson, and Elbert 1986; Bates, Bretherton and Snyder 1988; Dinnsen 1992). What theory of induction could explain both of these patterns of development without recourse to cognitive or linguistic universals? Yet another problem is the robustness problem. Learners must not be led astray by properties of odd, unique, infrequent or even ill-formed stimuli. We saw in Chapter 4 (the Unusualness Constraint of (4.12)) that humans are sometimes prepared to pay close attention to unusual events and to alter their mental representations on the basis of the properties of those events. Selective attention is also clearly a factor in first language acquisition (de Boysson-Bardies, Vihman, Roug-Hellichius, Durand, Landberg, and Arao 1992). This, however, creates a dilemma for general learning theories. Even small amounts of odd or ill-formed stimuli could lead to serious misanalyses if learners were inducing their L1 grammars by attending closely to all aspects of the stimuli and encoding unusual features. 
Learners are apparently impervious to certain kinds of “noise” in the stimuli. What explains that? Every learning theory must explain precisely which unusual properties of language are ignored and which get encoded. It isn’t clear how induction alone can do this. This factor interacts with the issue of the time allotted to learning. We observed in Chapter 4 that construing language acquisition as an unstructured search among possible hypotheses leads to intolerable computational results.16 The ability to disregard misleading or highly variable aspects of the input would appear to be part and parcel of a “structured search.” The vital question is: What is directing the search?
2.2.5 Summary
The truth of the claim that hypotheses (6.1a–c) (or (6.1) and (6.2)) cannot be simultaneously held should now be obvious if we grant that knowledge of a psychogrammar consists of knowledge of abstract grammatical categories and
constraints like structure dependency. The internalised information that constitutes knowledge of the properties of the grammatical representational systems must come from somewhere. If the information comes from the acquirer’s experience, it must be directly represented in the linguistic stimuli. Or, if it is not directly represented in the linguistic stimuli, the acquirer must induce its presence on the basis of some other information, or else she must transfer it from some other known cognitive domain. Both the induction solution and the transfer solution entail that the acquirer possess this other sort of information and the requisite inferential capacities, including an appropriate representational system capable of encoding inferences. If the information cannot come from the environment, or if the learner does not possess the necessary cognitive mechanisms to infer it, or if it is assumed that the learner’s conceptual/inferential system cannot represent the right inferences because it is too immature, then the information must be “innate”. There are no other logical possibilities. The solution to the logical problem involves abandoning at least one of the assumptions (6.1a, b, or c), or (6.2). Any one can go. In addition, we could abandon the claim that knowledge of a language involves knowledge of abstract properties of grammars, which is the route connectionists like Bates and MacWhinney would like to take. I am, however, not really interested in the logic of the matter so much as how to interpret grammatical acquisition when defined in terms of real people. We must make rather different assumptions in the case of primary and second language acquisition.
3. The empirical facts from first language acquisition
3.1 From FLA to SLA?
It is commonplace for researchers to approach the problem of SLA by discussing the nature of first language acquisition. One can find this approach exemplified not only in discussions of the logical problem of (second) language acquisition, but also in characterisations of the developmental stages that learners actually pass through. I have followed this tradition. I want to make the case now, on the basis of a consideration of the logical problem of language acquisition, and the facts of first language acquisition, that both the logic and the empirical facts concerning it are different from those relevant for SLA. What then are the considerations from the L1 developmental literature which inform the logical problem and help us to decide which of the assumptions (6.1a–c) or (6.2) must in fact be abandoned?
3.2 Input consists of more than strings of forms
3.2.1 Meaning as input
General nativists have argued that children induce grammars on the basis of meaning, via the grammaticalisation of meanings. The basic idea is that if structural information may not be derivable from properties of the speech stream, as we saw above, it may nevertheless be derivable from properties of conceptual representations. The claim is that grammatical constructs are encoded in the earliest stages of L1 acquisition as concepts in conceptual representations. There are two possible interpretations of this claim. One is that children use meaning to discover the properties of the syntax. This claim does not necessarily deny the prior existence of UG. Indeed, it is commonly asserted to be true, as the discussion of the Semantic Bootstrapping Hypothesis made clear. The second claim is that the syntactic representational system itself is derived from meanings. This version does deny UG.17 Only concepts and conceptual representations are said to exist a priori and they are said to “evolve” into grammatical notions and categories. Despite vagueness in the literature about how this evolution occurs, this idea is not without interest for us. On the one hand, virtually every major contemporary theory of psychology has adopted a position much like this. Functionalist linguistic theories can, under one interpretation, be construed as making the same claim. On the other hand, Gold’s (1967) learnability research demonstrates that the class of natural language grammars is induceable, in principle, if the input includes a correspondence between each sentence and a representation of its meaning. Thus, if the conceptual code is innate (or given in advance), and if the properties of this code are kept constant, then a grammar could eventually be learned-in-the-limit. 
There is an obvious problem with this solution when transferred to the realm of primary language acquisition and when one moves beyond the simplest problems like correspondences such as INDIVIDUAL-noun or ACTION-verb. It forces one to abandon the Assumption of Cognitive Immaturity (6.2). This is one assumption, as we have seen above, for which there is considerable empirical motivation. It is implausible to think that children have complete and well-developed representations of sentence meanings which they can put in correspondence with linguistic forms in order to learn the grammar of their language. Thus, for example, we have no reason to believe that children have a complete set of quantifiers available at birth and can map these onto words like every, all, some or many. Studies by Drozd, Philip and Musolino and Crain into children’s interpretations of negation and quantification show that they manifest distinctly non-adult properties, but properties which are nonetheless fully compatible with
the nature of the stimuli they get (Philip 1991; Drozd 1996; Musolino and Crain 1999). We reviewed literature above showing that the conceptual system of children is clearly distinct from that of adults, e.g. that their object concepts are different. At best, then, the grammaticalisation claim is that some sorts of primitive conceptual notions can be equated with some sorts of primitive grammatical categories. How the adult system of grammatical categories and constraints (structure-dependency, subjacency, etc.) emerges or even how the child develops an adult-like conceptual system is left unexplained. Even if we assume that infants begin with a fairly rich repertoire of innate conceptual categories, something I have tried to motivate, this assumption does not explain how they make the correspondence between events in the world, meanings and linguistic strings. Conceptual innateness does not resolve, in other words, how a child comes to believe that sentences mean what they do, i.e. that the strings Don’t eat those flowers (pointing at some hellebores) They’re poisonous could be true of particular objects in the garden, and lead him to act appropriately (i.e. to refrain from eating those particular pink-flowers-over-there, not giving them to his little brother to play with, etc.). So there must be innate principles regulating the correspondences between language and the conceptual system of precisely the sort that Markman (1989) has proposed. And even if we assumed that children have far richer conceptual systems, and we build in universal mapping principles between the conceptual categories and the morphosyntactic ones, it remains true that the two representational systems are far from isomorphic. Therefore we must assume that moving from conceptual representations to grammatical representations involves more than establishing a one-to-one correspondence. Children must be doing some complex inferencing. 
For example, in the adult language, noun phrases do not refer merely to INDIVIDUALS. They can refer to ACTIONS (The leaf raking took hours), EVENTS (The leaf raking has been completed), STATES (Sleeping is necessary for good health), or nothing at all (It seems unlikely that it will snow in July). Similarly, verb phrases can refer to more than just ACTIONS. Without UG, we would have to assume that children can use categories only to express initially a single conceptual type (INDIVIDUALS and ACTIONS), and that they must be doing some serious grammatical restructuring as their conceptual system or the correspondence patterns between, e.g. individuals and nouns changes. I know no evidence demonstrating such serious grammatical reorganisation. Grammatical organisation on the basis of syntactic categories is apparent early on. More importantly, the essential properties of the morphosyntactic categories (major referential categories and functional categories) are acquired long before the child’s semantic system is fully worked out.
Finally, consider that functionalist approaches have no answer to the basic question of why syntax should exist. How could, and why should, a representational system of one sort (a system in which categories like EVENT, STATE, INDIVIDUAL, PROCESS are encoded) turn into a separate system in which categories like sentence, verb, noun and adjective are encoded? We will find no account in the first language acquisition literature of the mechanism which causes the evolution of conceptual representations of utterances into separate conceptual and morphosyntactic representations.18 In other words, we will find no description of the cognitive mechanism whose function is to convert semantic representations into an autonomous format. Asserting that grammar develops out of “meaning”, therefore, is itself meaningless when it is taken to be a response to the question: How do children come to have a syntactic representational system? A corresponding criticism can be constructed regarding claims that knowledge of the phonological representational system is induced on the basis of an a priori perceptual representational system. There is no evidence that the structural primitives of the phonological code are induced or are inducible from the features of the speech stream.
3.2.2 Feedback as input
In much of modern psychology, the nature of the input to language learning has not been debated; it is merely presupposed that learners get enriched information in the form of feedback and correction. Feedback is the answer given to the charge that linguistic stimuli are deviant, incomplete and unrevealing about the structural properties of grammars. It is also the answer given to the problem of constraining induction. Developmental psycholinguists, in contrast, charged with the task of explaining what small children actually say and understand, have had to critically examine the nature of the stimuli children are exposed to. What do they look like? 
As with much else in the domain, there is controversy about what the facts actually are. A small but important body of research has claimed that children do not get explicit correction about abstract structural properties of grammar, in particular about syntactic mistakes (Brown and Hanlon 1970; Hirsh-Pasek, Treiman, and Schneiderman 1984). Snow and Goldfield (1983), nonetheless, report that erroneous choice of vocabulary is systematically corrected. Thus, caretakers enrich the input serving to establish lexical sound-meaning correspondences. Moerk (1991) also shows this to be the case. See also Moerk (1983a, b) and the references therein to further studies by Moerk. These latter studies show that it cannot be claimed that correction does not exist. The only bone of contention can be whether there is explicit grammatical correction, that is to say,
correction related to structural properties of words or sentences. Unfortunately, the empirical studies focusing on this question are not all that revealing for three reasons. The first is that the characterisation of a correction as structural depends upon a theory of structure. The literature asserting or denying structural correction in FLA almost never makes explicit its structural assumptions. It is simply assumed that if a correction could in principle be analysed in a given way, it must in fact be so analysed by the child. Nothing could be further from the truth. Secondly, there has been no attempt in the corpus studies to show that the caretaker had, what I shall call, a structural corrective intention. It is naive to assume that a parent who repeats or re-words a child’s two word utterance necessarily has a corrective intention, that is to say, that the point of the repetition is to teach the child something specific, and specifically grammatical, about the L1. The fact that parents repeat both correct and incorrect child utterances (Gropen, Pinker, Hollander, Goldberg, and Wilson 1989; Pinker 1989) suggests, on the contrary, that this particular parental behaviour might have a variety of causes. It would also be naive to assume that parents who do not repeat or correct haven’t noticed the child’s error. We are not driven to comment on errors we perceive. I will come back to this point later in the book. Thirdly, no one has presented any evidence that children construe corrections, repetitions, and recasts as corrections. In Chapter 10 I will demonstrate that correction is not an intrinsic property of an utterance, but rather a construal of the interlocutor’s behaviour. To count as explicit structural correction an utterance must be so construed by the child — the psycholinguist’s interpretation of the same utterance is irrelevant. 
A second important claim is that infants and young children do not understand explicit feedback or correction, and do not use it, when they do get it (McNeill 1966; Braine 1971; Platt and MacWhinney 1983). This failure can be directly explained by the fact that interpreting linguistic feedback requires constructing conceptual representations which encode linguistic constructs (sentences, noun phrases, syllables, etc.) as objects of thought. In other words, the child has to possess a metalinguistic capacity which will permit her to think about language in much the same way that she might think about her dog or a favourite toy — as entities in the world, as possessing properties that can be predicated of particular individuals, and so on. Inductive extrapolation involving units of language requires at least this much. But metalinguistic capacity of this sort is not manifested in the earliest stages of language acquisition and appears to follow a maturational path (Gombert 1992). At 2 or 3 years of age, children do not appear to be able to make sense of questions like “How many words are in the sentence Johnny likes to watch television?” which suggests that they cannot conceptually encode the notions WORD or SENTENCE. See Bialystok (1979,
1982, 1986a, b) for further discussion of the emergence of these abilities. If children do not possess at a given stage the requisite metalinguistic capacities, they clearly cannot construct the appropriate conceptual representations of the contents of the intended feedback. Another tack has been to argue that input really is enriched, but enriched by the caretaker in an indirect way. Indirect feedback is said to consist of differentiated responses by caretakers to well- and ill-formed utterances produced by the learner (Hirsh-Pasek et al. 1984; Demetras, Post, and Snow 1986; Penner 1987; Bohannon and Stanowicz 1988, among others). These studies show that caretakers have tendencies to repeat children’s ill-formed utterances. The idea behind the claim that this is useful to the child is that he can notice the differential responses to his own utterances, infer that there is some meaning to this behaviour, infer that the meaning is that his own output is deviant, and then attend to the problematic output. Moerk (1983a, 1991), for example, has claimed that expansions of children’s utterances are “necessarily” corrections. This argument is fallacious. It has been noted before (Gleitman, Newport, and Gleitman 1984; Pinker 1989) that these studies are correlational and therefore cannot in principle establish a causal relation between what caretakers say and children do. But the situation is much worse; the very logic behind this research is faulty. As noted above, very young children do not have the requisite metalinguistic capacities during early stages of acquisition (up to approximately age 5) for drawing the inferences that indirect forms of feedback are supposed to lead to. In Chapter 10 I will demonstrate that interpreting metalinguistic feedback requires some very sophisticated inferencing based on violations of Grice’s Relevance Principle. The more explicit the feedback, the easier it is to draw appropriate conclusions from it. 
Implicit feedback does not state the relevant information, and frequently obscures the corrector’s corrective intention. This makes the learner’s task harder, not easier. Even if we set aside this critical problem, the arguments still do not hold. If indirect feedback is to be useful and not misleading, children would need to get it systematically, but they do not (Gropen et al. 1989). Moreover, the older the child is (and therefore the more likely to be able to interpret indirect feedback), the less likely caretakers are to provide it (Pinker 1989: 10–15). Pinker has also argued that feedback from parents is informationally limited in the sense that it does not distinguish types or the sources of error. So the child has to figure out on his own what his mistake has been. As noted above, responding to indirect forms of feedback therefore increases the amount and complexity of the inferencing required; it does not reduce it. Pinker also points out that the provision of feedback may be culturally or even class-restricted (the sampling
problem again); there is no evidence that children use it, nor that it is necessary to recover from an erroneous analysis. It is therefore impossible to see how the provision of indirect sorts of feedback solves the problem of how children come to have grammatical representations.
3.2.3 The Simplified Input Hypothesis
Some input research has focused on other properties of the stimuli. One body of research has attempted to prove that Chomsky’s (1965: 200–1, ft. 14) remarks about linguistic stimuli relevant for language acquisition containing slips of the tongue, false starts and other performance errors cannot be substantiated. The first result was to show that properties of adult speech are differentiated according to the addressee, in particular, that while it may be true that adult speech to other adults includes ill-formed utterances, the speech of adults to young children is largely well-formed (Snow 1972; see also the contributions in Snow and Ferguson 1977, in particular Snow 1977). This research thus rejects the deviance claim. It also shows that caretaker speech is simplified in that it consists of non-recursive structures (simple sentences and simple noun phrases), and is deictic. Well-formed simplified input, it has been argued, thus solves the logical problem of language acquisition, dispensing with the need for recourse to UG. The Simplified Input Hypothesis is based on the idea, not only that the child hears simplified language but also that she proceeds in step-like fashion learning first the properties of simple sentences and then the properties of more complex ones. This predicts, for example, that children learn first how single verb main clauses are organised (simple intransitive and transitive sentences) and then learn embedded and complex complementation structures. The conclusion does not follow, however, as is obvious when one considers closely particular i-learning problems involving recursive structures. 
Consider in this light the fact that children acquire complex sentences (relative clauses, subordinate clauses, clefts, etc.). The research on caretaker speech to children shows that these constructions are available in considerably smaller numbers than simple sentences. Morgan (1989: 352), for example, has tabulated 30 hours of speech transcripts from the American English corpus of Roger Brown. This tabulation shows quite clearly that the vast majority of sentences to the children Adam, Eve and Sarah involved unembedded simple sentences. Sentences with more than 1 degree of embedding were virtually absent. I reproduce a simplified version of the tabulation in Table 6.1.
Table 6.1. Complexity of language input (reconstructed from Morgan 1989: 352)
                   Simple sentences   1 degree of embedding
Stage I            93.28%             06.38%
Stage II           93.55%             06.08%
Stage III          89.70%             09.73%
Stage IV           91.05%             08.09%
Stage V            87.69%             11.63%
Observed total     7,976              742
Projected totals   3,900,000          360,000
The projected totals were calculated by extrapolating the hourly rate of speech over eight hours per day × seven days a week × five years.
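The extrapolation behind the projected totals can be checked with a short calculation. This is a sketch under two assumptions that the source does not state explicitly: that the observed totals come from the 30 hours of transcripts mentioned in the text, and that a year is counted as 52 weeks. The variable names are introduced here for illustration.

```python
# Assumed inputs: 30 transcribed hours (from the text) and 52 weeks per year (assumption).
HOURS_TRANSCRIBED = 30
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 7
WEEKS_PER_YEAR = 52
YEARS = 5

# Observed totals as given in Table 6.1.
observed = {"simple sentences": 7976, "1 degree of embedding": 742}

# Total hours of exposure over five years at eight hours a day, seven days a week.
exposure_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR * YEARS

def project(count):
    """Scale an observed count by the hourly rate over the whole exposure period."""
    return count / HOURS_TRANSCRIBED * exposure_hours

projected = {kind: project(count) for kind, count in observed.items()}

for kind, total in projected.items():
    print(f"{kind}: about {total:,.0f}")
```

Under these assumptions the results round to the figures in the table, roughly 3.9 million simple sentences and 360,000 sentences with one degree of embedding.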
If we consider first the frequencies, it is clear that simple sentences might be substantially more important to children for i-learning the properties of their language than complex ones. Indeed, the frequencies of sentential complements and relative clauses are so small that children might ignore them and treat them as “noise” in the data like the foreign accent of the occasional visitor. Moreover, it is noteworthy that the differences are stable over time. Parental speech to children becomes more complex along a number of dimensions, but not apparently this one. These data clearly bear out the claim that children hear simple sentences. The Simplified Input Hypothesis then asserts that this type of data will make inducing the properties of the complex structures easier. The argument, however, is based on an unsubstantiated assumption: that the syntactic organisation of complex sentences is a simple extension of the syntactic organisation of simple ones. But just the opposite is sometimes true. In many languages certain basic properties of morphosyntactic organisation will only be revealed by the complex sentences. If, for example, the evidence for i-learning constituent order in German is revealed by the place of verbs in embedded sentences, which are verb final (SOV order), or in sentences with infinitives (as Lightfoot’s (1991) discussion proposes) then clearly children cannot decide on basic phrasal order solely on the basis of simple single verb utterances, where the input can be analysed as SVO or OVS but not *SOV. The only way around this would be to argue, as Lightfoot does, that the most obvious cues as to verb position, namely the position of the main verb, are not in fact the most important cues for the learner. However, in the absence of an account of why the most direct cues are ignored in preference to indirect cues, in other words, in the absence of a theory of grammar which explicitly links the position of separable verb particles or
tense to an “underlying” position of verbs, such a move is suspect. So the problem remains that one cannot get in any straightforward way via induction from an analysis of main clauses to a correct account of the structure of embedded clauses.19 Consider a different example. If the properties of subjacency hinge on the relationships between an interrogative pronoun and a predicate in an embedded constituent, then the input must include embedded interrogatives for these properties to be i-learned. The child cannot i-learn properties of sentences which are not present in the input (nor inducible from properties of conceptual representations). In fact the situation is worse for the Simplified Input Hypothesis; if the child i-learns simple structures first and only then complex structures, i-learning the latter will require considerable unlearning. In short, there are at least some syntactic problems where the learner can acquire more in principle from complex input than from simplified input (see the commentary to Lightfoot 1989, for further examples of this sort). Another major difficulty with the simplified input research arises from the sampling problem. There is only a crude correspondence between the frequency of certain patterns in the speech of caretakers and that of children. Consider, once again, the literature on the acquisition of German word order (Clahsen 1982; Clahsen and Penke 1992; Meisel 1986, 1994b, 1995; Meisel and Müller 1992), which shows that children initially favour verbs in final position (SOV order) despite the frequency of SVO and OVS orders in the input and the absence of *SOV simple sentences.
Although we can find a reason why children might encode structures with verbs at the end of the sentence, for example, these verbs in final position will bear shifts in fundamental frequency and are likely to have lengthened syllables and thus to be acoustically salient (see Cooper and Paccia-Cooper 1980, who document these properties for English sentences, see also Bernstein-Ratner 1986; Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright Kennedy, Druss and Kennedy 1987, and Echols 1988), there is no principled reason obvious from the acquisition literature as to why children should prefer these stimuli to ones where the verbs occur initially or medially. The post hoc nature of much explanation based on the Simplified Input Hypothesis thus becomes particularly obvious when examining accounts of children’s structures which never occur in speech to children. It also becomes obvious whenever the child’s speech really does reflect the variability present in the linguistic stimuli. Children and adults alike display variability in their knowledge of word forms and word meanings. Children pick and choose which forms will be their favourite first forms (Ingram 1989). Now the point here is that i-learning theorists cannot argue that learners get structurally uniform experience (they
uniformly extract general syntactic patterns) but get lexically (phonetically, semantically) variable experience. The reason is that syntactic properties of sentences directly reflect lexical properties. Syntactic properties such as subcategorisation and selection, argument structure and the correspondence between semantic roles like AGENT, EXPERIENCER, FIGURE or GROUND and syntactic positions or case relations hinge on the semantics of individual words (Chomsky 1965: 86–87; Gruber 1965; Fillmore 1968; Jackendoff 1972, 1983, 1990b; Grimshaw 1990; among others). If the lexical input to the child is variable, or if the child variably selects from homogeneous stimuli, then it follows that the information about syntactic structures to the child is variable. If the syntactic input is variable then the systematicity in children’s syntactic development becomes quite mysterious. In short, general nativists cannot have their cake and eat it too. They cannot explain differences in the content and organisation of an individual’s lexicon in terms of the contingent nature of lexical input and at the same time argue that syntactic input is uniform.
3.3 The representational problem vs. the discovery problem
It is customary in standard presentations of the logical problem of language acquisition to dismiss the Simplified Input Hypothesis and then launch into a discussion of UG. This will not be a standard presentation. I want to dismiss the conclusions of Snow (1977), viz. that one can explain primary language acquisition without positing Universal Grammar, but at the same time, I want to emphasise the importance of simplified input for a theory of language acquisition. Here is the link. I have defined the representational problem as the problem of how the learner comes to have a representational system of a particular type, one capable of encoding certain constructs.
I have defined the developmental problem as the problem of explaining how the learner comes to know the properties of the psychogrammar. Connected to the latter is what I call the discovery problem. This is the problem of how a learner comes to know that a particular bit of the input instantiates a given sound or concept (Gleitman, et al. 1988). In Snow’s (1977) paper, she proposed the Simplified Input Hypothesis as an answer to the representational problem. This is understandable insofar as syntactic constructs were to be defined inductively on the basis of the environmental cues that a learner noticed and represented. In other words, from a functionalist point of view, grammatical constructs would reduce to something else and would have no independent status. Unfortunately, the Simplified Input Hypothesis is not an answer to the representational problem. What is frequently overlooked, however, in the rush to defend Universal
Grammar, is that the Simplified Input Hypothesis is directly relevant to the discovery problem. Studies of caretaker speech constitute an important body of research and have led to important refinements to claims about the nature of the stimuli that children, at least middle-class American children, get. They are especially important in showing how interactions guide children’s attention, and provide important ideas about how children can initially break through the wall of sound to begin structural analysis. For example, discussions of language learnability (Wexler and Culicover 1980; Grimshaw 1981; Berwick 1985), in which UG is acknowledged as the answer to the representational problem, hypothesise that child acquirers must encode some minimal representation of meaning in order for syntactic acquisition to begin. Under some proposals this conceptual information is limited to the distinction between INDIVIDUALS and ACTIONS. In other proposals, the child must know who did what to whom in specific events. He then can deploy an a priori and universal strategy which maps AGENTS onto grammatical subjects and PATIENTS onto grammatical objects, these relations being provided a priori by UG.20 Children also map logical subjects and predicates onto NPs and TPs. The input literature suggests that caretaker speech is used in such a way as to make such semantic roles and relations particularly salient for the child by focusing on objects and events which are independently perceptually discriminable. Moreover, such relations are expressed over simple clauses as Morgan’s table above showed. Broen (1972), Bernstein Ratner (1986), and Fernald (1985) have shown that caretaker speech to infants can show greater and more systematic use of various prosodic features which coincide with clause boundaries. Hirsh-Pasek et al. 
(1987) have shown that infants discriminate on the basis of such features, thus potentially providing the child with the means to extract clauses from the speech stream if he knows what a clause is and only has to locate it in the speech stream with respect to changes in fundamental frequency (or changes in sequences of high and low tones). Mintz, Newport and Bever (1995) have shown how distributional properties of nouns and verbs in caretaker speech could serve as the basis for inducing which expressions are nouns and which are verbs. Thus, the input literature reveals a “conspiracy” of factors which would permit the child to identify the phonological units and the semantic units relevant for locating the morphosyntactic units of the psychogrammar. The learnability literature also claims that if “long distance” relationships, e.g. the proper interpretation of passive sentence subjects like The lily was attacked by beetles where the lily is a PATIENT, or the relationship between question pronouns and their source positions as in When did Wortmann’s say they would send the camellias t?, where the when phrase must be understood in terms
of the sentence they would send the camellias when, are to be learned at all, then these relationships must be learned from simple sentences. Simple sentences, as noted above, are defined in terms of the extent of their embedding. Wexler and Culicover (1980) proposed that constructions like passives and questions must be learned over structures which have no more than two degrees of embedding. Lightfoot (1989, 1991) has argued that this should be reduced to zero degrees, i.e. “main clauses plus a bit” (see Culicover and Wilkins 1984, for the related but distinct claim that thematic and grammatical relations can be assigned within simple sentences). The important point here is that the input studies provide empirical support for such ideas. Children do not hear a lot of sentences with two and greater degrees of embedding. To sum up, the simplified input literature is not to be dismissed. It plays an important function in guiding our assumptions about the discovery problem. I have stressed in previous chapters that one of the problems of much current generative work in SLA is that it pays insufficient attention to the problem of defining relevant input for the acquisition of a given phenomenon. The literature on native speaker-nonnative speaker interactions should be regarded in the same light. Properly done, it can give us some idea of the features of language that learners can use to discover the properties of the psychogrammar. Having said this, however, we can return to the idea that caretaker speech research does not show that assumption (6.1c) is wrong. It therefore fails to solve the logical problem of language acquisition and consequently is irrelevant to Chomsky’s claims about the necessity of UG to explain primary language acquisition.
3.4 Summary
The facts from L1 acquisition show that assumption (6.1c) is basically correct.
Linguistic stimuli constituting the input to language acquisition are unrevealing, incomplete, variable, and (in the case of older learners) occasionally deviant. This input is not enriched by instruction, feedback or correction relevant to the learning of structural properties of language (such as structure dependency, basic word order, or subjacency). No case has been made that children construe indirect forms of feedback as the instantiation of a corrective intention on the part of the caretaker, nor is there any reason to believe that they have the necessary metalinguistic capacity to draw the proper inferences about their own grammars on the basis of indirect forms of feedback. No case has been made, therefore, that enriched input solves the representational problem of language acquisition. This has led me to conclude that we have to accept UG in some form or other as a means for the development of the representational systems used
in the psychogrammar. The case against UG-less inductive approaches in first language acquisition is, in my view, very strong. There is no way a linguistically and cognitively uninformed learner can acquire a grammatical representational system of the required complexity on the basis of impoverished input.
4. The empirical problem of second language acquisition
4.1 Preliminaries
I now want to turn my attention to the arguments for UG in second language acquisition. I began this chapter by asserting that there is no logical problem of second language acquisition. There is one logical problem: the simultaneous incompatibility of assumptions (6.1a–c) (or (6.1a), (6.1c) and (6.2)). It can be resolved by abandoning any one of the assumptions. The facts about infant cognition suggest that we need to adopt UG as the basis for i-learning our first linguistic system. What we face in SLA is something quite different, namely an empirical problem of language acquisition. This point is no mere quibble over terminology. To be used to make inferences about either first or second language acquisition, the logical problem must be converted into an empirical problem with distinct sets of assumptions made for any given group of learners, and any given class of languages. The formal nature of the grammars of natural languages does not change just because they are being acquired subsequent to some other language. So we can set aside this aspect of the logical problem: the nature of the what being acquired. However, it is patently obvious after even superficial scrutiny that the nature of the learner changes dramatically from childhood to adulthood with the learner coming to possess much more powerful cognitive systems, complex “theories” about objects in the world, including linguistic objects, and a number of compensatory mechanisms which could make i-learning of an L2 possible.
4.2 The Success Measure
Posing the question: Is English learnable in principle? requires making a commitment to some explicit criterion of successful learning. We saw that the learnability literature simply sidesteps the issue by defining successful learning in terms of a grammar which will generate all of the sentences of a language, but does not worry about how one actually determines what that class consists of.
When we convert the logical problem to an empirical problem, what happens?
It has become commonplace to assert that first language learners acquire their mother tongues “perfectly.” This claim is, however, either trivially true, and hence uninformative, or else it is false. It is trivially true if it is just a way of saying that first language learners stop learning at some point and that point, whatever it is, counts as a definition of the L1. With this definition, each learner’s idiolect is acquired perfectly, an obviously uninteresting conclusion. The claim is false if it is intended as a claim about the learner and the stimuli coming from the environment. The literature on sociolinguistic variation and language change cited earlier shows quite clearly that individuals growing up in the same community do not acquire the same knowledge systems and do not use language in the same way (see also Trudgill 1986; Labov 1994). Success in SLA is still insufficiently researched. Some studies suggest that adult acquirers can attain the same sorts of grammatical knowledge or performance that L1 learners do (Neufeld 1978, 1979; White and Genesee 1996; Ioup et al. 1994). Other studies clearly show that learners resident in a country for many years do not attain native-like competence or proficiency (Oyama 1976, 1978; Patkowski 1980; Johnson and Newport 1989, 1991; Coppieters 1987; Birdsong 1992; Sorace 1993). Some of the latter research concludes that older learners are not able to attain native-like competence. Ioup et al. (1994), whose subject clearly has attained native-like competence in the L2 despite learning it as an adult, is treated as an exceptional learner. Both types of studies suggest that, generally speaking, adult L2 learning is relatively unsuccessful in that monolingual L1 learners cognise other types of knowledge and have much better control of their knowledge.
Therefore, it cannot be argued that UG is required to explain the comparative lack of success of adult L2 learners (given cautionary remarks made earlier about the nature of the input to L2 learning).
4.3 The adult’s other cognitive “equipment”
I have stated repeatedly that the logical problem of language acquisition depends not merely on the assumption that the linguistic stimuli are impoverished, but also on the assumption that the learner has no compensatory cognitive mechanisms. In moving from the logical problem of language acquisition to the empirical problem of how children learn their primary language, we have seen that one can motivate empirically the assumption that the child indeed does not possess such compensatory mechanisms even when we grant that children have much richer initial conceptual systems than has traditionally been believed. In contrast, adults are, by hypothesis, cognitively and linguistically well-equipped, at least as far as the logical problem of language learning is concerned; I set aside here the
hypothesis (Felix 1987: Ch. 3) that the adult’s superior cognitive abilities in fact get in the way of language acquisition.
4.3.1 How meaning solves the logical problem of language acquisition
It has been shown that relevant classes of languages are learnable in principle, i.e. they can be learned-in-the-limit, if input strings are coupled with representations of “meaning” (Gold 1967). If adults possess mature conceptual systems, they have, at least in principle, the capacity to represent the meanings of sentences. If they have fully developed inferencing systems, they have, at least in principle, the capacity to make inferences about the meaning of an input string from prior knowledge, or from the visual and auditory context. Second language learners have hypothetically two different routes into the formal system: They could map directly from conceptual representations onto structures of the L2, or they could use representations of sentences of the L1 to map onto the L2 via a translation-equivalent strategy.21 If adults possess the capacity to match input strings with meanings, they have the capacity, in principle, to solve the logical problem of language acquisition.
4.3.2 How feedback solves the logical problem of language acquisition
There has been no demonstration that feedback is unavailable, uninformative and not used by adult learners. It is therefore potentially available as a solution to the logical problem of language acquisition. Indeed, many researchers have argued that given the transfer of linguistic representations (or parsing procedures) from the L1 as a major discovery procedure, feedback and negative evidence are required to learn the L2 (Vigil and Oller 1976; Gass 1988; Major 1988; Schachter 1988; Birdsong 1989; Bley-Vroman 1990; White 1987b, 1989a, 1991a: 148–71).
My own research and that of others, which will be reviewed in Chapter 8, has shown that adult learners can in principle learn at least certain sorts of abstract linguistic generalisations on the basis of metalinguistic instruction or feedback in experimental contexts. Thus, a second solution to the logical problem exists in principle for adult L2 learners, according to which feedback constrains the learner’s induction about forms and structures in the L2.
4.3.3 The time element
The amount of time available to learn a language is an important factor. Learnability studies place no temporal constraints on language learning since they want to simulate the idealisation of instantaneous acquisition (Chomsky 1975: 119–23). According to this idealisation the learning mechanism has access to all relevant input and learning is not hindered by the sampling problem. In translating from
the logical to the empirical problem of language learning, clearly this idealisation must be surrendered. The fact that input is temporally organised means that input is sequenced. Ordering input may hinder learning (some necessary information is missing in a given sample of input) or else it may help learning (simple input may come before more complex input). Berwick (1985: 173) has made the point that the incremental nature of acquisition could serve as a filter of stimuli received in random order. Thus, an ordering of the stimuli can occur even when no one in the environment consciously attempts to restrict it (as happens in language teaching). Indeed, we saw in Chapter 3 that maturation of principles and parameters has been proposed as a means to filter input to infants in order to explain the incremental nature of learning. We saw above in the discussion of the development of facial recognition that maturation of the visual cortex changes the kinds of stimuli infants attend to. It follows that the temporal dimension is critical to stating what access the learner has to given sorts of input; at stage 1 more may be filtered out than at stage 10. If learning is interrupted at stage 5, no conclusions about the failure to learn what might have been learned in stage 10 can be drawn. Talking in such ways relativises the notion of time to the learner. But comparisons between the success of FLA and the failure of adult SLA often make use of absolute notions of time, observing either that first language acquisition is fast, or that it is slow (Klein 1991: 50–2). If one focuses on how fast first language acquisition is, adults look terribly disadvantaged. Children have acquired much of the phonetic and phonological properties of their L1 in the first year, and many of the morphosyntactic structures of the L1 by 4 years of age. 
There are no longitudinal studies which have followed adult language acquisition for that length of time but certainly the studies on ultimate attainment cited above suggest that many immigrants are still far from proficient production long after arriving in the L2 community. Nonetheless, if one focuses on how slow first language acquisition is, the two groups appear more alike. Six-year-olds are often still sorting out some aspects of the segmental systems of the L1 and certainly are still acquiring the L1 morphology and lexis. That aspect of knowledge may not be fully acquired until around puberty. But how could language acquisition or even grammar acquisition be both fast and slow? The apparent conflict arises because researchers are talking about two separate things. The fast aspects of first language acquisition appear to involve the encoding in perception and comprehension of the principal category types and the basic properties of the representational systems of the phonology and morphosyntax. This can be explained if it is the case that these properties are the manifestation of the deployment of the representational system for language, and if this system is never in fact acquired. The hypothesis then is that the
primitives of the representational systems are simply available and are put to use to rapidly acquire the specific categories of the L1 psychogrammar (between, say, birth and 3 years). On this view, the idealisation to instantaneous acquisition might not be so ideal. What appears to take time is the acquisition of the categories of the grammar, the sound-meaning correspondences, the cues to categories in perception and the parsing system, the acquisition of the parsing and production schemata and the development of expert control of those schemata. These are precisely the things that adult L2 learners must acquire. The comparison to L2 acquisition then permits the following conclusions: Adults do not acquire a representational system, indeed there will be no manifestation of development involving the representational system since the adult’s cognitive system is adult. We may assume that adult learners simply deploy the relevant representational systems and this deployment is equivalent to instantaneous acquisition. As I have stated before, the developmental problem in L2 acquisition then boils down to (i) creating lexical sound-meaning mappings, (ii) the development of novel parsing procedures for stimuli where transferred parsing procedures fail, (iii) the acquisition of any categories, relations, or properties not instantiated in the L1, and (iv) the appropriate differentiation of categories insufficiently differentiated under (ii). Any apparent developmental effects should be limited to these aspects of acquisition and not to the sudden emergence of the representational system. Therefore, we should expect SLA to look like the late stages of primary language acquisition and not the early stages, which ordinarily serve as the basis for comparison involving the “fast” aspects of language acquisition. To my knowledge, no such systematic comparisons of children learning their L1 in the late stages and adult L2 learners have been done.
5. Summary
I have argued at length that there is no logical problem of second language acquisition. There is only one logical problem which is correctly characterised in terms of a number of assumptions about the formal properties of natural languages, the cognitive capacities of the learner, the Success Measure, and the time allotted to learn. Converting the logical problem of language acquisition into a series of conclusions about the empirical problem facing a given learner means making certain assumptions with respect to each one of these categories. Since the objective is to make a case about what the facts are, one’s assumptions should be drawn from the best available studies describing the relevant population. The best available studies suggest there are profound differences in the learning
problems facing the language-less child and the adult L2 learner. Children do not possess mature conceptual systems. Their theories about how the world is organised are comparatively simple ones. It is not apparent that they could or indeed do derive the grammatical representational system from semantic information although mapping from initial conceptual representations to an initial syntactic representation appears to be needed to solve the discovery problem. Children have immature inferencing capacities and lack the metalinguistic capacities necessary for interpreting feedback and metalinguistic instruction (before approximately age 5). They could not use feedback, and apparently do not use feedback, to learn either the properties of the representational system or their psychogrammar. All of these reasons motivate my belief that linguistic cognition is constrained by UG and other a priori principles. That belief motivates in part my decision to constrain induction by tying it to autonomous representational systems which are by-products of innate capacities, including UG. None of these properties are true of adults. They possess all of the relevant representational systems. They have complex theories about objects in the world, including complex theories about language. They have mature inferencing capacities based on their mental models. They have mature metalinguistic capacities enabling them to represent units of language as conceptual categories. They therefore can, in principle, and apparently do, use feedback to learn the properties of the target system. Consequently, they could in principle at least induce the psychogrammar from the properties of their existent representational system (the L1 grammar), and the input. 
The point of this chapter has not been to claim that any one of the proposed procedural types (mapping from L1 structures to L2 structures, mapping from conceptual representations to L2 structures, inferencing on the basis of feedback or correction, etc.) will, on its own, provide correct accounts of what I have been calling the empirical problem of second language acquisition: explaining what adult learners come to know on the basis of the input they actually get. I do not believe they will. Nor do I believe that they will explain the developmental problem: accounting for the stages that learners actually pass through on the way to whatever endstate they achieve, as well as accounting for the pace of acquisition. Models of acquisition processes must be fleshed out in conjunction with models of long-term memory (in particular that part of memory we usually refer to as the mental lexicon), models of short-term or working memory (including auditory memory), models of attention and consciousness, models of learner style, by which I mean everything from the individual’s tolerance of anxiety to preferences for certain types of stimuli, and last, but not least, models of the input to learning. The development of an explanatory theory of second language
acquisition will necessitate concerted research efforts on all of these fronts. We should conclude, however, that the logical properties of language acquisition are not a suitable basis for motivating access to UG in SLA. Even less are they a motivation for parameter-setting. There is no logical problem of second language acquisition, and therefore the logical problem of language acquisition cannot be used as grounds for excluding i-learning in SLA.
Notes
1. One SLA author who makes this distinction abundantly clear is Felix (1986: Ch. 2) who is careful to describe the logical problem of acquisition in terms of the question: What factors make language acquisition possible at all?
2. Note that I am not claiming that the child’s conceptual system is primitive in the sense that it can encode only a small range of concept types. On the contrary, the research literature on conceptual development shows that infants have rich initial conceptual systems. As techniques of investigation become more sophisticated, I think we will discover that infants and young children can cognise far more. I assume merely that the child’s conceptual system is comparatively impoverished with respect to the adult’s. I also assume that it is immature in the areas of problem-solving, inferencing, and deduction. This assumption is well-supported empirically (Gruber and Voneche 1977; Carey 1985; Keil 1989; Gelman 1990a, b).
3. I am using the term “generative” here in its original mathematical sense and not to mean some particular version of transformational generative grammar. In the learnability and mathematical linguistics literature a generative grammar is one that describes a language.
4. The situation is actually much more complex. Katz (1995, 1996), basing himself on the cited work by Langendoen and Postal (1985), has argued that generative linguistics is not consistent with representational realism. In other words, no person could have internalised a generative grammar in their mind/brain because a generative grammar must be capable of deriving nondenumerably infinite numbers of sentences, including sentences like the sentences in (i).
(i) I know that I like darwin tulips.
    I know that I know that I like darwin tulips.
    I know that I know that I know that I like darwin tulips.
    …
As noted in the text, our linguistic competence corresponds to an ability to produce and represent not only an infinite number of sentences, but also sentences with infinitely many embeddings, even unending sentences. No finite mental capacity could be said to be a physical state corresponding to such knowledge. In other words, there could be no distinct physical state corresponding to each sentence. (See also Putnam 1989, on this issue.) Katz therefore claims that Chomsky and other generativists have failed to show that mental grammars must be generative. Indeed, Katz says they are not generative, and consequently, one can no longer define “grammaticality” as that knowledge derivable from a generative grammar. It follows that UG is therefore not true of the initial mental state; it is at best a possible model of the contents of the initial state.
5. For primary language acquisition, the most conservative estimates place the process somewhere between 4 and 14 years at which age it is normally assumed that the child’s linguistic system is mature (although Labov 1963, has shown that older adolescents can move away from or
towards a given sociolect). The least conservative estimates place the time of acquisition somewhere between 0 and 6 years arguing, incorrectly, that what follows is merely “lexical” acquisition. As far as second language acquisition is concerned, I know of no estimates of the length of time needed to acquire the L2 based on any sound empirical evidence. Various researchers have, however, speculated that after the first 3 or 4 years of exposure, no significant learning occurs. See note 6.
6. Flege claims in various places that phonetic learning takes place in the very early stages of SLA, and that fossilisation sets in rather rapidly. For this level of acquisition, then, it would appear that the input set might be very limited, representing only a tiny subset of the possible utterances of a target language.
7. Gold’s algorithm exploits the properties of primitive recursive languages by enumerating candidate grammars one by one. For each candidate grammar, it tries to determine whether all the input sentences it has seen so far can be derived from the grammar. If so, it waits for a new sentence and tries the grammar again. If not, it moves to a new grammar and repeats the process. Although such a device cannot be guaranteed to discover the correct grammar of the target language from a finite corpus of sample sentences (such a guarantee would violate the known limits of any inductive process), nevertheless it can legitimately aspire to identify the target grammar ‘in the limit’ if it manages to find a grammar that continuously passes tests against new input sentences and hence need not be abandoned. (Gleitman and Wanner 1982: 6)
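The enumeration procedure described in note 7 can be sketched in a few lines. This is only a minimal illustration and not Gold's own formulation: the toy grammar family `make_grammar` (grammar n accepts strings of one to n a's) and the function name `enumeration_learner` are assumptions introduced here for the example.

```python
def make_grammar(n):
    """Hypothetical toy grammar G_n: accepts 'a' repeated k times, 1 <= k <= n."""
    return lambda sentence: set(sentence) == {"a"} and 1 <= len(sentence) <= n


def enumeration_learner(text):
    """Yield the index of the current conjecture after each input sentence.

    Assumes every sentence in `text` belongs to some G_n; otherwise the
    search below would never find a compatible grammar.
    """
    seen = []
    index = 1  # start with the first grammar in the enumeration
    for sentence in text:
        seen.append(sentence)
        # Move to the next candidate while the current one fails to
        # derive some sentence seen so far.
        while not all(make_grammar(index)(s) for s in seen):
            index += 1
        yield index


conjectures = list(enumeration_learner(["a", "aaa", "aa", "aaa", "a"]))
print(conjectures)  # [1, 3, 3, 3, 3]
```

On this input the learner abandons its first conjecture once and then keeps the same grammar for all later sentences, which is what identification "in the limit" amounts to in this toy setting. No finite sample guarantees that the conjecture will never change again.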
8. In their discussion of the relationship of triggers to parameters, Gibson & Wexler (1994: 408) characterise the learning algorithm as simply choosing a set of parameter-settings such that the data are grammatical, i.e. generated by the grammar, given the parameter-settings.
9. The logical problem is the familiar problem of accounting for the apparent ease, rapidity and uniformity of acquisition in the face of impoverished data. (Hyams 1991: 72). This is the standard formulation.
10. This constitutes a radical departure from Gentner (1982) who has argued that nouns are learned before verbs because the latter are semantically relational. If object names are also relational conceptually then another explanation must be found for the comparatively early acquisition of nouns. One solution is to say that the verbs, like all predicates, are structurally relational and that it is this structural difference which retards the development of verbs in child speech.
11. Pre-schoolers have judged that wearing a mask would change a cat into a dog (DeVries 1969). Keil (1989: 231–34) showed 3 and 4 year-olds a series of pictures consisting of, e.g. a white horse, a zebra, and a “mixed” picture in which the white horse exhibits the zebra’s head. The children heard stories while looking at the pictures. In one condition, the stories made it clear that the zebra was wearing a horse costume. In another condition, the zebra wore paint covering its stripes and a glued-on mane which temporarily came off and then got put back on. In a third condition, the stories related that the zebra was painted and had a glued-on horse’s mane but the paint and the mane never came off. (They were, in other words, permanently attached.) The pre-schoolers differed significantly from chance responses to the question “Is the animal a horse or a zebra?” only on the first condition. For the very youngest children, the external appearance of an animal seemed to matter more than its internal features, its heredity, or its behaviour, and they were willing to infer appearance based on other properties (Keil 1989: 235).
12. Johnson (1990a, b) relates this to a developmental shift from subcortical to cortical processing in vision due to the maturation of the primary visual pathway. Johnson’s studies also indicate the special effects that moving stimuli have for infants.
13. Johnson (1990b: 159) continues: I would anticipate similar mechanisms for language acquisition. This biased input at particular stages serves two functions. The first is to ensure appropriate input to sensory pathways, the second to ensure appropriate feedback through the specific pathway from, for example, STS [= superior temporal sulcus, SEC]. In this way, neural pathways may become configured to process particular classes of input. Only when the initial phases of the configuring are complete does the pathway begin to control output. In such a scenario the relative contribution of genes and the environment may be specified, but are nevertheless highly interdependent and integrated.
14. This is not to suggest that once phonological units are imposed on the speech stream that correspondences between them and morphosyntactic units could not be established. On the contrary, much current work in primary language acquisition assumes that children bootstrap from the phonology to the syntax. See Gleitman and Wanner (1982), Hirsh-Pasek, Kemler Nelson, Jusczyk, Wright Kennedy, Druss & Kennedy (1987), Gleitman et al. (1988), all on primary language acquisition, and Cooper and Paccia Cooper (1980) on the correspondences between phonological properties and syntactic units in adult speech.
15. There is plenty of evidence for this point in the linguistics literature. Georgian verbal morphology should convince even sceptics. See Anderson (1992: 137–56).
16. Clark shows that this problem exists even assuming UG. His example begins with a UG containing the small number of 30 binary parameters. The search space defined by 30 parameters yields 2^30 or 1,073,741,824 possible grammars to test against the input and accept or reject. Clark calculates that if the learner could test and reject one grammar per second it would take 34 years to test all of the hypothesised possible grammars. This is clearly unacceptable for any theory of language learning; children do not take 34 years to decide on a grammar. It must be emphasised, however, that 30 is a tiny number in comparison to the infinite number of features induceable from any perceptually-defined situation.
17. This is the basic idea behind one of Slobin’s (1973) operating principles, i.e. OP E: Underlying semantic relations should be marked overtly and clearly in the morphosyntax. Slobin assumes the prior existence of semantic categories but denies the prior existence of a syntactic representational system. The Semantic Bootstrapping Hypothesis discussed previously agrees with Slobin that the child imposes a correspondence initially between semantic and syntactic constituents, but also assumes, in contrast, that the syntactic constituents have prior representational existence. In other words, there is a fundamental distinction between Slobin’s position, where semantic categories are used to induce syntactic categories, and the Semantic Bootstrapping Hypothesis where semantic categories are used to help the acquirer map UG-given syntactic features onto particular forms of the L1.
18. It is, of course, misleading to say that conceptual representations evolve into syntactic representations since at no time does the child cease to represent the meanings of utterances. Rather, the idea is that the child induces a separate and distinct set of concepts which are then used in a purely unconscious way to analyse and encode the linguistic stimuli for structural rather than interpretive purposes. The interpretation remains the function of the conceptual representations.
19. In general there is a lack of correspondence between main or root clauses and embedded clauses. The embedded clause generally exhibits fewer properties and is more constrained (Emonds 1976).
20. Berwick (1985: 22) adopts this second proposal and derives the 1–1 syntactic/thematic correspondence from the Subset Principle.
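The arithmetic behind the estimate reported in note 16 can be checked directly. The sketch below is only a worked calculation: the function name `exhaustive_search_years` and the choice of a 365.25-day year are assumptions made here, not details taken from Clark.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumption: a 365.25-day year


def exhaustive_search_years(n_parameters, seconds_per_grammar=1.0):
    """Return (number of grammars, years needed to test each one once)."""
    n_grammars = 2 ** n_parameters  # each binary parameter doubles the space
    years = n_grammars * seconds_per_grammar / SECONDS_PER_YEAR
    return n_grammars, years


grammars, years = exhaustive_search_years(30)
print(grammars)      # 1073741824
print(round(years))  # 34
```

Because the search space doubles with every added parameter, the same calculation with 31 parameters gives roughly 68 years.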
21. Flynn (1987) claims that knowledge of structure dependency would not be available if learners were translating and could only result from directly accessing principles and parameters. This argument holds water only if subjects are mapping from a conceptual representation (based on the L1) to a non-structural linear ordering of words in the L2. But there is absolutely no reason to suppose that this is what translation consists of. Consider what Sharwood Smith (1979: 16) has to say about translation or what he calls naive relexification: Here learners are said to generate (in a processing sense) an L1 (or other nontarget language) syntactic frame which is then filled with substitute L2 lexical material, producing what looks like a literal translation of the source language. [emphasis mine]. In other words, translation involves mapping from one syntactic structure to another, defined on the basis of the syntactic properties of the original L1 lexical items which are simply given an L2 phonological shape. Such a process would entail that learners have knowledge of structure dependency.
Chapter 7 Input and the Modularity Hypothesis
1. Introduction
The need to appeal to principles and parameter-setting as an acquisition mechanism to explain the properties and stages of second language acquisition has seemed compelling to many in SLA. The most frequent argument to appear in print is that UG is needed to explain the logical problem of second language acquisition, which we examined in the last chapter. We saw, however, that there is no such logical problem. The weakest argument for adopting UG is simply the absence of a well-defined alternative, in particular, a theory of i-learning which can explain the constraints that linguistic systems obviously exhibit.1 In previous chapters we saw that the essential point about the constraints issue has to do with how the primitives of the representational systems in which linguistic cognition is encoded develop. This point is a critical one for first language acquisition alone. It is of secondary importance for a theory of adult second language acquisition since we must assume that the learner possesses adult, mature representational systems, including one for the L1 grammar. More importantly, it is possible to propose an explicit model of i-learning which addresses directly the constraints problem. I have done so. Specifically, in Chapters 4 and 5 I hypothesised that i-learning is severely constrained by the representational system in which the L1 is encoded, by the functional architecture of the mind, and by the nature of language processing, in addition to a number of independent constraints stipulated to hold. Further aspects of the model dealing with how information encoded in conceptual structures can interact with the grammatical system will be taken up later.
This leaves us to face what may be the only legitimate conceptual argument on behalf of UG qua acquisition mechanism in SLA, namely that linguistic cognition is modular and that all acquisition takes place on the basis of a strictly bottom-up processing of the linguistic stimuli with no input affecting acquisition from longterm memory or from the conceptual system. If these properties were to turn out to be true of the language faculty,
there could be no role for concept-based i-learning, for i-learning based on inferential reasoning, or for any enrichment of the input to learning through correction and feedback. If the input to language learning is restricted, there must be compensatory sources of information which the learning mechanisms can draw on. The validity of the Poverty of the Stimulus Hypothesis then forces a priori knowledge on us if we are to explain how the learner indeed learns. I first introduced the Modularity of Mind Hypothesis (Fodor 1983) in Chapter 1, Section 2.2. It is time now to be more precise about what this term means.
2. The Modularity of Mind Hypothesis
The Modularity of Mind Hypothesis begins in a model of faculty psychology. Fodor’s model is shown in Figure 7.1. The mind consists of “horizontal” and “vertical” faculties. The former correspond to such things as memory, imagination, attention, judgement, perception, etc. Cognitive processes exhibit the interaction of these faculties. In addition, they exhibit properties of the second type of faculties — the modular ones. Modules do not share and do not compete for horizontal resources. Language is a module, and moreover is an input system with the following properties:
Input Modules | Central Processing System
Domain specific | Operates in multiple domains
Processing occurs in a mandatory fashion | Processing is not mandatory
Operates fast | Is comparatively slow
The central processor has limited access to the operations of the modules, and, as a consequence, they function outside of consciousness | Processing is interactive, meaning that the central processor can access all components (processing produces awareness)
Is informationally encapsulated (i.e. it is unaffected by feedback, and the belief system cannot direct the operations of the processors) | Is (by definition) interactive
Input analysers have shallow outputs, in particular what is in the input system are those properties of utterances which they have by virtue of their linguistic structure. Fodor suggests (1983: 88) grammar + logical form. | The central processor deals with those aspects of language dependent on inferencing, e.g. reference and coreference, metaphor, humour, irony, etc.
Input modules are associated with fixed neural architecture (meaning they are hard-wired) | Central processing is also obviously associated with neural architecture, but it is not hardwired
Consequently, they are associated with specific pathologies, e.g. maintenance of phonological and syntactic knowledge about words when semantic knowledge about the same words has broken down (Schwartz, Marin, and Saffran 1979). | Disease and/or trauma lead to domain-general pathologies
The ontogeny of input systems exhibits characteristic pacing and sequencing | (so does the development of classification, problem-solving and inferencing skills but Fodor has nothing to say about this)
Figure 7.1. Fodor’s model of faculty psychology
The properties of this model will be reviewed briefly below.
2.1 Componentiality, autonomy and domain-specificity
The Modularity of Mind Hypothesis amounts to several claims. The first is that the mind consists of various components performing specific processing functions, and representing information in ways distinct from one another. Modularity thus entails that cognition is componential. This claim is hardly new; psychologists and linguists alike readily speak of a syntax, a phonology or a lexicon. Talk of components of knowledge does not require specific commitments as to the existence of processors computing differentiated representations. “Syntax” might merely refer to information about sentences stored in longterm memory. Or it might refer to the representations that some general processing mechanism constructs which encode syntactic distinctions. In short, one can adopt the claim that processors compute differentiated representations without thereby adopting the position that some or all grammatical distinctions require separate processors. The Modularity of Mind Hypothesis, in contrast, is first and foremost a claim about the domain-specificity of information-processing subsystems. The language module processes only linguistic representations. This means that its processors are sensitive only to distinctions that can ultimately be encoded in linguistic representations. It follows that the Modularity of Mind Hypothesis also requires that the various representational systems which are domain-specific be autonomous. In other words, the representational system encoding language exists independently of the representational system used for encoding thought.2
2.2 Modules are part of the “hardware”
I said that linguists and psychologists alike can talk about a mental grammar without committing themselves to the idea that the functional architecture of the mind/brain includes a processing unit specific to grammar. It should be apparent that they also need not commit themselves to the claim that these components actually correspond to defined neural structures or to neural areas in the mind/brain. It is important to understand that Fodor (1983, 1987), in contrast, is committed to representational realism. In particular, modules are claimed to be instantiated in the brain.3 This means that they correspond to and operate through dedicated neuroanatomical structures. Fodor’s claim is that the human brain manifests dedicated neuroanatomical structures for language. As evidence for this, he points out that subsequent to injury or other insults to the brain, humans can manifest quite specific pathologies involving language without the performance of other modules or of unspecialised processors being affected. Aphasic patients can manifest such dissociations. They can be unable to speak and yet have no difficulty orienting themselves spatially; they can be able to perform a variety of non-linguistic tasks such as adding up a bill or playing a musical instrument, learn to recognise new faces, etc. It has been claimed that similar dissociations exist in dementia (Schwartz, Marin, & Saffran 1979) and in development (Van der Lely 1995, 1996). For a detailed introduction to this topic, see Eysenck & Keane (1995).
2.3 Modular domains exhibit a specific ontogeny
As additional evidence for modularity, Fodor points to the distinct ontogeny of language. First language acquisition exhibits characteristic stages, sequencing, and pacing. The acquisition of the first language is, as we have already noted, often said to occur relatively quickly, and uniformly.
Fodor suggests that such properties make the acquisition of language quite different from the acquisition of other sorts of knowledge. We saw in Chapter 6, however, that conceptual development can be likened to the development of complex and rich (tacit) theories about entities, including their essential properties and the relations they bear to other entities. We also noted there that such learning is internally directed by a priori tendencies to attend to certain kinds of stimuli, and by the basic features of the conceptual system. The learner is constantly interacting with the environment and encoding information on the basis of those interactions. It is occasionally suggested that this kind of learning is very slow and effortful, in contrast to language acquisition which is the proverbial “piece of cake.”
Such claims are exaggerated and overlook the incremental nature of language acquisition (Ingram 1989), as well as the rapidity with which children develop rich representations of concepts. In addition, one can observe unique phases of development in other cognitive domains, where, on a domain-specific basis, conceptual information is reorganised (Keil 1989; Karmiloff-Smith 1992). We also saw in the last chapter that certain kinds of domain-specific capacities in non-linguistic areas of cognition can suddenly emerge. This means that linguistic ontogeny is not as unique as Fodor would have us believe. Rather, linguistic ontogeny shares certain global properties with conceptual acquisition, reflecting the domain-specific characters of both. Fodor also points to the fact that dissociations can be found between linguistic development and other areas of cognition. Children deprived of language experience in the first few years of life show clear and irreparable linguistic deficits although other aspects of their cognition are intact and they show the ability to learn other things (Curtiss 1977, 1982, 1988). Whether their non-linguistic intelligence is normal is a separate issue and hard to assess, given the considerable damage done to such children in other ways. Consequently, a more interesting example, one not specifically mentioned by Fodor but certainly one which adds grist to his mill, involves children with Specific Language Impairment or SLI. Subgroups of such children can perform normally on standardised tests of non-linguistic cognitive and motor skills, and normally on standardised tests of hearing and emotional development, but reveal severe problems in their skills at comprehending and expressing language (Van der Lely 1995, 1996). In particular, they have problems comprehending and producing grammatical morphemes, e.g. for English-speaking SLI children, regular verb inflections of past tense, third person singular, and the copula (Leonard 1979).
There is a lively debate going on about the origins of these problems. Gopnik and Crago (1991) have proposed that SLI children have a featural deficit, which makes it impossible for these children to encode the morphological markers of those features. Such a deficit would of necessity be specific to the language faculty. The Featural Deficit Hypothesis has been vigorously contested by other researchers. Leonard, Bortolini, Caselli, McGregor and Sabbadini (1992) have shown that SLI children do not produce grammatical morphemes in inappropriate contexts and therefore are not treating the forms as meaningless. Clahsen, Rothweiler, Woest, and Marcus (1992) have shown that German SLI children overregularise noun plurals showing that they can identify morphological endings and extract generalisations from the input. Clahsen (1991) has argued that the deficit should be attributed to either a failure to set relevant parameters or to setting the wrong parameter. Clahsen (1995) has argued that SLI should be
characterised in terms of a lack of agreement, which will explain (given the characterisation of agreement in the Government and Binding theory of morphosyntax) not only the English data but account as well for the tendency that Italian SLI children have of omitting obligatory object clitics. See also Clahsen (1989). Bishop (1994) has shown that the errors that SLI children make in marking morphosyntactic features are not random: they may omit affixes but they do not misanalyse nominal features and assign them to verbal categories. Moreover, they make fewer errors in marking noun plurals than in marking verb tense, showing that within the morphosyntax, SLI children are differentiating types of features. The case system seems to be especially vulnerable. Van der Lely (1995) has proposed that the children manifest an inability to control government relations. Obviously, there is no consensus on the explanation offered, but most of these researchers agree that SLI is a deficit of the language faculty and is not due to some general deficit.4
2.4 The automaticity and speed of modular processing
According to Fodor, modular processes operate automatically, and they function outside of a person’s conscious awareness and control. For Fodor, this means an additional difference between language and the operations of the conceptual system. Fodor makes the point that one can choose not to listen to one’s mother tongue being spoken (by avoiding situations where it will be used, or stopping one’s ears), but one cannot choose not to “hear” it once it is spoken. One can never hear one’s mother tongue the way one hears a second language on first exposure, namely as a continuous, unbroken stream of noise. The perception of speech is thus like the visual perception of objects.
The very existence of visual illusions (mirages, Wittgenstein’s (1953) duck-rabbit, and other visual phenomena studied by Gestalt psychologists, see Wertheimer 1923; Köhler 1929; Koffka 1935) shows not only that perception is not something derived solely from properties of the stimuli, but also that it is something which happens to us rather than something we consciously do. Modules also operate very quickly. In particular, Fodor wants to make the claim that they operate much faster than the information processing unit responsible for inferencing, problem-solving and thinking. Therefore, language processing operates faster than the systems responsible for i-learning.
2.5 Modules produce “shallow” outputs
Modules are associated with “shallow” outputs. This is related to the issue of autonomy. In the brief discussion of speech processing in Chapter 1, I pointed out that parsers are assumed to have quite specific inputs: the phonological parsers will be sensitive to features of the speech stream, the morphosyntactic parsers are sensitive to features of morphosyntactic categories like [+N, −V]. I have assumed that the phonological parsers output an abstract phonological representation which activates abstract phonological representations in entries in a mental lexicon organised along phonological lines (Cutler and Fay 1982). The phonological parsers thus do not output morphosyntactic representations; that job belongs to morphosyntactic parsers. Modules exhibit this sort of job specialisation.5 Fodor’s claim that modules have shallow outputs means that parsers perform limited sorts of operations on their inputs, operations which are also domain-specific. The effect of this claim is that the language module does a little bit of the processing work and then passes on the job to the central processor, which Fodor assumes is responsible for thought.
2.6 “Cross-talk” is limited
Modules interact in a highly restricted fashion. They may provide input representations to one another but their internal operations are relatively closed to outside influence. This is a property referred to as information encapsulation. Fodor claims that language is an input system to the central processor. It follows that the language module must analyse and represent speech with little or no use of information from the central processor, from longterm memory or from a person’s beliefs and assumptions about the ongoing context of speech.
2.7 The Schwartz model of modular L2 acquisition
2.7.1 K-acquisition vs. k-learning
Schwartz’ (1986, 1987, 1992, 1993) goal is to motivate theoretically a distinction made by Krashen (1981, 1982, 1985) between “language acquisition” and “language learning”. I will henceforth call these “k-acquisition” and “k-learning” to underline the speculative and theory-specific nature of these constructs and to be able to continue to use the terms “acquisition” and “learning” in an intuitive way. Krashen’s distinction, as I understand it, is based on little more than the observation that naturalistic L2 acquisition and L2 acquisition in the classroom lead to different results. Crudely put, acquisition of an L2 in natural settings is
claimed to lead to superior and more native-like competence and proficiency than acquisition based on instruction in the classroom to the tutored learner. The untutored adult k-acquires a second language by the intuitive analysis of spontaneously provided stimuli from native speakers observed in normal communicative settings. The intuitive analysis is the work of whatever mechanisms are responsible for L1 acquisition. As to what those mechanisms are, Krashen does not say beyond speculating that they must be the same mechanisms that are responsible for primary language acquisition. For Krashen, the superiority of k-acquisition is not simply the result of the untutored learner getting more or better stimuli. Rather, the nature of linguistic processing and hence of acquisition and performance are differentially organised based on the nature of the input to learning: normal speech in the first case, and metalinguistic instruction in the second. For discussion of Krashen’s work, see Klein (1986: 28–9), McLaughlin (1987), Larsen-Freeman and Long (1991: 240–9), and Towell and Hawkins (1994: 25–31).
Schwartz attempts to provide some content to Krashen’s model, which on its own is too vague to be falsifiable, by adding two different ideas to Krashen’s k-acquisition and k-learning distinction. The first involves Fodor’s claim that language processing in monolingual, mature adults is modular. See Figure 7.1 above. For Schwartz, language is an input system with information flowing from the acoustic-perceptual stimuli to interpretation. This is made manifest in her model of information processing. See Figure 7.2 reproduced from Chapter 1.
A.
Linguistic Competence arises from the activities of the Language Acquisition Device, which is modular at least in the sense of being informationally encapsulated. The No Negative Evidence Hypothesis is a major presupposition. So is the claim that language acquisition is unconscious. Linguistic Competence is the normal result of learning in naturalistic situations.
Learned Linguistic Knowledge arises from activities of the central processor, which are domain-general (not specific to language and possibly not innate), result in conscious awareness, and are based on exposure to negative evidence, metalinguistic information, and explicitly taught “rules”. Learned Linguistic Knowledge is the normal result of learning in classrooms.
B. The direction of information-flow within a Schwartz model
Things out there in the world that we see or hear -> Vision Module, Language Module, etc. -> Central Processing Systems
Figure 7.2. Schwartz’ model of modular processing (adapted from Schwartz 1993: 157)
The second idea involves adopting P&P as a theory of linguistic representation. The P&P theory thus provides Schwartz with an explicit definition of language. She puts the three sets of ideas together to argue that naturalistic L2 acquisition is, must be, and could only be, a bottom-up process whereby the adult L2 acquirer represents the linguistic stimuli guided by Universal Grammar, on the one hand, and linguistic representations of the L1 (also a consequence of UG), on the other hand. K-acquisition and k-learning are distinguished in Krashen’s work largely by the claim that the first occurs unconsciously, while the second takes place consciously. For Schwartz this means that k-learning must take place in the central processor. K-acquisition, in contrast, will occur in the language module. For Krashen, k-acquisition results from whatever the LAD does, and we saw in Chapter 1 that for Krashen this means that the LAD can use conceptual information to bootstrap the grammar from Stage i to Stage i + 1. In other words, k-acquisition can be a form of i-learning. For Schwartz, this possibility is excluded, except in the realm of lexical learning. She hypothesises that adult learners transfer the parameter-settings of the L1 grammar and then, upon the identification of appropriate triggers, reset their parameters to those which are consistent with the triggers. In short, k-acquisition is said to result when a language is acquired unconsciously on the basis of exposure to stimuli produced in meaningful contexts through the operation of parameter-(re)setting. Induction, in this theory, as in other P&P research, is limited to the acquisition of sound-meaning pairs in the lexicon. Input from feedback, correction, metalinguistic instruction or other forms of negative evidence is explicitly excluded.
Basically, [k-]Acquisition of a second language is likened to the processes that obtain when a child acquires his/her first language, in that the subject “picks up” the language unconsciously without recourse to explicit (instructional) rules. This is distinct from [k-]Learning a second language whereby a subject incorporates rules, namely descriptive generalisations, of the language by consciously mastering them. [k-]Acquisition is the real knowledge of a second language whereas [k-]Learning is not. (Schwartz 1987: 27, emphasis in the original).
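The logic of k-acquisition as transfer followed by trigger-driven parameter-(re)setting can be pictured with a toy sketch. Everything named in it is an illustrative assumption rather than part of Schwartz' proposal: the function `k_acquire`, the two parameters `head_initial` and `null_subject`, the cue labels, and the pairing of cues with parameter values are all hypothetical.

```python
def k_acquire(l1_settings, input_events, triggers):
    """Return the parameter settings after processing a sequence of events.

    l1_settings:  dict of parameter name -> value, transferred from the L1
    input_events: list of (kind, cue) pairs; only "utterance" events count
    triggers:     dict of cue -> (parameter name, value to reset to)
    """
    grammar = dict(l1_settings)  # transfer: start from the L1 settings
    for kind, cue in input_events:
        if kind != "utterance":
            # Correction, feedback and taught rules are assumed to have
            # no access to the module, so they change nothing here.
            continue
        if cue in triggers:
            parameter, value = triggers[cue]
            grammar[parameter] = value  # resetting on an identified trigger
    return grammar


# Hypothetical L1 settings and triggers, for illustration only.
l1 = {"head_initial": False, "null_subject": True}
triggers = {
    "verb-object order": ("head_initial", True),
    "expletive subject": ("null_subject", False),
}
events = [
    ("utterance", "verb-object order"),
    ("correction", "expletive subject"),
    ("utterance", "no recognised cue"),
]
print(k_acquire(l1, events, triggers))
# {'head_initial': True, 'null_subject': True}
```

In this sketch the cue that arrives as a correction leaves the transferred setting untouched, which is the sense in which negative evidence is excluded from the input to the module.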
K-learning which, by hypothesis, results from attempts to teach the language by teaching about the language involves quite distinct processes and leads to different results. In her thesis (Schwartz 1986), Schwartz develops the idea that k-learning involves the conscious learning of largely metalinguistic information presented in the form of rules. This information could come from grammatical instruction (explicit or implicit), feedback and correction, studying linguistics, and so on. Learners internalise propositions which express grammatical information
(more or less well). According to this analysis, the statement “English is an SVO language”, or the truth-functionally equivalent statement “In English, subjects precede tensed verbs” would count as instances of such metalinguistic rules. However, Schwartz insists that if such learning is not optimal, i.e. if it doesn’t lead to the same knowledge or proficiency that natives demonstrate, it is not merely because the statements are uninterpretable, incomplete or inaccurate. Nor is it merely that adult learners have inadequate automatisation when they attempt to deploy their knowledge in speech production. Rather, k-learning leads to inadequate performance because metalinguistic information is represented and deployed in the central processor, and cannot be deployed in any way, shape, or form by the modular language processing system. Only information extracted from the processing of linguistic stimuli can be input to the LAD. There is no input from the conceptual system.
What are the consequences of this distinction? One consequence is that we might assume that the representations which correspond to k-acquisition instantiate such relations as dominance, precedence, locality, c-command, or the Empty Category Principle (ECP). It would follow that the representations resulting from k-learning do not. We might predict therefore that learners whose knowledge of the L2 has been acquired solely on the basis of classroom learning will not cognise these properties in their L2. Schwartz, in fact, makes no such claim, and moreover, makes no attempt to demonstrate that speech behaviour arising from k-learning does not manifest the relevant properties. There is no empirical evidence to show that the basic properties of k-acquired systems are fundamentally different from the basic properties of k-learned systems. Indeed, it would be difficult to prove such a thing, since even in classrooms using grammar/translation methods, learners also get linguistic stimuli.
Schwartz thus faces a terrible methodological problem of how to demonstrate such radical differences in knowledge. The distinction between these two types of knowledge is also not motivated on the basis of empirical evidence contrasting first and second language acquisition. I would argue there is no empirical evidence for the distinction. It is, rather, strictly a conceptual consequence of Schwartz imposing Fodor’s (1983) theory of modularity on Chomsky’s theory of UG. There are certain other aspects of the model which merit some attention. The first is that the modularity of second language acquisition is made dependent upon the modularity of language processing as purportedly exhibited by highly competent, highly proficient monolingual L1 speaker/hearers. This move seems unwarranted in the absence of empirical evidence suggesting that language processing in such learners is modular. It is to Schwartz’ credit that she has attempted to connect SLA to an explicit model of the cognitive architecture of
the mind and specific hypotheses about language processing in adults. Such attempts are rare in the field. But it needs to be kept in mind that Fodor did not formulate his theory of faculty architecture on the basis of facts of language processing drawn from language acquisition. His claims are limited to monolingual adults processing their L1. I see no reason to assume that the properties of such processing systems, even if they were to be modular in Fodor’s sense (more on this below) would necessarily extend to learning mechanisms. Indeed, I have already spoken of evidence which shows that linguistic categories manifest the same typicality properties of categories in other cognitive domains, assumed by everyone to arise from i-learning. Schwartz’ notion of normal language acquisition is a major departure, rather than merely an explicit working out, of Krashen’s views. Both equate k-acquisition with the operations of the LAD, and hence with Universal Grammar, but 10 years of linguistic theorising lie between the two and the changes in the conceptualisation of UG have been fundamental. At the time that Krashen was defining his terminology, UG was construed as a set of constraints on rule construction and representation. When we spoke then of learners acquiring linguistic competence, we spoke of them learning rules. We also said that they learned categories and category typologies. Consequently, it was possible in those days to draw explicit parallels between language acquisition and other domains of learning. Now it is often asserted that rules are not part of linguistic competence at all (Clahsen and Muysken 1986, 1989). This would certainly be true if linguistic competence were nothing more than UG. 
Although, like everyone else, Schwartz has her own views on what to maintain and what to jettison among the hundreds of various proposals in the literature, this is the overall view of UG that she subscribes to too (Schwartz and Tomaselli 1990; Schwartz and Vikner 1989). One may well ask: Are these the same notions of k-acquisition?6 I suggest they are not. The earlier proposals were much more compatible with i-learning. Current P&P theory is not.
2.7.2 Summary
Let me now summarise the Schwartz model of language acquisition.
(7.1) the Schwartz model
a. Language processing is modular in precisely the sense of Fodor (1983), in particular, one can establish a clear division between the construction of linguistic structures and the processing of meaning, the first being modular while the second is not.
b. The on-line processing of linguistic stimuli is the sole source of input for grammatical restructuring. There can be no k-acquisition based on other sources of input.
c. Because on-line processing of linguistic stimuli is modular, language acquisition is modular.
d. There is a distinction in processing terms between k-acquisition (unconscious, automatic, bottom-up processing), and k-learning (conscious, non-automatic, occurs only in the central processor) which has consequences for the representation of linguistic information.
e. Language and linguistic competence correspond to whatever results from UG, parameter-setting and lexical acquisition. Specifically excluded from linguistic competence is pragmatic competence, that is to say that knowledge which permits speaker/hearers to know what forms to use in particular contexts in order to make particular logical inferences. Also excluded are induced categories or rules, structures learned through metalinguistic instruction, or inferencing, and all information derived from correction and/or feedback.
f. UG and parameter-setting can be construed as acquisition mechanisms, therefore k-acquisition can be defined as parameter-setting.
g. K-acquisition leads in principle to native-like competence but k-learning never can.
In the rest of this chapter, I wish to focus on the processing and representational issues implicit in Schwartz’ work. Although there is a substantial literature on language processing, indeed there is a substantial literature on Fodor’s notion of modularity, and on some definition or other of modularity (Crain and Steedman 1985; Tanenhaus, Carlson, and Seidenberg 1985; Garfield 1987; Marslen-Wilson and Tyler 1987; Linebarger 1989; Frazier 1990; Fodor 1990; Thompson and Altmann 1990; Karmiloff-Smith 1992; Shillcock and Bard 1993), Schwartz cites none of it in her later work either to justify her adoption of Fodor’s claim that the language faculty is modular, or to motivate her claims about L2 acquisition and learning.7 This literature must be drawn on to determine the relevance of Fodor’s work for SLA theorising and certainly should be kept in mind by those who cite Schwartz’ take on modularity in approving tones.
3. What’s wrong with Fodorian modularity?
Since Schwartz bases her model of L2 acquisition on Fodor’s model of modularity, it inherits all of the weaknesses of that model. I hope to undermine any belief that it can be incorporated into SLA theorising. My criticisms, it should be noted,
are partial motivation for the adoption of an alternative version of modularity, namely Representational Modularity. For criticism, I shall not turn to the literature which rejects more or less outright the concept of modularity (see, e.g., Anderson 1983); rather, I shall look at the limitations of Fodor’s model from the perspective of research which accepts the basic idea but thinks that Fodor’s model is the wrong model.
3.1 Problem one: We’re not talking about “language acquisition”, we’re talking only about “grammar acquisition”
In adopting Fodor’s version of modularity, Schwartz commits herself to the claim that there is a clear division of labour in language processing between structural processes and the processing of meaning, the first being modular and the second not. Now what we normally think of as “language” corresponds to complex and diverse types of knowledge and we have no reason at present to expect that the processing mechanisms needed to parse representations corresponding to these types of knowledge will not be equally complex and diverse. If it turned out to be true that only some subparts of the language processing mechanisms manifested modularity, then Schwartz could hardly equate “language” with the language module. This in turn would make it difficult to equate the operations of the LAD with k-acquisition and the output of activities of the central processors with k-learning since knowledge of language would manifest properties of both systems. In other words, maintaining a distinction between k-acquisition and k-learning requires that k-acquisition correspond to what we intuitively think language acquisition ought to be. If it turns out that only some subpart of language processing and/or k-acquisition is modular, then Schwartz’ model cannot be a complete model of second language. It will need to be supplemented with other mechanisms and one of those mechanisms would be i-learning.
It would then follow that an explanatory theory of SLA would have to be compatible with i-learning. The Autonomous Induction Theory was explicitly designed to meet this criterion; the Schwartz theory does not. Schwartz (1993) differentiates between semantics and pragmatics and puts semantics in the linguistic module. She does not explain what the difference between semantics and pragmatics is. Establishing a difference has over the years proven extremely difficult and many semanticists do not consider the terminological distinction to be, pardon the pun, meaningful (Jackendoff 1983, 1990b). Schwartz’ claim that semantics belongs in the language module is consistent with Chomsky’s early view that semantics belongs to the language faculty but it is a serious departure from Fodor’s position and, I think, not a
coherent stance. If semantic information can guide a syntactic parse, then there is no information encapsulation. To be consistent with Fodor’s claims about processing, we ought to assume that linguistic semantics falls outside of the language module except for those aspects of meaning which are directly connected to linguistic structure, such as defining the scope of operators and modifiers, quantifier-binding, binding of reflexives. In short, Logical Form, which is really syntax, is “in”, reference, coreference, denotation, connotation, inferencing and all supra-sentential interpretation are “out.” So this leaves us with only the grammar inside the language module. The construct “language” has been reduced to “grammar”. Schwartz has written that native speakers can have Learned Linguistic Knowledge about a language’s grammar, for example, when they attend a grammar class on the native language and learn explicitly what its description is. But although Learned Linguistic Knowledge is originally defined as metalinguistic in nature, this cannot be entirely what Schwartz has in mind, for there could be many kinds of k-learned knowledge which are not metalinguistic in nature. Thus, what Schwartz is really contrasting is not Krashen’s distinction between k-acquired and k-learned (metalinguistic) knowledge, but rather parameter-setting and i-learning. Observe, now, that much of i-learning will take place independently of awareness. We cannot tie the distinction between induction and parameter-(re)setting to the distinction between conscious and unconscious learning. We thus lose Krashen’s claim that the product of k-learning is conscious and the product of k-acquisition is unconscious.8 So, Schwartz has shed the distinction between naturalistic and metalinguistic learning, as well as the distinction between conscious and unconscious learning. We are left with the distinction that knowledge of the grammar will result from bottom-up only processing of the stimuli.
Other types of linguistic knowledge do not. This is, however, a conclusion to emerge from empirical evidence and not evidence for the distinction. The relevant question to now ask is: Do the processing mechanisms which parse knowledge corresponding to grammatical knowledge operate in a strictly bottom-up fashion? The correct response here is “Yes, but also no”. Some aspects of grammatical knowledge appear to be parsed in modular fashion, but others clearly are not.
3.2 Problem two: Fodor’s concept of what a module is is too crude
As Tanenhaus et al. (1985: 361) put it, a central assumption of the Modularity of Mind Hypothesis is that we are “computing different levels of linguistic representations that correspond to separate processing modules”. The most obvious
criticism to level at Fodor’s model is that the grammar must correspond to the processor’s modules. Schwartz also adopts this view although it clearly runs counter to the spirit of the Principles and Parameters theory, where, adopting a simple-minded view of the relationship between grammar and processor, we might expect to find a semantic-role processor, a phrase structure processor, a function-variable binding processor, and so on. See Frazier (1990) for some suggestions along precisely these lines. There simply is no reason to assume that there must be a single set of processors which correspond to the phonology and syntax and that they count as part of the relevant modular system with semantics on the outside. The first line of attack has therefore been to show that this slicing of the pie doesn’t work. Consequently, one problem with Fodor’s model is that it misanalyses the relationship between modular and central processes. Both Tanenhaus et al. (1990) and Marslen-Wilson and Tyler (1987: 37) argue that the boundaries between language input systems and the central processor do not coincide with the boundaries conventionally drawn between grammar and referential semantics and/or pragmatics. It is not the case, therefore, that one can use Fodor’s criteria of modularity to put morphosyntax inside but conceptual processing outside of the language module. There are several arguments here to be made with respect to Fodor’s definition of modules. I take them up one by one.
i. The specificity of the processing domain: Fodor makes the inference that if the representations which encode knowledge in a domain are autonomous, then the processor responsible for the domain could be modular. Fodor notes, however, and Marslen-Wilson and Tyler insist, quite correctly, that the truth of the inference does not follow as a matter of conceptual necessity. Rather it is a matter of fact.
As they emphasise (Marslen-Wilson and Tyler 1987: 39), Fodor provides no evidence that the processing which maps stimuli onto shallow linguistic representations is any more domain specific than the processing which maps those shallow linguistic representations onto conceptual structures. Indeed, we have documented in Chapter 6 evidence that thinking and problem-solving are domain-specific too in the sense that particular types of problem-solving require domain-specific representations and domain-specific operations on those representations. See Hunter (1966, 1977), Huttenlocher (1968, 1977), Clark (1969, 1977) and Johnson-Laird and Wason (1977a) for further discussion.9 Since thinking and problem-solving are for Fodor the defining responsibilities of the central processor, if they turn out to be domain-specific, this will undercut dramatically the contrast between language processing and central processing which Fodor wants to make.
ii. Obligatory or automatic processing: The existence of obligatory or automatic processes has been well-documented (see Posner and Snyder 1975; Shiffrin and Schneider 1977; Marslen-Wilson and Tyler 1980, 1981; for discussion in the SLA literature, see Bialystok 1979, 1982; McLaughlin, Rossman, and McLeod 1983; McLeod and McLaughlin 1986). Marslen-Wilson and Tyler (1987) point out, however, that there is no evidence to show, and no reason to suppose, that the automatic projection of linguistic stimuli onto higher-order representations stops precisely at the level of computing morphosyntactic representations. They argue in contrast (Marslen-Wilson and Tyler 1987: 39) that in normal first-pass processing the projection of linguistic stimuli onto a discourse interpretation can be as obligatory as the processing of the lower-order representations. Once again, this weakens the contrast that Fodor wants to make between the processing of grammatical representations and the processing of conceptual ones.
iii. Limited access to intermediate representations by the central processor: The central processor, according to Fodor and Schwartz, has limited access to the intermediate shallow representations. This hypothesis is designed to explain why we cannot introspect on-line about the nature of the syllable structures of the speech stream; we merely “hear” sounds in a particular way. Marslen-Wilson and Tyler (1987: 40) observe that all of this might be true without it saying anything at all about which level of processing is the one most readily accessible to the central processor.10 Fodor merely assumes it must be the morphosyntax. He provides no empirical evidence to bolster this assumption. Jackendoff (1989: 117–8) questions the very concept of a central processor à la Fodor. He reports on an experiment by Lackner and Tuller (1976) in which subjects heard repeated instances of stimuli like those in (7.2).
(7.2)
a. the sun I see the sun I see the sun I see…
b. I like to fish I like to fish I like to fish
After listening to the uninterrupted sequences for a while, subjects reported hearing quite different sequences, which apparently arise as the result of spontaneously regrouping the cues in the stimuli into an alternative representation. Subjects reported hearing (7.3).
(7.3)
a. ice on the sea ice on the sea ice on the sea…
b. to fish, I like; to fish, I like; to fish, I like
Along with the reported changes in lexical items were reported changes in stress and intonation. Jackendoff (1989: 118) concludes:
This experiment shows that multiple interpretations are not limited to the conceptual or syntactic level but extend down even to the level of phonological structure, including intonation. This confirms the claim that STLM [short term linguistic memory, SEC] is computing all levels in registration with each other and that the selection function is choosing not just a high-level conceptual structure but the syntactic and phonological structures in registration with it. It would thus seem to dispel any notion that the selection function is performed by a “higher-level cognitive device” that deals exclusively with central levels of representation such as meaning or even syntax.
Data like this thus suggest that the idea of a central processor interacting with a grammatical module provides a poor model of language processing. Observe that they also provide evidence for top-down effects on both phonological and morphosyntactic processing. The grammar is not an input system in the sense Fodor intended.
iv. Modular processing is fast: The processing of speech is very fast, but this does not inform us about the boundaries between the modular and the central processes (Marslen-Wilson and Tyler 1987: 40). Marslen-Wilson and Tyler (1987: 44–61) provide a resumé of experimental studies involving word monitoring, the resolution of syntactic ambiguity, and the use of pragmatic inference in anaphor resolution to show that mapping onto a mental model can be just as fast as a putative mapping onto a disambiguated syntactic representation, even when it involves pragmatic inferencing. In other words, having recourse to background information, performing inferencing based on contextual information, visual information or information from outside the hypothesised language module can be extremely fast. Once again, this undercuts the division that Fodor makes between such processing and the processing of grammatical representations.
v. Information encapsulation: A module is informationally encapsulated if the information available in and/or derived from central processing cannot direct or guide processing within the module. Marslen-Wilson and Tyler (1987: 41) point out that information encapsulation is not a diagnostic feature of modularity in the same way as the other criteria. Rather, it is definable only in terms of the relationship between the modular and conceptual systems. They note too that there are a number of construals of this constraint on sentence processing.
One version, which they dub “autonomous processing”, citing Forster (1979), claims that the output of each processor is determined strictly by its input provided in a bottom-up only fashion.11 There is also output autonomy, meaning that each processor constructs representations
only in a given representational format, and, finally, there may be late interaction of semantic with morphosyntactic information (Linebarger 1989: 199–200). This third type of encapsulation permits a number of different relationships between the grammatical and conceptual processing systems. All possible parses might be computed with the “correct” one being selected late in the parse based on a match with concurrent parses in the next-up processor. Marslen-Wilson and Tyler suggest that Fodor is prepared to live with the third version (Fodor 1983: 76–78), namely a limited amount of top-down communication taking place late in the parse. Schwartz explicitly denies this possibility. Her view of modularity, rather, seems to be one of input and output autonomy, which therefore makes it incompatible with the evidence showing that conceptual information can affect processing as early as lexical recognition. Let me develop this further with an extensive quote from Marslen-Wilson and Tyler (1987: 41):
Top-down communication between levels of linguistically specified representation does not violate the principle of informational encapsulation… The language module as a whole, however, is autonomous in the standard sense. No information that is not linguistically specified — at the linguistic levels up to and including logical form — can affect operations within the module. Fodor believes that the cost of fast, mandatory operation in the linguistic input system is isolation from everything else the perceiver knows. As stated, this claim cannot be completely true. When listeners encounter syntactically ambiguous strings, where only pragmatic knowledge can resolve the ambiguity, they nonetheless seem to end up with the structural analysis that best fits the context. To cope with this, Fodor does allow a limited form of interaction at the syntactic interface between the language module and central processes. The central processes can give the syntactic parser feedback about the semantic and pragmatic acceptability of the structures it has computed (Fodor 1983, pp. 134–135). Thus, in cases of structural ambiguity, extramodular information does affect the outcome of linguistic analysis. (Marslen-Wilson and Tyler 1987: 41).
Marslen-Wilson and Tyler (1987: 41) then go on to make clear that to ensure autonomy, Fodor must prohibit the syntactic parser from being told which parse to pursue by the discourse processor; semantic information cannot be used to guide the parse on-line. As Marslen-Wilson and Tyler (1987: 41) emphasise, however, this means that the claim for information encapsulation hinges entirely on the point during the parse at which the contextual resolution of syntactic ambiguities occurs, a question for which Fodor himself provides no empirical evidence. They argue, in contrast, using the same evidence already cited, that context does direct the parser on-line. In particular, citing experimental work by
Marslen-Wilson and Young (1984), they describe a naming experiment in which subjects heard one of two contextually situating clauses followed by an ambiguous fragment such as those in (7.4). See Marslen-Wilson and Tyler (1987: 49–50). See Tyler (1989: 446–8) as well.
(7.4)
a. If you want a cheap holiday, visiting relatives … (Adjectival bias)
b. If you have a spare bedroom, visiting relatives … (Gerund bias)
(Marslen-Wilson and Tyler 1987: 49).
Immediately after the offset of the /s/ of the word relatives, subjects saw a visual probe which either was or was not an appropriate continuation of the ambiguous fragment (is or are). Subjects named the probe and were timed doing so. Marslen-Wilson and Tyler (1987: 49) report a significantly faster naming latency to appropriate probes. They conclude that nonlinguistic context affects the determination of a syntactic analysis very early on. While these results do not force one to adopt an interactive parsing model, in contrast to one where several syntactic parses are simultaneously computed and then the appropriate one selected at the earliest possible moment from contextual information, Marslen-Wilson and Tyler (1987: 51) point out it becomes impossible empirically to distinguish between the two approaches. Information encapsulation will not decide the issue. In addition to the studies reviewed in Marslen-Wilson and Tyler (1987), Marslen-Wilson (1989) provides a considerable amount of evidence suggesting that the lexicon must be excised out from the modular grammar. They propose the interactive model in Figure 7.3. According to this modification of Fodor’s proposals, processing becomes interactive at the point of lexical selection. The Cohort Model of lexical activation and selection proposed by Marslen-Wilson requires that the (modular) processing of the speech signal produce a representation which will then activate all lexical representations with the same phonological representation, linearly, left-to-right. These multiple activations constitute the Cohort. As the phonological processing continues more and more entries will be deactivated, eventually reducing the activated entry to just one or to a handful (in the case of homophones). However, Marslen-Wilson has emphasised that hearers are able to recognise a word without hearing the whole thing. 
His explanation of this remarkable ability is that word recognition cannot literally involve matching the signal onto the lexicon but rather must involve projecting an appropriate word onto the signal. This projection appears to crucially involve information from the on-going mental model of the discourse. So the selection of the appropriate “lemma”, in Levelt’s (1989) terminology, involves interaction between the grammar, the lexicon, and the conceptual system.
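The contrast between bottom-up narrowing of the Cohort and context-assisted selection can be illustrated with a small sketch. It is a toy under loud assumptions, not Marslen-Wilson's model: the lexicon, the rough one-character-per-segment transcriptions, and the names `LEXICON`, `cohort` and `recognise` are all hypothetical, and the discourse model is crudely reduced to a set of contextually plausible words.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical toy lexicon: word -> rough phonological form,
# with one character standing in for one segment (an assumption).
LEXICON: Dict[str, str] = {
    "captain": "kaptin",
    "captive": "kaptiv",
    "capital": "kapitl",
    "cat": "kat",
    "sea": "si",
    "see": "si",  # homophone of "sea"
}


def cohort(heard: str, lexicon: Dict[str, str]) -> Set[str]:
    """Return every word whose form begins with the segments heard so far."""
    return {word for word, form in lexicon.items() if form.startswith(heard)}


def recognise(
    signal: str,
    lexicon: Dict[str, str],
    plausible: Optional[Set[str]] = None,
) -> Tuple[Optional[str], int]:
    """Scan the signal left to right and stop when one candidate remains.

    `plausible` is a crude stand-in for the discourse model: when given,
    candidates that do not fit the context are dropped at every step.
    Returns the selected word (or None) and the number of segments heard.
    """
    for n in range(1, len(signal) + 1):
        candidates = cohort(signal[:n], lexicon)
        if plausible is not None:
            candidates &= plausible
        if len(candidates) == 1:
            return next(iter(candidates)), n
    return None, len(signal)
```

In this toy, `recognise("kaptin", LEXICON)` needs all six segments, whereas supplying a context such as `{"captain", "cat"}` yields a selection after three; the homophones "sea" and "see" are never separated by the signal alone.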
[Figure 7.3. Interactive parallel parsing. Each stage of processing is assumed to occur more or less simultaneously. Processors: phonological parser; syntactic parser; semantic and conceptual processing. Representations, from bottom to top: phonological representations; lexical activation; lexical selection and lexical integration (with input from the on-going model of discourse); syntactic representations (including LF); semantic representations of the sentence within the ongoing discourse.]
Under this revision, Fodor’s language faculty is simply carved up into smaller “boxes”, what Jackendoff (1989: 101) has dubbed “minimodules”. Thus, phonology becomes a minimodule. Whether syntax is also a module is controversial. Marslen-Wilson’s current position, I believe, is that syntactic processing is completely interactive so that only the phonology is modular. Other modularity fans are basically forced to concede the lexicon but argue that syntax can still be modular after word recognition. Under this view of things, the lexicon becomes, as Jackendoff (1983) argues, a set of correspondences between the autonomous representational systems phonology and morphosyntax. What strikes me, however, is that the modified modular proposals shown in Figure 7.3 correspond to a very structuralist view of language processing. Structuralism has taught us that explanation of linguistic knowledge requires various autonomous representational systems — phonetics, phonology, morphosyntax, derivational morphology, semantics, etc. But it is a great conceptual leap to conclude, as Karmiloff-Smith (1992) has reminded us, that autonomous representational systems are modular. A second leap is made when we move from the need for autonomous representations to the claim that processing is modular throughout development.12 Observe that the structuralist box view of linguistic cognition goes against the spirit of the P&P approach, by which I mean the insight that knowledge of language arises from the complex interaction of certain very abstract principles. Carried over to processing, in an admittedly simplistic way, we would expect that it too takes place on the basis of the deployment of abstract principles which
cut across the components of the grammar. The components of the grammar would then not need be modules even in a modular theory of the grammar. With something like this in mind, Frazier (1990) has proposed that there are several postlexical modules, also a significant weakening of Fodor’s position. See Figure 7.4.
[Figure 7.4. Frazier’s modularity (Frazier 1990: 413). Labels: central system; reference; binding; c-command; θ-predication; c-structure; sisterhood.]
We come to the conclusion that there is a certain lack of coherence between Schwartz’ adoption of the P&P theory of grammar, her use of this theory to account for L2 acquisition, and the Fodorian view of parsing. How a system like Frazier’s, which is compatible with the P&P theory, actually would enter into acquisition matters is hardly obvious. We are far from having consensus on which parts of the processing system respect information encapsulation. I do not require consensus on this point, however; I merely need some agreement among psycholinguists that the grammar is not informationally encapsulated. That agreement can be found.
vi. How to define the dedicated neurological structure? There can be no doubt that dedicated structures relevant to language of some sort exist (Marshall 1980) but it is also clear that there are no “language zones” in the brain. Chomsky, in arguing for his “UG as mental organ” metaphor (Chomsky 1980c), has argued that one cannot state where the circulation system begins and ends. This might be correct; nevertheless, the heart is delimited sufficiently to make heart transplants possible. There is no “language lobe” which could be similarly removed from the brain. It may well be that language and even grammar are distributed across the parts of the brain in much the way connectionists claim. For some evidence to this effect, see Caplan and Hildebrandt (1988) and R.-A. Müller (1996). The literature on brain-imaging provides further support for this idea. The connection between linguistic cognition and neurochemical processing is even more complex, making the claim as to what “counts” as a dedicated neuroanatomical structure difficult to assess at present. One of the most striking facts about The Modularity of Mind is that the empirical evidence for the neurological basis of modularity comes entirely from studies of aphasia. This evidence shows that certain types of aphasia differentially affect language and the conceptual systems.
But the evidence is problematic insofar as, say, Broca’s or Wernicke’s aphasics apparently manifest disturbances
only to some subparts of their linguistic knowledge. It isn’t the case that all of language or even all of the grammar is disturbed. One can read Fodor as saying language is a box; facial recognition is another box; problem-solving and central processing is a third, and the operations in each box do not affect one another. Given his intended audience (psychologists, and others committed to an undifferentiated theory of mind), this makes sense, for the existence of specifically linguistic disturbances in an undifferentiated mind is surely surprising. However, once we grant autonomy and modularity as possibilities, what is striking is that the disturbances affect quite specific sub-parts of the grammatical components. Contemporary work on aphasia suggests that impairments in processing affect subparts of the traditional grammatical components. This provides additional evidence that the structuralist views of grammatical components (phonology, morphology, syntax, semantics) do not lend themselves readily to the psycholinguistic facts and will have to be modified (Caplan and Hildebrandt 1988; Emmorey and Fromkin 1988; Linebarger 1989). Aphasic breakdowns occur along other lines. Consider in this regard Kean’s (1977, 1980) now well-known arguments that the telegraphic speech typical of Broca’s aphasia is the result of a phonological deficit. Kean argues that the language processors in such patients are not computing the relevant distinctions between clitics and phonological words. The deficit is quite specific; the computational system is not making a given distinction with respect to phonological form which has consequences for the shape of morphosyntactic representations. But notice that the phonological system in its entirety is not disrupted. The deficit does not otherwise affect the subject’s knowledge of syllable structure constraints, or her knowledge of stress placement. Comparable remarks can be made about morphosyntactic processing.
Caplan and Hildebrandt show that antecedent-trace relations can be rendered inaccessible. Subjects exhibiting this particular pattern of disturbance do not necessarily manifest disturbances elsewhere in the syntactic system, say in their knowledge of predication or modification patterns. There can be quite specific disturbances affecting the lexicon; for example, an aphasic patient might read nonce expressions but not be able to write them to dictation (Beauvois and Dérouesné 1979). Emmorey and Fromkin (1988) review a number of studies which demonstrate how localised within the lexicon such dissociations can be. Moreover, they do not just affect the phonological or morphosyntactic representations in the lexicon. An aphasic can exhibit quite specific semantic deficits, having almost all of his lexical semantics intact but being unable to access words from one semantic field. So the evidence does not support a cut between grammar and semantics, and suggests, moreover, that the deficits are affecting subparts of grammatical knowledge, not the grammar itself.
If the grammar were a module, one would predict exactly the opposite. To sum up, research in aphasia makes it difficult to support Fodor’s claim that a module corresponds to a specific neuroanatomical structure. On the one hand, there is no simple association between parts of the brain and grammar; on the other, specific impairments typically affect much smaller parts of the grammar, the lexicon, or the semantic system. Neurolinguistic research provides little support for the idea that there is either a “language organ” or a “grammar organ” in any non-metaphorical sense.

vii. Developmental dissociations

Similar conclusions can be drawn on the basis of ontogenesis. Dissociations between grammatical development and intelligence as measured by standardised intelligence tests do not of themselves support Fodor’s claims. Conceptual development, as we have seen, manifests a characteristic pace and path. Similar observations must be made about metalinguistic development, problem-solving, and inferencing. Fodor’s ontogenetic argument, if construed as a sufficient condition for modularity, would support the claim that just about all of cognition is modular. Even SLI is not such a clear example supporting Fodor’s position. All of the studies mentioned earlier proposing a specifically linguistic deficit (as opposed to a general one) claim that some single aspect of the morphosyntax is affected (binding, government, agreement, or morphosyntactic features limited only to suffixes). There is no general syntactic deficit proposed. In addition, none of these data involve on-line studies of language processing, so they deal not with modularity but with the autonomy question. Let me conclude Subsection 3.2: The empirical data from processing does not support the idea that grammatical processing is encapsulated, operating independently of the conceptual system. At best we have evidence for modular processing in the phonetic system.
We have, however, seen substantial evidence for the Autonomy Hypotheses. Fodor’s arguments for modularity are in reality arguments for the autonomy of grammatical representations.

3.3 Problem three: The relationship between parsing and knowledge in proficient native speakers and acquisition in L2 learners is unclear

Even if we had a theory of parsing for adult native speakers, it is hardly obvious that it should generalise in a straightforward way to second language acquisition. While it may be true that parsing in such adults is mandatory, unconscious, and fast, there is no reason to assume at present that parsing in learners exhibits the same properties. There has been little exploration of the nature of parsing in
learners beyond those works cited in earlier chapters, and few of these studies ask how parsing changes over time. Generative studies of parsing in L2 learners are few and far between. This means that SLA generativists mainly have studies of native speakers to rely on. It is hardly obvious that evidence derived from studies of such speaker/hearers with highly overlearned parsing procedures and the “right kind” of grammars will carry over to learners hearing forms and structures for the first time. In addition, the generative parsing models available presuppose that the language has been acquired. They explore the question of how linguistic knowledge is deployed in real-time (e.g. Frazier 1978; Pritchett 1987, 1992; Gorrell 1995). They do not ask the question: How do parsing mechanisms interact with acquisition mechanisms? They simply have nothing to say about this. I have already introduced the one major exploration of the relationship between parsing and language acquisition in primary language acquisition within a UG perspective, namely Berwick (1985). It is quite limited in its scope, examining only the acquisition of certain basic parameters of syntax, and presupposes that lexical acquisition, including the acquisition of major syntactic classes, is complete when syntactic parameter-setting occurs. How this happens is not spelt out but it is assumed that conceptual knowledge will play a role in telling the learner which categories in the input will correspond to THING categories, which to ACTIONS, which to EVENTS, etc. So Berwick is assuming some form of semantic bootstrapping where prior conceptual structures guide morphosyntactic classification. To accept bootstrapping in acquisition is to deny that information goes bottom-up only from the stimulus to the conceptual system. What about the phonetic module? I made the point above that there is some consensus emerging among psycholinguists that the phonetic processors may be modules in Fodor’s sense.
Recall that phonetic acquisition may be an instance of canalisation. We observed that the neonate has perceptual discrimination capacities which can categorically distinguish certain basic contrasts. Over the next few months the capacity emerges to discriminate all of the phonetic distinctions needed to acquire natural languages.13 However, as observed by Jusczyk and his associates (Jusczyk, Pisoni, Walley, and Murray 1980; Jusczyk, Pisoni, Reed, Fernald, and Myers 1983; Jusczyk, Rosner, Reed, and Kennedy 1989), there are numerous parallels between infants’ processing of speech sounds and their processing of complex non-speech sounds, so that general auditory mechanisms might be responsible for the initial development of speech perception (Jusczyk 1992: 43). Kent (1992) has, in like fashion, drawn explicit parallels between the development of speech production and general biological development, especially motor development. How do we relate these facts to the issue of autonomy and modularity in development? As we saw previously in the
discussion of innateness, one can grant that a cognitive system can be under strong biological control and still manifest experience-dependent learning (see, e.g. Kent 1992; Locke and Pearson 1992). Jusczyk (1992), for example, is committed to the view that knowledge of the sound system is in some sense given. The infant’s knowledge of the specific properties of her L1 develops so quickly that acquisition processes must be exploiting autonomous representations. However, they are interacting constantly with stimuli and non-language-specific perception. In short, one can observe parallelisms across cognitive domains without denying autonomy. The converse also holds; the existence of autonomous representational systems does not preclude the deployment of general auditory or motor mechanisms.

3.4 Problem four: It cannot be true that all grammatical restructuring takes place as a direct consequence of a failure to process input bottom-up

In a modular system the prompt for the acquisition mechanism to go into action can only be the failure of the parser to correctly analyse input arriving from processors lower down in the analysis system. This is because each module can only analyse module-internal information or inputs derived as outputs of such processors. In particular, whether the current grammar is compatible with the learner’s mental model of the language, his represented beliefs about the current situation, or his conceptual representations is irrelevant. Conceptual representations cannot trigger acquisition. Interpreted in the context of the P&P theory, two obvious drawbacks of this assumption arise. The first is that it predicts rapid restructuring of the L2 interlanguage grammar whenever novel input requires restructuring. Since parameters are binary, there can be no doubt about the consequences of parameter-resetting. If setting “+P” doesn’t work, then setting “−P” is the only alternative.
The fact that learners may and do tolerate several representations of the same grammatical information over time is incompatible with the claim that restructuring occurs on-line in response to a parsing failure. Flynn’s frequent assertions that L2 learners need time to “work out the consequences of parameter-resetting” are difficult to interpret if we attempt to understand parameter-(re)setting in terms of modularity, parsing and the triggers for learning. The assumption also fails to explain why restructuring occurs even when the learner’s representation seems to accurately match the input (Bowerman 1981a, 1982a, b, 1987; Karmiloff-Smith 1986, 1989; de Bot et al. 1995).
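The prediction criticised here can be made concrete with a toy simulation. The sketch below is an illustration only and is not drawn from the P&P literature: it assumes a single hypothetical binary parameter P, a hypothetical encoding of input as lists of category tags, and a parser that fails whenever the relative order of V and Adv does not match the current setting. Under those assumptions every parsing failure flips the parameter at once, so the learner never holds two analyses at the same time.

```python
def parses(tags, p):
    """Toy parser (hypothetical): succeed only if the V/Adv order fits P.

    p=True licenses 'V Adv' order; p=False licenses 'Adv V' order.
    """
    v, adv = tags.index("V"), tags.index("Adv")
    return (v < adv) if p else (adv < v)


def learn(inputs, p):
    """Failure-driven learner: reset P whenever the parser fails."""
    history = [p]
    for tags in inputs:
        if not parses(tags, p):
            p = not p  # the parameter is binary, so flipping is the only move
        history.append(p)
    return history


# Consistent 'Adv V' input, starting from the 'V Adv' setting.
consistent = [["S", "Adv", "V", "O"]] * 3
print(learn(consistent, True))

# Mixed input: the setting flips back and forth with every failure.
mixed = [["S", "Adv", "V", "O"], ["S", "V", "Adv", "O"], ["S", "Adv", "V", "O"]]
print(learn(mixed, True))
```

With consistent input the setting changes after a single failure and then stays put; with mixed input it oscillates. Neither pattern leaves room for a learner who tolerates several representations of the same information over time.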
4. Whither Linguistic Competence?
Recall that Schwartz defines the term “Linguistic Competence” as the knowledge which results from acquiring language via modular processing. This usage constitutes a radical departure from the way this term is normally used in linguistics and psycholinguistics. Chomsky (1965: 4) originally introduced the term linguistic competence as a general cover term for whatever knowledge a speaker-hearer requires to understand or produce a sentence of his language. That knowledge was considered to include semantics, writ large, and I see no reason to abandon that usage. To adopt Schwartz’ position is to use the terms “linguistic competence” and “language” to denote an ever smaller range of phenomena, leaving many of the things we want to discuss and analyse outside of the purview of a theory of “language”. This is hardly an outcome to be desired. Schwartz’ Linguistic Competence could not include semantics and lexical knowledge since they fall outside the modules. We were obliged to place semantics outside of Linguistic Competence because there is no real distinction between semantics and pragmatics and pragmatics must belong to the domain of non-modular knowledge. Linebarger (1989: 201–2) has made the important point that the distinction between what counts as syntax and what counts as semantics has been considerably obscured in generative grammar. This has obvious consequences for claims that syntactic processing is or is not influenced by information encoded in conceptual representations. If semantic information were to be distributed across modules, it would make it extremely difficult to test claims about the encapsulation of information, or indeed about the autonomy of representational systems. I shall argue below that we do find semantic information distributed across modules. Finally, consider that maybe only the phonology is modular in Schwartz’ sense. If so, then Linguistic Competence would reduce to phonological knowledge alone. 
I doubt if even Schwartz would be willing to live with this result.

4.1 What does a psychogrammar really consist of?

Having argued that language acquisition theory must concern itself with the development of psychogrammars, I will now try to establish just how rich that construct is, and by implication, how impoverished the construct Schwartz invokes is.

4.1.1 Rules

Baker (1991) has argued that psychogrammars must include rules in addition to universals. He argues for a language-specific rule of movement (Not-movement)
which violates several assumptions of P&P theory (and a fortiori the Minimalist program); namely, it mentions a language-particular contextual term, is obligatory, and requires the presence of a construction that exhibits idiosyncratic lexical restrictions. In addition, Baker argues that the properties are not the incidental result of the application of universal principles (Baker 1991: 389). Consequently, the phenomenon in question falls within the so-called periphery of the grammar, rather than its core. It should be understood, given the nature of the phenomenon, that this terminological distinction can hardly relate to the conceptual, semantic or syntactic importance of a structure; negation in English is of central importance. Rather, it falls within the periphery because it is rule-governed, rather than the consequence of parameter-setting. Baker (1991: 399–400) proposes the analysis in (7.5) to account for the facts in (7.6).

(7.5) a. A finite verb must move to the left of not. This rule requires a special-purpose verb phrase.
      b. not V[+Tense] (obligatory)
         Condition: V must be [+Special]
      c. The special-purpose verb phrase as defined in (7.5d) and (7.5e) falls within the general class of limited-use phrases. A limited-use phrase can be employed only when it is specifically called for by the grammar.
      d. do is a verb and takes a VP as a complement. For most English verbs, the special-purpose variant of a VP is formed with do.
      e. Modals, perfect have, and be have special-purpose VPs identical in form to the general-purpose phrase.

(7.6) a. *Carol not will send the package.
      b. Carol never/always/frequently (will) send(s) the package.
      c. *Carol not sends the package.
      d. Carol will not send the package.
      e. *Carol sends not the package.
      f. Harold never WAS very polite.
      g. *Harold WAS never very polite.
      h. Carol does not send the package.
      i. *Carol does send the package.
      j. *Carol does never send the package.
      k. Carol DOES send packages.
      l. Carol SENDS packages.
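Read procedurally, the statements in (7.5) amount to a small lexically conditioned procedure. The sketch below is one possible rendering, not Baker’s own formalism: the two word lists and the function name negate are hypothetical, and the list standing in for the [+Special] verbs of (7.5e) is deliberately incomplete (its entries for have are meant as perfect have only).

```python
# Toy lexicon, assumed for illustration only.
# Verbs whose special-purpose VP is identical in form to the
# general-purpose one, cf. (7.5e): modals, perfect have, be.
SELF_SPECIAL = {"will", "can", "must", "has", "have", "is", "are", "was", "were"}

# Finite lexical verbs mapped to (finite form of do, bare verb), cf. (7.5d).
DO_SUPPORT = {
    "sends": ("does", "send"),
    "send": ("do", "send"),
    "sent": ("did", "send"),
}


def negate(subject, finite_verb, rest):
    """Build a negated clause so that a finite verb precedes 'not' (7.5a, b)."""
    if finite_verb in SELF_SPECIAL:
        # The verb is already [+Special]: it surfaces to the left of not.
        words = [subject, finite_verb, "not"] + rest
    elif finite_verb in DO_SUPPORT:
        # Other verbs need the special-purpose VP formed with do.
        do_form, bare = DO_SUPPORT[finite_verb]
        words = [subject, do_form, "not", bare] + rest
    else:
        raise KeyError(f"verb not in the toy lexicon: {finite_verb}")
    return " ".join(words)


print(negate("Carol", "will", ["send", "the", "package"]))
print(negate("Carol", "sends", ["the", "package"]))
```

The two calls produce the grammatical strings in (7.6d) and (7.6h). Because everything turns on which list a verb belongs to, the sketch makes the acquisition point visible: the membership lists are language-specific and would have to be learned.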
Sentences (7.6a, c) show that the VP-scope-taking adverb not cannot appear to the left of a tensed verb, while (7.6b) shows that other adverbs can appear in precisely this position.14 Sentence (7.6d) shows that modals can appear to the left of the adverb while (7.6e) shows that non-modal and non-auxiliary verbs cannot. Sentences (7.6f, g) show that Focus blocks the appearance of the auxiliaries and modals to the left of the adverb. Sentences (7.6h–l) show that do must be inserted when not appears and may occur when the predicate is focused. Recall that there are core grammar analyses of negation and do-support which do not use lexically-specific rules to account for the facts (Pollock 1989 and Chomsky 1991c). These analyses are construed in the SLA P&P literature as an exact account of the linguistic knowledge of native speakers and L2 learners who have attained the relevant proficiency. They posit novel functional categories (Tns, Neg, Agr), general movement schemas, and a parametric difference between languages that have “strong” Agr and those that have “weak” Agr, where these terms are meant to refer to the semantic-role assigning properties of Agr. So in English, tense gets lowered onto the verb stem in VP because Agr is “weak” and only non-semantic-role assigning verbs can raise, while in a language like French, all verbs raise. See the intermediate derived structures in (7.7). We saw that precisely this parametric difference between the two languages is designed to explain why in French the adverb is normally to the right of the verb (Jean voit souvent son amie) while in English it appears to the left of the verb (John often sees his friend but *John sees often his friend).

(7.7) a. [Tree diagram: TP dominates the subject DP John (+Nom) and Tns′; Tns holds Pres and takes AgrP as its complement; below Agr′, the adverb often precedes the V′ see his friend inside VP. The positions of the traces are not recoverable here.]
      b. [Tree diagram: the same structure after lowering; the trace tj stands in the position of Tns, and V0 is realised as see+Presj, followed by the DP his friend and preceded by the adverb often.]
In order to explain the existence of do-support in English, it is hypothesised that Neg heads a projection of its own above VP and that lowering of the Tns to an embedded position beneath it leads to an Empty Category Principle violation.15 It must be emphasised that there are many accounts of these facts in the linguistics literature, so what is relevant to the discussion is the adequacy of any given account. Baker (and Iatridou 1990) make a case that in order to get an account in terms of principles and parameters of UG, there are unacceptable costs. Baker lists (i) granting every adverb in English (but not in French) two d-structure positions as modifiers of Tns and as modifiers of V. This move is ad hoc. (ii) There must also be dual lexical specifications for the same adverbs so that they will project into the syntax in the correct fashion. (iii) This move also entails a second but otherwise unmotivated parametric choice within UG. (iv) The analysis also entails complications in the representation of modals and auxiliaries. (v) Since the difference between raising verbs and verbs which do not raise has been attributed to semantic-role assigning differences, there ought to be semantic differences across the relevant sets of verbs to explain well-known dialectal differences (John doesn’t have a car/John hasn’t a car). In my English (where both forms exist), there is no obvious difference in meaning, although these two sentences might differ in register.16 (vi) Pollock treats do in imperatives, e.g. Don’t be cruel!, as a quite distinct phenomenon from dummy do. Rather, he regards it as main verb do although this is entirely inconsistent with the semantics of do in this sentence. The only motivation for this move is that it is required by the analysis. (vii) He postulates an empty verb Ø, a null version
of dummy do to account for simple affirmative sentences, e.g. John left. Not only is this move ad hoc but it raises some rather obvious difficulties for language acquisition. Why would a language learner postulate both a phonetically null and a phonetically full dummy verb? Chomsky’s analysis posits an intermediate Agr present at S-structure but missing at LF, which is equally ad hoc. In short, the P&P analysis of English negation is problematic. Let us assume that Baker’s conclusions in this instance are well-founded. Then the conclusions for us in acquisition are straightforward: psychogrammars consist not only of representations constructed out of the interaction of principles and parameters, they also include representations constructed on the basis of rules. We must assume that these rules are acquired since they are language-specific. Indeed, if Baker is correct, they are lexical-item specific. The consequences are obvious: to explain this type of cross-linguistic variation, the LAD must include mechanisms capable of learning the rules. What could this mechanism be except a mechanism which induces rules?17 Let us suppose that we grant that rule-learning must be one of the possible outputs of the language acquisition mechanisms. What mechanisms must we then grant? One move to make (in the spirit of Schwartz’ approach) would be to distinguish two classes of rules, say k-acquired and i-learned rules. If one indeed took this option, one would then have to demonstrate empirically that the rule-learning needed to explain the distribution and interpretation of not and do in English constitutes an instance of k-acquired rules. K-acquired rule learning would be different from i-learning in that it would be bottom-up, automatic and modular. The demonstration of empirical differences between k-acquired rules and i-learned rules would be anything but straightforward. Let me therefore turn to another class of problems, involving the interaction of meaning and form.
4.1.2 Meaning and form

Meaning and form are intertwined throughout the grammar. This is hardly surprising since language is normally used to convey meaning. The question of interest for a modular theory of parsing is: Can one design a theory of parsing in which conceptual structures and mental models do not drive the construction of phonological and morphosyntactic representations? At the moment the answer is open, but, as we have seen, it is well understood in contemporary parsing theories that conceptual information is used at various points during a parse. It is simply not clear whether it is determining the nature of a syntactic parse or merely selecting among a number of options made available by an informationally encapsulated syntactic processor.
The question for a theory of language acquisition is quite different: Can one acquire the properties of language-specific phenomena without deploying semantic information in order to properly delimit the class of structural phenomena? I’m going to try to make the case that you can’t, and moreover I’m going to try to show to what extent semantic constructs are already embedded in the theory of grammar. Consequently, it will be virtually impossible to make the kind of grammar/semantics and acquisition/learning bifurcation that Schwartz is trying to make. I have been able to locate in the literature three different potential problems for Schwartz’s approach: (i) prosodic meaning, (ii) the status of language-specific semantic phenomena, in particular, constructions whose use depends on the deployment of pragmatic information, and (iii) the nature and status of the lexicon.

i. Prosodic meaning

The prosodic systems of tone, pitch-range, loudness, tempo and rhythm can all indicate expressive (non-logical) aspects of meaning. In other words, we use these systems to infer information about the speaker’s state of mind, his attitudes towards what he is saying, or to some proposition in the on-going discourse (Crystal 1969: Ch. 7). Increased loudness can indicate anger. Increased tempo can indicate excitedness. Particular tone contours can be used to infer sarcasm, doubt or disbelief, etc. In short, various aspects of the acoustic-phonetic signal serve as a basis for drawing conclusions about the speaker, which are of an obviously pragmatic sort. In the Schwartz model, this kind of inferencing should take place in the central processor. Notice, however, that the central processor has no direct access to the acoustic-phonetic attributes of the signal. Consequently, the central processor doesn’t have the relevant input. It is also important to note that the particular acoustic-phonetic property-meaning correspondence is language or language-community specific.
We are not talking about universals here. So the correspondences must be learned. If the central processor cannot see the properties of phonetic representations, and if it cannot influence the processing of this information, it is not obvious how the relevant correspondences between acoustic-phonetic property and pragmatic inference could be acquired.18 A related, but not so clear, set of cases involves the acoustic-phonetic correspondents of information structures, i.e. of new and old information or focus constructions. The stress placement in the sentences of (7.8) is relatable to presuppositions, presuppositions being encoded in conceptual structures.

(7.8) a. What HAPPENED?
         The BOOK fell over.
         The BOOK fell OVER.
         *The BOOK FELL over.
      b. WHAT fell over?
         The BOOK fell over.
         *The book FELL over.
      c. Did you throw the book against the wall?
         No, it just FELL OVER.
         *No, the BOOK fell over.
That the placement of nuclear stress has implications for interpretation is not in debate (Halliday 1967; Chomsky 1970; Jackendoff 1972, 1995; Selkirk 1984). Nor is it in doubt that the proper interpretation of such discourse depends upon mental models of the ongoing discourse, stored information about the world, etc. (Culicover and Rochemont 1983; Rochemont 1986). What is, however, controversial is whether the acoustic-phonetic properties can be directly read by the conceptual system or whether there is a morphosyntactic feature, namely [+Focus], assigned to morphosyntactic representations on the basis of the prosodic characteristics of stimuli, which in turn serves as input to the conceptual system (Jackendoff 1972, 1995; Selkirk 1984; Culicover and Rochemont 1983; Rochemont 1986; Winkler 1994). It is normally assumed in generative treatments of information structure that there is a mediating syntactic feature. The arguments for this assumption, as far as I can tell, are theory-internal rather than empirical; in other words, it is precisely to preserve separate and non-interacting components in the Y-model of transformational generative linguistics (Chomsky and Lasnik 1977) that the syntactic feature is needed.19 This model is shown in (7.9).

(7.9)                    d-structure
                              |
                         s-structure
                        /           \
      phonological form (PF)     logical form (LF)
                |                        |
      phonetic interpretation    conceptual interpretation
The bifurcation of the grammar into two distinct “interpretive” components PF and LF makes it essential to have abstract features like [+Focus] so that both interpretive components of the competence grammar get the right input. The model in (7.9) is a competence model. It certainly does not correspond to a model of parsing. What assumptions would be needed to preserve the information-encapsulation of the grammar? Minimally, one would need to assume that there were a unique set of acoustic-phonetic or phonological properties which could be put into correspondence with the [+Focus] feature, in turn assigned to a morphosyntactic constituent, which would then be fed into the
conceptual system to be interpreted. Unfortunately, there is no such unique set of properties; as Chomsky (1970) made clear, the same structure and prosody could derive multiple foci. The actual selection of a relevant focus thus depends on the use of information in the current MM of the discourse. Consequently, learning the correspondence between the feature and the prosody means that pragmatic information must interact with phonetic information.

ii. Pragmatics, syntactic structures, and the language faculty

Prince (1988) has pointed out that the term “pragmatics” is not well understood insofar as it is often used to characterise a disparate set of phenomena. Some aspects of interpretation of sentences in discourse are not linguistic in that an inference does not require any knowledge of the language beyond the ability to process the literal meaning of the utterance. She provides the cases in (7.10).

(7.10) a. It’s cold in here.
       b. Shut the window.
       (Prince 1988: 166)
Prince observes with respect to (7.10a) that it can, in some particular context, pragmatically implicate (7.10b). In other words, the uttering of (7.10a) will be construed as if it meant (7.10b). In at least some contexts, that is precisely the meaning of (7.10a) that the speaker will have intended. Prince, however, adds that the existence of cases such as (7.10) does not mean that there is no linguistic inferencing at all, i.e. inferencing in which drawing the correct implicature depends precisely on the knowledge of what a given linguistic structure can mean in a given context. She then proceeds to investigate a number of cases where the inferencing must be language-specific, and hence acquired with the language. For example, English has a specific construction for presenting focused information, namely the it-cleft.

(7.11) a. [Whether the Israelis found Eichmann alone, or whether someone informed them, is not known. Both Wiesenthal and a second Nazi-hunter, Toviah Friedman, have claimed that …]
          i. … they found Eichmann.
          ii. … it was they who found Eichmann.
       b. [Just last week Eichmann’s supporters claimed he would never be found and this morning Wiesenthal and Friedman announced that …]
          i. … they found Eichmann.
          ii. #… it was they who found Eichmann.
       (Prince 1988: 168)
Both the clefted and non-clefted sentences in (7.11a) are fine because they both can be construed as having the presupposition SOMEONE FOUND EICHMANN and the new information SOMEONE = WIESENTHAL AND FRIEDMAN. In contrast, (7.11bii) provides a presupposition SOMEONE FOUND EICHMANN where none is possible. Prince then goes on to demonstrate that the information structure of the cleft construction does not follow from some nonlinguistic iconicity and that a closely related language, namely Yiddish, has no construction which syntactically highlights the focused constituent, while syntactically subordinating the presupposed information. Yiddish has a functionally equivalent construction, the dos construction. See (7.12).

(7.12) a. … zey hobn gefunen aykhmanen
            they have found Eichmann
       b. … dos hobn zey gefunen aykhmanen
            this have they found Eichmann
            … it have they found Eichmann
       (Prince 1988: 169)
The construction in (7.12b) consists of a dummy NP in initial position, a raised auxiliary verb (V2 position), postposed subject and VP. The same structure can be found in the es-construction which has quite a different semantics. So, the basic point is that the functions of these different constructions are the same cross-linguistically but their syntax is language-specific, and so, obviously, is the correspondence between the syntax and the pragmatic functions. I conclude that the correspondence must be acquired. Now let us return to our basic question: How could a learner learn the relationship between the syntax and the pragmatics of these constructions? According to the Schwartz proposals, the acquisition must proceed in such a way that the syntax can be acquired independently of the semantics, bottom-up. But surely there is nothing in syntactic parsing which requires that It is John who broke the glass should have a different discourse implicature than John broke the glass. See (7.13).

(7.13) A: Someone broke a glass in the kitchen.
       B: It was JOHN who did it.
       A: I beg your pardon?
       B: JOHN broke the glass.
          #John broke the GLASS.
In the cases in question, even if we assume that the syntactic representation includes a Focus feature on John in (7.13), for example, it is still the case that
the learner must be able to learn the appropriate meanings (what Focus entails) and establish the appropriate functions for the structure in discourse. Parsing may not require working backwards (or downwards) from the pragmatic system to the syntax but it is difficult to see how the pragmatic functions could be acquired and the correspondences between the syntactic form and the meaning made without such a move.
iii. The nature and status of the lexicon
Schwartz regards the lexicon as part of the language module, not surprisingly since it is hard to imagine a language without the information presumed to be stored in the lexicon: forms, meanings, and form-specific morphosyntactic information. However, this makes no sense from either a processing perspective or the perspective of the contents of the lexicon. As Marslen-Wilson (1987: 71) has emphasised, understanding language involves connecting forms and meanings. Word recognition is at the heart of this since it is the representations of both, stored in longterm memory, which serve as a bridge between, for example, what we actually hear and the syntactic and semantic properties of the sentences which are projected from words. Marslen-Wilson points out that word recognition involves three functions: lexical access, lexical selection and lexical integration. Lexical access involves the activation of stored information in the lexicon. Lexical selection involves choosing an appropriate candidate for integration into higher order syntactic and semantic representations. Marslen-Wilson and his colleagues have shown that word recognition is very fast. Words can be accessed and selected before sufficient acoustic-phonetic information is available for word identification on the basis of matching a stored representation of form to the processed signal. As I pointed out above, Marslen-Wilson has argued that signal processing is modular up to the point of lexical access (i.e.
it is fast, automatic, informationally encapsulated, etc.).20 However, as stimuli are processed on-line by phonetic and phonological processors, multiple lexical entries are accessed and then assessed for their syntactic and semantic appropriateness in the current context. Assessment is clearly interactive — contextual representations (pragmatic information) are used on-line to assess the appropriateness of a particular lexical entry for the ongoing parse. What this means for our purposes is that the lexicon itself is not modular. This conclusion should hardly be surprising when we consider again what the lexicon does. Jackendoff (1983: 9, 1987, 1995) has described the lexicon as a correspondence system linking phonology to syntax, and syntax to conceptual structures. (This view of the lexicon has been explicitly encoded in the functional architecture of the language faculty shown in Chapter 8.) Almost by definition,
then, the lexicon must not be informationally encapsulated. Moreover, studies of lexical structures over the last 25 years have led to a considerable enrichment of the semantic elements assumed to be part of lexical entries: argument structures (Grimshaw 1990), semantic roles (Gruber 1965; Jackendoff 1972, 1983, 1990b; Higginbotham 1983, 1985), event structures (Davidson 1967; Higginbotham 1983, 1985; Parsons 1990; Grimshaw 1990; Pustejovsky 1991), and aspect (Tenny 1987; Grimshaw 1990). These semantic elements constrain word formation processes, as well as the correspondences between arguments of predicates and potential arguments in sentences. They and certain non-lexical semantic properties, e.g. predication (Williams 1980; Rothstein 1985; Bowers 1993), provide general constraints on the well-formedness of sentences. This last aspect is critical. Semantic elements are now considered to be licensing elements for syntactic constructions. So we can legitimately ask the question: What is the nature of modularity in a theory of grammar in which a rich variety of semantic constructs are required to license syntactic structures? What could it possibly be? Let me get at the problem from another direction. What exactly is the nature of lexical semantics vs. pragmatics such that the former might be included in the language module while the latter definitely is not? The answer is anything but clear. It is hardly convincing to claim that “lexical semantics” is whatever we discover plays a role in licensing syntactic positions, while “pragmatics” is everything else. This is not a principled response. Finally, I note that Schwartz (1993) explicitly argues that lexical information is learned. This basically renders vacuous her assertion that the grammar is a module for language acquisition.
5. Summary
All of the evidence reviewed above suggests that the psychogrammar involves a much richer construct than the principles and parameters which hold the spotlight in linguistic theory. Consequently, it involves a much richer notion than Schwartz’ Linguistic Competence. A model of the acquisition of a psychogrammar must therefore countenance the acquisition of language-specific morphosyntactic rules, possibly like the rule of verb-movement-over-negation proposed by Baker. It must also explain how learners can learn to link specific interpretations both with syntactic constructions and prosodic structures. It must explain lexical acquisition. All of these aspects of linguistic knowledge would appear at present to require the development of a theory of induction, in particular, a theory of induction in which linguistic structures interact in constrained fashion with
conceptual information. We have seen in this chapter that Schwartz has attempted to exclude induction from the theory of second language acquisition on the grounds that linguistic processing is modular and, in particular, that information from the conceptual system cannot interact with grammatical information. I have shown that there is no empirical basis for this claim. Moreover, the Fodorian version of modularity is seriously flawed as a framework for characterising language processing in competent and fluent monolinguals. I have emphasised that a model of on-line parsing in mature L1 speakers is not, in any event, obviously relevant to a theory of language acquisition in adult L2 learners. The connection between the two must first be articulated. There are therefore no grounds for excluding i-learning from a theory of SLA, and many good reasons to adopt it.
Notes
1. A weak argument is nonetheless an argument. A non-argument is that adopting the P&P theory allows one to discuss second language acquisition in terms of “serious” theories of grammar. It may be true that this is the correct explanation for the amazing institutional success in our field of the P&P theory. To which one can only comment: “Say, it ain’t so, Joe!”
2. Logically speaking, modularity does not entail autonomy. It might be that there are separate processors processing the same distinctions in the same representational format. This seems a rather perverse way for nature to have arranged things.
3. There is much talk of “modularity” in current linguistic literature, unfortunately, since what is at issue are things like case theory, binding, and so on. It is not clear if grammarians are making claims about the autonomy of the components they attribute to the grammar, or if they intend the term in the way that Fodor did. In the latter case, one would expect the putative module to correspond to a specific processor, to be subtended by fixed neuroanatomical structures, to be automatic, fast, and informationally encapsulated, etc. By these criteria, there is no evidence that theta theory, the binding theory, or Case theory are modules. To refer to these components of the grammar as “modules” is likely to create enormous and unnecessary confusion.
4. Naturally, there are nay-sayers. Leonard and his colleagues have argued in a variety of studies (see, e.g. Leonard, Sabbadini, Volterra, and Leonard 1988) that SLI is due to a problem in auditory perception which is not limited to language. Bishop (1994: 523), however, points out that this analysis is inconsistent with Leonard et al.’s own data, which show that noun plural /s/ created fewer problems for English-speaking SLI children than third person singular /s/. A perceptual deficit alone does not explain this observation. Bishop herself considered two alternative accounts: (i) the Misattributed Markers Hypothesis (a grammatical deficit which forces children to rely on semantic cues to direct their use of morphosyntactic affixes), and (ii) the Vulnerable Markers Hypothesis (grammatical distinctions are intact but control of the distinctions is inhibited by processing complexity). Bishop (1994: 532) notes that the Misattributed Markers Hypothesis cannot be adequately tested using spontaneously produced corpora. Her own experimental work did not support the Vulnerable Markers Hypothesis. She suggests nonetheless that some children appear to be affected by a slow parser which “cannot keep up with the input delivered by a message conceptualiser or, alternatively, an influence of phonologically complex material on the formulation of the next stretch of an utterance” (Bishop 1994: 532–33).
5. Notice that the hypothesis that autonomous systems are connected by sets of correspondence rules follows from these assumptions of sensitivity to specialised inputs and shallow outputs.
6. Asking this question involves asking if Krashen’s references to the LAD are de re or de dicto. If he had something specific in mind then it is likely that Schwartz’ model is a radical departure for the reasons stated. If, however, Krashen meant merely “Whatever you linguists mean by UG and the LAD, that’s what I mean too”, then Schwartz’ model can be seen as a consistent modification. Schwartz’ views on the modularity of acquisition remain incompatible with Krashen’s claims about k-acquisition based on understanding what one cannot parse.
7. In fairness to Schwartz, it is likely that even the earliest cited works may not have been available to her when she was writing her thesis, completed in 1987.
8. It is not clear that so much should be riding on the distinction anyway. Studies of implicit learning abound which show that humans can learn implicitly structural and non-structural generalisations (Berry and Broadbent 1984; Reber 1976, 1989). There is even research which shows implicit memory for linguistic and non-linguistic stimuli, i.e. that recall can occur without any conscious recollection (Tulving, Schachter, and Stark 1982; Roediger 1990; Jacoby, Toth, and Yonelinas 1993).
9. One should not be led away from the veracity of this claim by the fact that much of the literature on problem-solving attempts to couch it in terms of domain-general procedures. That is a consequence of discussing problem-solving in performance terms (as a skill) rather than as competence.
10. Tanenhaus et al. (1985) make a similar observation.
11. Linebarger (1989: 199) calls this “input autonomy”.
12. Karmiloff-Smith (1992: 6) rightly decries the constant confusion of autonomy and modularity to be found in the linguistics literature. She writes that a knowledge domain is “the set of representations sustaining a specific area of knowledge: language, number, physics, and so forth. A module is an information-processing unit that encapsulates that knowledge and the computations in it… the storage and processing of information may be domain specific without being encapsulated, hardwired, or mandatory.”
13. As noted previously, these capacities appear to be given a priori and are not induced (Streeter 1976; Trehub 1976; Werker and Lalonde 1988; Best, McRoberts, and Sithole 1988; Werker and Pegg 1992).
14. Adverbs vary according to their scope and semantics. Jackendoff (1972: Ch. 3) refers to four classes: speaker-oriented adverbs, as in (i), subject-oriented adverbs, as in (ii), predicate adverbs, as in (iii), and sentence-final modifiers, as in (iv). The adverb in (v) has the entire proposition in its scope.
(i) Frankly, John doesn’t intend to leave Marsha.
(ii) John carefully kept the truth from Marsha.
(iii) John answered the questions promptly.
(iv) John did the work alone.
(v) John never answers the questions.
15. There are various definitions of the ECP in the literature. For the sake of concreteness, I state Chomsky’s (1981: 274–5) definition, which may be one of the most explicit, in (i):
(i) Generalized ECP: If α is an empty category, then
(a) α is PRO if and only if it is ungoverned,
(b) α is trace if and only if it is properly governed,
(c) α is a variable only if it is Case-marked.
16. This is obviously a speculation on my part and I hesitate to make it since introspection is notoriously unreliable. Certainly, the two sentences in the text seem to contrast in formality to me, but then John hasn’t a bloody dime to his name is just fine and definitely informal.
17. Following a similar line of argumentation, Culicover (1999) argues for lexeme-specific morphosyntactic categories, lexically idiosyncratic constructions. Independently of the line of reasoning I have developed over the last ten years, he also argues for a correspondence-based approach to constraining induction.
18. Robert Bley-Vroman (p.c.) has pointed out to me that this problem could be fixed if it were assumed that there was a separate non-linguistic input system conveying the acoustic information directly to the central processor. I disagree. The phenomena that I am referring to are paralinguistic; they are indeed instantiated in linguistic stimuli and evaluated with respect to the semantic contents of those stimuli. The inferences we draw from listening to a sentence articulated with certain paralinguistic properties are not the inferences we draw from listening to a sequence of arbitrary sounds articulated with the same properties. I don’t think this problem will go away quite so easily.
19. Culicover and Rochemont (1983) argue that the two aspects Focus interpretation and prosodic prominence are not commensurate so that one cannot reduce to the other. Winkler (1994) disputes the claim, which leads me to conclude that the issue is not yet settled.
20. Marslen-Wilson and his colleagues have not concerned themselves with the problem of prosodic meaning, a problem which, as we have seen, suggests that the phonology may not be completely encapsulated.
Chapter 8 The evidence for negative evidence
1. Introduction
In previous chapters I have discussed theoretical and metatheoretical debates in order to motivate the Autonomous Induction Theory, a theory which includes both bottom-up and top-down processing and which allows a place for correction, feedback and other forms of metalinguistic information to play a role in second language acquisition. In this chapter, I want to examine the empirical evidence that this move is necessary. Three classes of studies will be examined in Section 2. First of all, I will examine some of the literature looking at interactions between native speakers and non-native speakers. I then go on to review some of the literature on the effects of grammatical instruction on L2 knowledge. Then I look specifically at studies of feedback and correction. The heart of this section will be a review of my own experimental work on this topic. I will then go on in Section 3 to present some new experimental evidence which provides additional confirmation of my claim that learners can learn abstract linguistic relations and properties from feedback and correction. I must emphasise, however, that for all three types of studies results are very mixed. Thus consider the literature on interaction and SLA. Although it has been argued that conversation structures the input in such a way as to facilitate learning, the results show only that interaction can lead to greater comprehension on the part of the learner, and then perhaps only on certain tasks. Despite claims to the contrary, the literature cited does not show that indirect forms of feedback cause grammatical restructuring. Or, consider grammatical instruction. Classroom studies are notoriously difficult to conduct because of the large number of factors which can influence the results on tests. The clearest evidence for the effects of grammatical instruction therefore comes from experimental studies such as those examining input processing or those involving the learning of artificial languages. 
The former research attempts to demonstrate that instruction can have an effect on how learners perceive and parse stimuli. The latter
examines specific aspects of learning under highly controlled conditions. Finally, the studies of feedback and correction also give mixed results. Some of this research shows that feedback and correction can help learners learn lexical exceptions to rules, some of it shows that feedback and correction can help learners move back from overgeneralisations. It looks as if some types of linguistic properties can be acquired in the manner described, but there is no reason to believe that generalisation from these results to just any property of the grammar will be possible. In other words, there is no support at present for a claim that feedback, correction, or metalinguistic instruction are essential to the acquisition of all types of linguistic knowledge. Thus, we must show some scepticism towards the hypothesis that feedback, correction or metalinguistic instruction are major factors in determining central properties of interlanguage grammars and that they are necessary to account for SLA. We have already seen strong conceptual and empirical reasons to reject any such claim. However, this still leaves a role for inputs of this sort as a developmental factor. An explicit theory of feedback and correction will lay out the proper role of these factors, attributing to them no more nor no less than they do in fact accomplish.
2. The empirical studies of indirect negative evidence, metalinguistic instruction, and feedback and correction
2.1 Indirect negative evidence
Various researchers have argued that indirect forms of negative evidence arise from the nature of talk to learners, and, in particular, that the structure of conversation will lead the learner to realise that certain forms in his production are not possible.1 Long (1980, 1983), in particular, has developed Krashen’s input hypothesis (see Chapter 1) and has argued that interaction aids learning. Recall that Krashen hypothesised that language learning proceeds when the learner comprehends more than he can parse. I pointed out that this hypothesis seems to require a non-modular faculty of mind; certainly it requires an interactive theory of language processing. The reason is that since the parser cannot compute a parse of the stimulus, the meaning derived must be derived via inferencing. The information stored in the conceptual system must then be able to influence the parsers. Krashen has never provided any details of how this might happen, and it should be apparent now that there are numerous obstacles in the way of the i + 1 Hypothesis. The research derivative of Krashen’s work suffers from the same
deficiency, namely the absence of any clarification as to how indirect negative evidence might actually work. Long’s particular version, the Interaction Hypothesis, is based on the observation that comprehension checks (where a native speaker asks a question to see if she has understood correctly something the nonnative speaker has said), topic shifts and clarification requests occur more often in conversations involving non-native speakers (NNS) and native speakers (NS) than conversations among NSs. This is, of course, reasonable since inadequate mastery of the sound system or structures of the L2 would create comprehension problems for the NSs. Both Long (1985) and Schachter (1986) have argued that such types of behaviour, as well as failures to understand, silence, and other responses to errors, provide indirect forms of feedback and promote SLA by facilitating comprehension. Long (1981, 1983) has formulated the Interaction Hypothesis in (8.1).
(8.1) The Interaction Hypothesis
Speakers in conversations negotiate meaning. In the case of conversations between learners and others, this negotiation will lead to the provision of either direct or indirect forms of feedback, including correction (models), comprehension checks, clarification requests, topic shifts, repetitions, and recasts. This feedback draws the learner’s attention to mismatches between the input and the learner’s output.
This has become the standard claim of the interaction literature (Gass and Varonis 1985b, 1989; Pica, Young and Doughty 1986, 1987). The claim is that errors in the learner’s speech lead to NS comprehension problems which in turn induce feedback, which provides the learner with the basis for a systematic comparison of the contents of his grammatical representations and the feedback. It is this last assumption which is central to the utility of the Interaction Hypothesis. Were it to turn out to be the case that learners could not represent mismatches between their own production and the stimuli at the relevant level of analysis, then the issue of how interaction helps would be left up in the air. I return to this issue below. The Interaction Hypothesis amounts to the hypothesis that interaction enriches the input to the learning mechanisms but we need to know how it could do this. None of the standard literature on interaction addresses this question. Some studies (Pica and Doughty 1985; Varonis and Gass 1985a, b; Pica 1987, among others) have investigated the organisation of conversation and how negotiation furthers conversation towards specific communication goals. However, as Gass and Varonis (1994: 285) point out, this research has not shown any effects of negotiation on acquisition. Studies focusing specifically on that question give mixed results. Thus, Sato (1986) found no facilitating effects of
NS speech on the acquisition of past tense marking in English by two Vietnamese L1 speakers. Brock, Crookes, Day, and Long (1986) also claimed little effect on learning based on free conversations. Crookes and Rulon (1985) examined the effects of negotiation with different tasks, e.g. free conversation versus two-way communication tasks. The latter evoked more feedback, but the study did not establish increased learning. Pica, Young and Doughty (1987) showed facilitated comprehension on a picture arrangement task as a direct result of negotiated interaction. Gass and Varonis (1994) showed similar results. Loschky (1994: 306) argues that the literature shows only “mixed support” for a relationship between increased comprehension and acquisition, and shows instead a relationship between acquisition and adjustments by the participants in their speech. It should be kept in mind that the majority of the studies cited have examined the effects of feedback on speech production only, despite the supposed focus on comprehension and parsing. Loschky’s own study is the only one focusing on comprehension and the processing of stimuli using listening tasks (instead of using production data). It shows that negotiated interaction facilitates comprehension of language one is listening to at that moment,2 but there was no relation between levels of comprehension and levels of acquisition. Finally, Pica (1988) and Pica, Holliday, Lewis, and Morgenthaler (1989) have argued that negotiation leads to better output from learners when it includes direct or indirect feedback from an uncomprehending native speaker. The relevance of this claim for SLA is, however, not clear since it is not obvious why failures to understand by anyone should lead to restructuring of the learner’s psychogrammar. As noted, Long’s Interaction Hypothesis states that the role of feedback is to draw the learner’s attention to mismatches between a stimulus and the learner’s output.
This hypothesis seems to be widely shared by proponents of the idea that one can learn a grammar on the basis of the “negotiation of meaning.” To make sense of this we need to know what is meant by these terms. If “input” means stimulus, and the term “output” means what the learner actually says, then the claim would be that the learner can compare a representation of his speech (an acoustic/phonetic representation) to the intake as I have defined it (an analog representation of the speech signal). Why this should assist the learner in learning properties of the morphosyntax or vocabulary is, however, not clear since the learner’s problems may be problems of incorrect phonological or morphosyntactic structure. To restructure the mental grammar on the basis of feedback, Long’s claim would have to mean that the learner is able to construct a representation at the relevant level and compare her output — at the right level — to that. It would appear then, although this is never made explicit in this literature, that the Interaction Hypothesis presupposes that the learner can
compare representations of her speech and some previously heard utterance at the right level of analysis. But this strikes me as highly implausible cognitively speaking. Why should we suppose that learners store in longterm memory their analyses of stimuli at all levels of analysis? Why should we assume that they are storing in longterm memory all levels of analysis of their own speech? This is what would be required for learners to carry out the suggested comparisons. Certainly nothing in current processing theory would lead us to suppose that humans do such things. On the contrary, all the evidence suggests that intermediate levels of the analysis of sentences are fleeting, and dependent on the demands of working memory, which is concerned only with constructing a representation of the sort required for the next level of processing up or down. Intermediate levels of analysis of sentences normally never become part of longterm memory. Therefore, it seems reasonable to suppose that the learner has no stored representations at the intermediate levels of analysis either of her own speech or of any stimulus heard before the current “parse moment.” Consequently, she cannot compare her output (at the right level of analysis) to the stimulus in any interesting sense. The only intermediate levels of analysis which are normally relevant for LTM are lexical elements, and other elements stored in the lexicon (idioms, phrases, formulae, etc.).3 Moreover, given the limitations of working memory, negotiations in a conversation cannot literally help the learner to re-parse a given stimulus heard several moments previously. Why not? Because the original stimulus will no longer be in a learner’s working memory by the time the negotiations have occurred. It will have been replaced by the consequences of parsing the last utterance from the NS in the negotiation.
I conclude that there is no reason to believe that the negotiation of meaning assists learners in computing an input-output comparison at the right level of representation for grammatical restructuring to occur. Do studies of indirect forms of feedback, especially those based on precise if complex communication tasks (such as figuring out how to identify a nonnatural object from a description and a choice of pictures when speaker and hearer cannot see one another), show anything at all? I think so. It would appear that negotiations help learners establish form-meaning pairs, i.e. the referents for particular sounds, or the sounds expressing particular meanings. Thus, these forms of stimuli help learners learn limited forms of lexical representations, the sound-meaning correspondences themselves. There is no evidence at present that other types of acquisition are involved, especially viewed from the perspective of parsing and speech comprehension. When we consider the claims from the perspective of production, the studies reviewed suggest that negotiation helps the learner make more precise her choice of lexical item. This might strengthen the
learner’s encoding of a given form and lead to greater practice, which in turn will lead to a lower activation level of the same item in the future, making it easier to remember the right word. That means that recall of the relevant item will be enhanced. This is not an insignificant consequence because it means that the learner will be better able to say directly what he means rather than resorting to the various performance strategies adults often use in the face of lexical gaps or recall problems. Notice, however, that such an effect falls squarely under the rubric “performance phenomena.” That negotiated interaction can accomplish anything else has not been demonstrated, and the case to be made for it remains indirect, as Gass and Varonis (1994) and Loschky (1994) stress. Much more research is needed showing a relationship between the amount or type of negotiated input and the amount or type of learning which occurs. In particular, some thought needs to be given to the causal factors: How could negotiation of meaning and interaction cause learning to take place?4
2.2 The metalinguistic instruction studies
There is a significant body of studies which ask the question: Does grammatical instruction have a positive effect on language acquisition? Results are again mixed, with some researchers answering in the affirmative (Gass 1979; Lightbown 1983, 1987; Long 1983; Pica 1983; Spada 1986; Eckman, Bell, and Nelson 1988; Doughty 1991), others in the negative (Perkins and Larsen-Freeman 1975; Felix 1981; Ellis 1984, 1989; Pienemann 1984; Eubank 1987).5 Some of this research is concerned with showing that classroom learning does/does not follow hypothesised universal developmental orders observed in naturalistic acquisition and that grammatical instruction does/does not alter the orders observed. The class of studies turns out to involve the examination of grammatical instruction on the restructuring of linguistic knowledge.
Since, in contrast to the preceding literature, these studies address the question of whether instruction helps learners to learn specific properties of grammars, I will examine some of them in detail.
i. The Harley aspect study
Harley (1989a) studied the effects of form-focused instruction on the learning of French verb aspect in the context of Grade 6 early total French immersion classes in Ontario (Canada). This study thus dealt with the appropriate contexts of use for the distinction between verbs marked for the [±punctual] distinction (referred to in the literature as the imparfait/passé composé distinction).6 Subjects belonged to experimental or comparison classes, the first group receiving approximately eight weeks of form-focused instruction with increased
opportunities for production of the target forms. The comparison group received other kinds of stimuli which included the relevant distinction. Pupils were tested by means of a cloze test and an oral interview task. On an immediate post-test, the instructed group outperformed the comparison group. On a delayed post-test three months later, there were no significant differences between the two groups, raising serious doubts about whether instruction had led learners to restructure their grammars.
ii. The White verb-raising/adverb studies
In Chapter 4, I introduced a study by White (1991a) in which she discussed the relevance of negative evidence within the framework of learnability research. There I examined the study from the perspective of clustering effects in L2 learning and argued that this study fails to confirm the predicted clustering effects of parameter-(re)setting theory. Here I want to focus on what the study says about the effects of metalinguistic instruction. It is known from the learnability literature, e.g. Berwick (1985), Manzini and Wexler (1987), Wexler and Manzini (1987), that certain types of overgeneralisations might be uncorrectable on the basis of exposure to stimuli alone. On the well-motivated assumption that feedback and correction have little influence on early first language acquisition, these researchers have formulated the Subset Principle to ensure that the child develops a grammar generating the smallest language consistent with the data. Such a grammar will be in a subset relation with grammars which generate the same correct data plus other ungrammatical data. White (1989b: 143–4) puts the matter this way: assume data sets X and Y construed as sets of sentences as in (8.2). (8.2) illustrates grammars in a Subset Condition.
(8.2) [Diagram: the set of Y sentences contained within the larger set of X sentences]
The learnability problem is as follows: Y sentences are compatible with two grammars, the grammar that generates Y and the grammar that generates X. Suppose that the L1 learner is learning a language which only contains sentences like Y. For this language, the appropriate grammar will be the one which generates Y but not X. If, on hearing a Y sentence, the learner hypothesises the grammar that generates X, this will result in overgeneralisations (i.e. X sentences) that cannot be disconfirmed on the basis of positive evidence.
They simply do not occur in the language being learned. (White 1989: 144).
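The logic of the Subset Condition can be made concrete with a small sketch. The sentence sets below are invented for illustration, and the function name `disconfirmed_by` is hypothetical; the only thing taken from White's formulation is that Y is contained in X and that the learner hears Y sentences only.

```python
# Toy illustration of the Subset Condition; the sentences are invented.
# Y is the target language; X is the larger language of the overgeneral grammar.
Y = {"I go to the cinema", "She found the solution"}
X = Y | {"go to the cinema", "found the solution"}  # X properly contains Y

def disconfirmed_by(hypothesis, positive_evidence):
    """Return the input sentences that the hypothesised language does not contain."""
    return {s for s in positive_evidence if s not in hypothesis}

# Every sentence the learner hears comes from Y (positive evidence only).
heard = set(Y)

# Hypothesis X is never contradicted: all Y sentences are also X sentences.
counterexamples_to_X = disconfirmed_by(X, heard)

# The overgeneralisations are the X sentences that never occur in the input.
overgeneralisations = X - Y
```

In this sketch `counterexamples_to_X` is empty however many Y sentences are heard, so only information that a given X sentence is unacceptable, i.e. negative evidence, could separate the two hypotheses.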
White (1989a, 1989b: Ch. 6) makes the point that if adult learners are transferring certain types of knowledge from the L1 grammar, this process might lead to unacceptable representations which are nonetheless compatible with available stimuli. Stimuli would then never lead to the detection of errors. The learner’s interlanguage grammar would correspond to a superset of the grammatical sentences of the language in question. Consider in this regard a learner of English who transfers her Italian patterns of subject instantiation. Recall that Italian exhibits null subjects.
(8.3) a. (Io) Vado al cinema.
(I) go-1 to the cinema
‘I am going to the cinema.’
b. Sperava di trovare la soluzione.
tried-3 to find the solution
‘S/he tried to find the solution.’
If the learner transfers the optionality of Italian subjects to English, we predict that she will incorrectly produce sentences like (8.4).
(8.4) go to the cinema.
The correct English input will always be compatible with an interlanguage grammar which permits optional subjects. Therefore, normal stimuli, i.e. without feedback, or other metalinguistic information, will not provide the cues to the fact that the grammar is incorrectly generating a superset of the L2, namely sentences with explicit subjects and sentences with null subjects. White (1989a) examined the ability of French-speaking learners of English to accept grammatical sentences and reject violations involving adverbs and verbs. Recall that French and English differ in that adverbs can appear between the French main verb and its direct object complement while in English this is excluded.
(8.5) a. John drinks his coffee quickly.
b. Carefully John opened the door.
c. Mary often watches television.
d. *Mary watches often television.
French adverbs partially overlap in their distribution with the patterns shown in (8.5) but specifically the equivalent to (8.5c) is ungrammatical and the equivalent of (8.5d) is fine.
(8.6) a. Jean boit son café rapidement.
b. Prudemment Jean a ouvert la porte.
c. *Marie souvent regarde la télévision.
d. Marie regarde souvent la télévision.
White (1989a) characterises this difference in terms of a binary parameter of UG — the Adjacency Condition on Case assignment — whose settings meet the Subset Condition of (8.2). To describe the facts Chomsky (1986) proposed a parameter [±strict adjacency] which applies to Case assignment. The setting [+strict adjacency] generates a language like English which does not permit adverbs to intervene between the verb assigning Case to the direct object and that NP. The alternative setting generates languages which permit but do not require such intervening adverbs. If the francophones transfer their knowledge of French to the task of learning English, the hypothesis is that they will transfer the [−strict adjacency] parameter-setting to English. This means that they will understand or produce sentences like (8.5a, b), and will parse or produce (8.5d).7 White did not examine her subjects’ comprehension or production. Instead, subjects (adults) performed three separate acceptability tasks: a paced acceptability judgement task, an unpaced multiple choice acceptability judgement task, and a preference task in which subjects rated pairs of sentences as to which one was better. The paced and unpaced acceptability judgement tasks required subjects to decide if sentences were more or less “grammatical” or “correct/incorrect”. The preference task required no absolute classification but rather asked merely for a relative assessment. Native speakers were also tested to provide comparison data. The results showed that native anglophones consistently agreed in accepting grammatical sentences like (8.5a, b, c) while rejecting sentences like (8.5d). Thus, White showed that her tasks elicit consistent and uniform responses from native speakers. Learners also consistently accepted grammatical sentences but in addition they exhibited a tendency to accept sentences like (8.5d). White concluded that adult L2 learners’ grammars do not obey the Subset Principle. 
The Case Adjacency Parameter did not last long as a theoretical account of the phenomenon in question, potentially undermining the assessment that adult learners do not follow the Subset Principle. White (1991a) reformulated the characterisation of this knowledge in terms of the verb-raising analysis of Emonds (1978, 1985), Pollock (1989) and Chomsky (1991c). Specifically, in both languages the adverbs are generated in the Spec position of VP and verbs move to the left of them. According to this revised analysis, in French, all finite verbs move out of VP, raising to AgrS (INFL, or to some other head position in the domain of the subject), while in English only auxiliaries and modals can occupy the same position. Finite main verbs cannot raise, and do-support is triggered instead.
(8.7) [AgrSP [SpecAgrSP NP] [AgrS′ [Agr verb] [VP AdvP [V′ [V t] NP]]]]
(all irrelevant movements have been omitted from the tree)
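The word order consequences of the analysis in (8.7) can be sketched as a toy linearisation. This is only an illustration under simplifying assumptions: the function `linearise`, the flag `raises`, and the flat representation of the clause are hypothetical and are not part of the analyses cited.

```python
def linearise(subject, adverb, verb, obj, raises):
    """Order the words of a clause whose base order is Subject [Adverb [Verb Object]].

    If `raises` is True, the finite verb moves to a head position to the left
    of the adverb (Agr in (8.7)), so the adverb ends up between verb and object.
    """
    if raises:
        return [subject, verb, adverb, obj]   # S V Adv O
    return [subject, adverb, verb, obj]       # S Adv V O

# French: all finite verbs raise.
french = linearise("Marie", "souvent", "regarde", "la télévision", raises=True)

# English: a lexical verb such as "watches" does not raise.
english = linearise("Mary", "often", "watches", "television", raises=False)

# A learner who carries the French setting over to English yields the order in (8.5d).
transfer = linearise("Mary", "often", "watches", "television", raises=True)
```

The sketch deliberately ignores auxiliaries, do-support and PP complements; it only shows how a single raising option separates the SAVO and SVAO orders discussed here.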
According to this revised analysis, a subset of finite verbs raise in English, while in French all finite verbs raise. The verb sets thus meet the Subset Condition. White (1991a) demonstrates that francophone learners of English do indeed produce errors like that in (8.5d), just as her earlier study had shown that they accept such sentences. Her subjects in the later study (138 children in the 11–12 year age range) were divided into two groups, one of which received explicit instruction about the meaning and distribution of English adverbs. The comparison group spent an equal amount of time receiving instruction on question formation.8 Analysis of the results showed a main effect of treatment. There were significant differences between the subjects instructed on adverbs and those instructed on questions. On an acceptability judgement task, the adverb group’s performance was superior to that of the question group on Subject Verb Adverb Object (SVAO) stimuli; indeed, there were no differences between the adverb group and native speaker controls on this measure. On the same task, there were also significant differences between the groups on the acceptable Subject Adverb Verb (SAV) order in favour of the adverb group. Results on a preference task show comparable differences between the two instructional type groups, but are also revealing of important differences between the adverb L2 learner group and native speakers. White (1991a: 148–9) observes that the adverb L2 learner group reject Subject Verb Adverb Prepositional Phrase (SVAPP) (Jane goes sometimes to the movies) in preference to SAVPP (Jane sometimes goes to the movies). This means that they have incorrectly grouped this pattern with the ungrammatical SVAO pattern. A sentence manipulation task provided similar results. A follow-up study one year later of one of the adverb-instructed classes revealed SVAO error scores similar to those on the pre-test. The effects of the instruction clearly did not last.
This study shows that the acceptability judgement tasks used elicit consistent responses from L2 learners for grammatical stimuli, stimuli which might correspond to patterns of stimuli the learners have actually been exposed to over the full course of their acquisition of English. The results suggest, moreover, that subjects can learn linear ordering relations on the basis of grammatical instruction and apply that knowledge on acceptability judgement and preference tasks. It also suggests that the knowledge acquired generalises to knowledge not taught — the SAVPP pattern. Learners are apparently using linear information to differentiate between what is acceptable and what is not, but in so doing, are not distinguishing between subcategorised and non-subcategorised complements (argument complements versus adjuncts). Under the early characterisation analysing adverb distribution in terms of an adjacency parameter, this result would have been problematic since the use of the parameter hinges on this very distinction. If subjects are not making the distinction between sisters and nonsisters to case assigners, they could hardly be using the parameter.9 Under the revised analysis of adverb distribution, this problem disappears but the fact of the matter remains unaccounted for. If the distinction between direct object complements and adjuncts is relevant for hierarchical embedding (the first being a sister to the verb, the latter being immediately dominated by VP but outside V′), then it is not apparent why it isn’t exploited by learners. To sum up, the results appear to suggest that the French learners of English overgeneralise the distribution of the adverb, permitting it to appear (incorrectly) between the verb and its direct object. 
Instruction leads the learners to reject such strings but also leads them to reject SVAPP patterns, which are in fact acceptable.10 White characterises her study as an investigation of the effects of negative evidence on parameter-(re)setting. However, there are several reasons to question this interpretation, as we have seen. The question is: Are such results to be described as restructuring of the mental grammar? Rejecting White’s claim about parameter-resetting does not force us to conclude that learners have not learned information of grammatical relevance. Indeed, the most parsimonious account is to say that the instruction did lead to grammatical restructuring. In the discussion which follows, I shall therefore simply assume that all results which indicate that learning occurred are also support for the claim that grammatical restructuring occurred. However, there are methodological reasons for interpreting White’s results with caution. First of all, note that while she mentions that the instructional materials specifically emphasised that adverbs could not go in the SVAO position, White provides no other information about the provision of negative evidence to the experimental groups. In particular, we know nothing about the frequency or systematicity of error correction, whether it was direct or indirect,
verbal or written, and so on. We are, however, told that the normal teaching practices in the programs the francophone students attended do not focus on the instruction of form or on the correction of errors. Should we suspect that the provision of feedback and correction in this study has been insufficiently controlled? In the absence of further information, probably yes. Studies of error treatment in language classes (Chaudron 1977; Courchene 1980; Lucas 1975; Fanselow 1977 and Salica 1981, see also Chaudron 1988) have noted that its provision is often infrequent and unsystematic. Chaudron (1986), in a state-of-the-art review of this issue, stated that the median percentage of phonological errors corrected in these studies was 54% (mean = 48%) and the median percentage of grammatical errors was 49% (mean = 51%). In other words, half of the errors in these studies went uncorrected. Swain and Carroll (1987) in a study of error treatment in French immersion classes observe quite different figures: 77.3% of errors went uncorrected. The French immersion classes, like the English classes described by White, focus on using the language to communicate about topics other than the language. They are “communicative” in orientation. This means that the provision of feedback and negative evidence might have been extremely limited. For this reason, White’s study should probably be seen as pertaining to the effects of metalinguistic instruction on grammatical restructuring rather than as pertaining to the relevance of feedback and correction for the development of correct morphosyntactic patterns, as she intended. I have included it in this section for that reason.
iii. The Doughty study
Doughty (1991) conducted a study of 20 ESL learners with various mother tongues in order to investigate the effects of instruction on the acquisition of relative clauses involving “extraction” from a Prep Phrase. The linguistic observation behind the study is that such relative clauses, e.g.
(I met) the girl from whom you got a present, do not occur in every language. Cross-linguistically, they are said to be relatively “marked” in comparison to relative clauses involving extraction from a subject NP ((I met) the girl who knows you) or a direct object NP ((I met) the girl who you know). Markedness here means less frequent typologically, and more difficult to learn. It stands to reason, therefore, that if the ESL learner’s L1 does not include such more marked relative clauses, learning them will present certain difficulties. Doughty’s experiment involved computer-provided instruction. Learners were divided into three groups for the purposes of the stimuli. One group of experimental subjects got instruction which used techniques focusing on the meaning of the stimuli. A second experimental group got instruction which used
techniques focusing on rules. A third group served as a comparison group who were exposed to stimuli consisting of relative clauses but did not get the instruction. All subjects spent 10 days participating in the experiment and studied five or six items per day. Time on task was very carefully controlled, with the experimental groups spending 4 minutes on each item while the comparison group spent two and a half minutes on the items. Results showed significant differences between the experimental groups and the comparison group in favour of the former. In particular, learners were able to generalise from the instructed marked relativisation environments to less marked but not instructed environments. In addition, Doughty found that both instructional techniques were effective in facilitating learning.
iv. The Spada and Lightbown studies
Spada and Lightbown (1993) report on a quasi-experimental study with francophone children (aged 10–12) learning English in Quebec. Such children are taught a number of courses supplemental to the regular academic program via an intensive communicative language teaching approach over a 5-month period (see Lightbown and Spada 1990; Lightbown 1991, for descriptions of the classes and program). The focus of this particular study was on the effects of instruction on specific aspects of acquisition. Spada and Lightbown studied the effects of form-focused instruction and correction on the use of questions in an oral communication task. Over a two-week period, subjects received 9 hours of form-focused instruction involving questions with the wh-words what, where, and why and the auxiliaries can, be, and do. Thus, activities were designed to draw attention to the fact that these verbs “raise” into the head position of some functional category (INFL, AgrS, TENSE, or some such category; see the discussion in previous chapters).
Subjects performed tasks unscrambling interrogatives which were out of order, guessing games requiring the use of questions, and judgement tasks in which they had to decide if stimuli were well-formed. Oral performance of experimental subjects was compared to that of students at the same level (grade 5) from a regular intensive-English instruction class. All groups improved from pre-test to post-test to follow-up test (5 weeks after the instruction). The experimental groups also improved over the long term as measured by testing 6 months after instruction. The comparison group was not given the long-term test. Spada and Lightbown recorded the classroom interactions in the experimental classes along with 4 hours of instruction in the comparison group class. They transcribed this data and analysed both the teachers’ language and the students’ language, thus controlling the nature of the feedback. The teachers’ language was analysed in terms of the number and types of questions they asked, and the
amount and type of feedback they gave relating to question formation. Teachers modelled correct questions, gave feedback on student productions, asked display and spontaneous questions. See (8.8).
(8.8) Examples of teachers’ questions
a. spontaneous modelling: Her mother is a teacher. So, “Is her mother a teacher?”
b. metalinguistic question: Now where do we put the is or the are or the am?
c. repetition: S: Where he go? T: Where he go? (peculiar or exaggerated intonation)
d. correction: No, that should be “Where do you live?”
e. routine questions: Can you make a question from this sentence?
f. spontaneous questions: S: On the weekend we played store. T: Who did you play with?
Analysis showed that the absolute number of questions used by the teachers varied, with the teacher in the comparison class asking the most (1480 compared to the experimental group A teacher’s 731 and the group B teacher’s 1065). Moreover, the types of questions varied across the classes. The comparison group teacher used far more spontaneous questions than the other two during the recorded periods (151.5 compared to A’s 4.3 and B’s 34.2). The comparison group teacher used much less feedback (49.8 compared to A’s 70.0 and B’s 81.0). There were comparable amounts of modelling (24.8/24.3/27.0). Finally, the use of routine questions varied, with the comparison group teacher and the experimental group B teacher using similar amounts (152.3 vs. 145.0) and the group B teacher using much less (70.8). Overall responses to error also varied. The comparison group teacher responded to 84% of student errors, group A teacher responded to 85%, while the group B teacher responded to 54%. The students’ language was analysed in terms of the total number and accuracy of the questions used. Students in the experimental groups used far more questions than the comparison group (group A = 315, group B = 730, comparison group = 194). However, the accuracy rate was highest in the comparison group (80% compared with group A’s 74% correct and group B’s 61% correct). These higher accuracy rates were also reflected in an analysis of the different types of questions the students asked. Since there was no “clean” difference between the comparison teacher’s behaviour and that of the experimental groups’ teachers in terms of the amount and type of questions asked, and the amount and type of feedback given to
errors, it is not possible to attribute student results to particular types of teacher verbal behaviour. In particular, it is not clear to what extent acquisition was based on the amount and type of correct stimuli heard (which could provide positive evidence for a particular analysis) as compared to the amount and type of feedback and correction (which might provide negative evidence for a particular analysis).
v. The Alanen study
Alanen (1992, 1995), reported on in Tanaka (1999), conducted a study of the acquisition of two locative suffixes and four kinds of consonant alternations in a simplified form of Finnish. Subjects were 36 anglophones who were divided into four subgroups: three experimental groups and one comparison group. One experimental group examined texts in which target forms were italicised, a second experimental group was instructed in rules relevant to the forms and alternations, a third experimental group received instruction and was also exposed to the italicised forms. All subjects participated in two treatment sessions of about 15 to 20 minutes, at a 7-day interval. Results showed an effect of the instruction of rules related to the forms, with both group two and group three outperforming the comparison group. Moreover, the group instructed on rules alone did better than the group getting only italicised stimuli.
vi. The Robinson studies
Robinson and his colleagues have conducted a number of studies looking at various aspects of focus on form. Robinson and Ha (1993), Robinson (1996, 1997a, b) have examined a variety of instructional techniques on learning in experimental conditions as well as the automaticity of rule use within a framework dealing with instance-based learning.
Robinson (1997b), for example, trained 60 native speakers of Japanese on the double object construction of English (see below) under one of four training conditions. In the “implicit” condition, subjects were trained on an exercise in reading and remembering the positions of words in sentences. In the “incidental” condition, subjects were trained on an exercise in understanding the meaning of sentences. The “enhanced” condition used the same sort of exercise, but a box appeared around words and verb stems in such a way that the number of syllables of the verbs would be boxed in, in addition to the to-object. Finally there was the “instructed” condition, which consisted of an explanation of the rule. See Robinson (1997: 230) for details. Subjects were then tested on prior and novel material. Results showed no effect of learning condition for familiar material. There was a partial effect for instruction on novel
material over the other learning conditions, with some effect for the enhanced condition over the implicit condition. In addition, learners in the instructed condition responded faster to novel grammatical sentences than did learners in the other groups. Robinson concludes:
Instructed learners are clearly superior to learners in other conditions in their ability to generalise the knowledge developed during training to novel transfer set sentences… the instructed condition is superior in accuracy to all other conditions on new grammatical sentences. However, the implicit and incidental conditions perform with high accuracy on these sentences, about 80%, and are equivalent to the enhanced condition. As the results for performance on new ungrammatical sentences show, however, accurate performance on the new grammatical sentences is not evidence of the acquisition of generalisable knowledge. Implicit learners wrongly accept around 80% of the ungrammatical sentences… the instructed condition is superior to all others in judging new ungrammatical sentences. (Robinson 1997: 237–9)
This study then provides strong support for the utility of instruction in metalinguistic information.
vii. The VanPatten and Cadierno studies on “input processing”
VanPatten and his colleagues have turned their attention in a number of studies to the question of how metalinguistic instruction can influence language learning. This research is based on the premise, made explicit in VanPatten (1993: 436), that second language acquisition is based on the processing of stimuli through what he calls the accommodation and restructuring of “intake” into the grammatical system. The idea that SLA is based on the processing of stimuli is thus one element that VanPatten’s work shares with the Autonomous Induction Model. VanPatten (1984) observed that English learners of Spanish adopt NVN = Subject Verb Object relations. This observation is consistent with the Competition Model research which has shown that anglophones are transferring their L1 strategies for mapping semantic roles onto NPs, and that English speakers typically use linear strategies in ambiguous strings. This parsing strategy, however, is frequently invalid for Spanish, which has much freer phrase order patterns than English. Moreover, VanPatten (1984) showed that anglophones will ignore stimuli that have to be parsed if they don’t fit the English linear order. Such stimuli include the preposition a in the string Visita a la muchacha el chico ‘Visits to the girl the boy’ (= “The boy is visiting the girl”) or the pronoun la in Mario la conoce bien ‘Mario her knows well’ (= “Mario knows her well”). Anglophone learners of Spanish appear to treat such stimuli as “noise”. Similarly, they will ignore case marking on pronouns which clearly indicates that a
pre-verbal pronoun is a direct object, e.g. Me gusta el helado ‘Me+dat pleases the ice cream’ (= “I like the ice cream”). These data about the interpretation strategies of anglophones learning Spanish are important for several reasons. Firstly, they reveal one of the major differences between real language acquisition and the assumptions of the idealised learnability research: adult parsers do not always “fail” whenever they encounter unanalysable stimuli, at least not in the sense that they fail to assign an interpretation to unparsed stimuli. Rather, transferred parsing routines will map interpretations onto the stimuli and ignore unanalysable parts of speech. Such data suggest that formal models of learning, based on the assumption of detectable errors, are seriously lacking in that they do not accurately model “noise” in the input and do not explain how, when or why the learner begins to deal with it. The Autonomous Induction Theory shares this weakness. An explanatory theory of SLA must, however, provide some account of how and why an interpretation can emerge even when individual parsers fail to analyse an input. Secondly, this body of research suggests why feedback and correction may ultimately be essential to an account of the triggering of the learning mechanisms in SLA. Feedback and correction will be the mechanisms which make it clear to the learner that his parsing strategies are wrong and that ignored stimuli provide critical cues to the proper analysis of the sentence. VanPatten and Cadierno (1993a, b) explored the nature of forms of instruction asking how it can alter learners’ knowledge. The major premiss behind such work is the idea, as just mentioned, that language acquisition must be explained in terms of how stimuli are processed, and that studies of learner output will not accomplish this.11 They define traditional foreign language instruction in the following way: Normally, this instruction focuses on the manipulation of learner output. 
In most foreign language classrooms, instruction occurs by explaining a grammatical concept and then having learners practice producing a given structure or form… (VanPatten and Cadierno 1993a: 226?)
Input processing, in contrast, is designed to change how the stimuli are perceived. The authors conducted an off-line study which dealt with the acquisition of Spanish clitic objects. Subjects were divided into three different groups. One group received traditional grammar instruction, which meant in practice that explanations of the form and distribution of object pronouns were given. These were followed by production practice, characterised as being first “mechanical”, then “meaningful”, and finally “communicative”. A second group, who received input processing instruction was also given explicit instruction on object pronouns. In addition, they were also explicitly told not to rely on word order to
understand sentences. This group did not receive output practice, but rather did a variety of communicative tasks which utilised stimuli involving sentence-initial object pronouns. In other words, the stimuli contained sentences which could not be correctly interpreted using an SVO parsing strategy, but rather could only be correctly interpreted using an OV, OVS, or OVSPP parsing strategy. The third group served as a comparison group. It did not receive any explicit instruction about object pronouns. All three groups were given a pre-test and post-tests consisting of a comprehension test and a production test. The comprehension test involved listening to stimuli and selecting an appropriate picture from 2 options shown. The production test involved completing sentences based on pictures. Analysis of the results showed a main effect for test (showing improvement from pre-test to post-test), and for treatment. The processing instruction group did better than the other two groups on the comprehension post-tests and did equally well on the production post-tests as the traditional grammar instruction group. Both instructed groups did better on the production post-tests than the comparison group. The authors interpret their results as indicating that processing instruction and structured input cause restructuring of the learners’ grammars, but traditional grammar instruction does not.12 Cadierno (1992, 1995) conducted comparable experiments focusing on the acquisition of Spanish verb tense (present vs. past) morphology. The problem of interest in this case was the tendency of learners to initially analyse lexical items for their referential content, ignoring the contribution that inflectional morphology, e.g. tense marking, makes to the semantics of the sentence. 
As has been noted many times, by many authors, adult learners tend to initially have difficulty with the expression of temporal notions marked morphologically and prefer to express such notions “lexically” via adverbs (see Clahsen et al. 1983; Andersen 1991, 1993).13 The methodology in the Cadierno study was the same as that discussed above, meaning that there were 3 groups of subjects differing in whether they got instruction or not, and in what kind of instruction they received. In this case, verbal tense morphology was used as a unique cue to temporal reference. Traditional grammatical instruction involved presenting learners with past tense endings and then allowing them to practice using these suffixes in sentences expressing past time. Manipulation involved rewriting present tense sentences in the past tense, performing sentence completion activities requiring the use of a past tense verb form, and completing question-answer activities involving the use of the past tense. Input processing instruction involved tasks which focused learners’ attention on past tense morphology in the input, practice in how to assign present vs. past tense, and activities focusing on the content of sentences. Instruction focused especially on the cues marking past
tense, and on the interpretation of the forms (contrasting them, in particular, to the present tense forms). Subjects were anglophone adult learners of Spanish. The results were similar to the previous study, showing a main effect for test (showing improvement from pre-test to post-test), treatment, and a significant interaction between group and test. The group receiving structured processing input did significantly better on the comprehension post-tests than either the group receiving traditional grammatical explanations plus production practice or the comparison group receiving no explanation.14 The former also did better than the other two groups on the production post-tests, there being a significant main effect for instruction type, for test and a significant interaction between instruction and test. In this case, the traditional instruction group did significantly better than the no-instruction group, as did the input processing group, and there was no significant difference between the latter two groups. The conclusions are also the same, namely that input processing is essential for restructuring the learners’ grammars but that traditional instruction, focusing on practising the forms, merely affects metalinguistic knowledge. The results of these experiments are very interesting and clearly show that instructional activities which lead the learners to attend to particular forms can produce superior results to activities which focus merely on manipulating output. However, the interpretation of these results from a theoretical perspective is difficult. Any conclusions regarding the superiority of processing input over the provision of metalinguistic information, or the complete lack of utility of metalinguistic information do not follow. The reason for this should be obvious: both instructed groups received metalinguistic information. 
The one difference was that the traditional grammar instruction group did not, in addition, process stimuli for comprehension but rather were forced to direct their attention to production activities. The structured input group, in contrast, got both metalinguistic information and exposure to relevant stimuli. VanPatten and Oikkonen (1997) therefore conducted a partial replication of the VanPatten and Cadierno (1993a, b) studies. They observed correctly that the original research confounds input processing and monitoring. Since the structured input group got the same explicit instruction as the traditional grammar instruction group plus information that NP V NP sentences were not necessarily subject–verb–object sentences, the authors noted that the processing group might have used explicit metalinguistic knowledge about the order of stimuli to monitor their own behaviour on the interpretation tests and explicit metalinguistic knowledge about pronouns to monitor their behaviour on the production tests. In that case, they would have used metalinguistic information, rather than direct processing of the stimuli, on both types of tests.
In the replication study, there were again 3 groups of subjects. One received processing instruction (i.e. explicit information + structured input activities), one received explanation but no practice, and the third group received only the structured input activities. The comparison group here is the group receiving processing instruction. They received metalinguistic information about the form and distribution of Spanish object pronouns, the order of constituents expressing major grammatical relations, the role of the preposition a, the tendency to interpret verb-initial NPs as subjects and that this “first noun strategy” was not a reliable cue for Spanish. Following the explanation, this group received structured input involving the interpretation of sentences with various phrasal orders, then responded in Spanish to the content of various kinds of stimuli with their personal opinions (e.g. “I agree” or “I disagree”, “That’s true”, “That’s not true”), made inferences based on stimuli and so on. The second group received the same metalinguistic information as the previous group but did not receive any structured input. This group received no practice of any kind, but did get the explanation spread out over four days so that we may assume there was repetition of the information provided. The third group received no explanation but did receive the structured input. This study, unlike the previous ones, thus attempts to zero in on the relative roles of explicit information versus practice as variables in causing learning of the relevant phenomenon. Testing took place over one week. Results on the comprehension tests showed a significant main effect for treatment in favour of the comparison and the structured input (no metalinguistic instruction) groups over the explanation only group. Results on the production tests showed a significant main effect for treatment. 
In this instance, the comparison and the structured input groups did equally well, and the comparison group did better than the explanation only group. From these results the authors concluded that improvement on the comprehension post-tests is caused by the presence of structured input activities and not by the presence of explicit metalinguistic information. They exclude the possibility that explanation is causally related to improved performance on the production tests because the group which received only explicit explanation did not do better than the structured input group. This leads them to question the claim that explicit information is necessary to grammar learning. Taken together, these studies are important for they establish that second language learning takes place on the basis of the parsing of stimuli, lest anyone have any doubts on this score. In particular, they show that activities which focus exclusively on manipulating speech or writing do not lead to language learning. One can also argue that they independently motivate the distinction between conscious metalinguistic learning and unconscious induction. It is the latter which
seems to be relevant for input processing, and is teased apart from the metalinguistic instruction in the final experiment. Apparently, implicit learning occurring during the structured input tasks leads to the identification of cues in the stimuli relevant for the analysis of grammatical roles. Extrapolated to the earlier experiments, we might say that implicit learning also leads to the identification of the cues relevant for tense marking. If we step back from the question of the role of metalinguistic information on learning and ask a more general question, “Can you manipulate the environment in such a way that language becomes learnable?”, then the results of these studies are extremely positive.15 The role of the instruction from VanPatten’s perspective is not to allow the learner to compute comparisons between stimuli and speech output, but rather to allow the learner to attend to, identify, and induce the functions of certain cues. What is induced? Learners can i-learn that the preposition a is a cue that the adjacent NP is a direct object complement regardless of whether the phrase precedes or follows the verb. Similarly, the tense markers on the verb are cues for the interpretation of the sentence with respect to present or past time. That such markers can be learned on the basis of “form-focused” instructional activities is important given the comparative difficulty that verbal morphology causes in production for learners at even advanced levels of L2 proficiency (Larsen-Freeman 1983; Bardovi-Harlig and Bofman 1989).
viii. The Izumi and Lakshmanan study
Izumi and Lakshmanan (1998) examined the role of instruction which explicitly informs the learner that some form is not acceptable in a study investigating the acquisition of the English passive by speakers of Japanese. Japanese has two kinds of passive: a “direct” passive and an “indirect” passive. Examples of each are shown in (8.9).
(8.9) a. John-ga Bill-ni/niyotte yob-(r)are-ta
         John-NOM Bill-by call-PASS-PAST
         ‘John was called by Bill.’
      b. sono-geki-ga Shakespeare-niyotte kak-(r)are-ta
         that-play-NOM Shakespeare-by write-PASS-PAST
         ‘That play was written by Shakespeare.’
      (Izumi and Lakshmanan 1998: 68)
The authors report that the structural ambiguity arises because the Japanese verb rare is itself ambiguous, functioning both as an auxiliary element without semantic role assignment properties and as a verb which does assign semantic roles. Izumi and Lakshmanan then hypothesise that Japanese learners of English will transfer the lexical properties of rare to the learning of the English passive,
assigning passive be dual status like rare. This would then lead them to produce both indirect and direct passives in English. In that case, learners would require negative evidence to retreat from the overgeneralisation. Fifteen Japanese learners of English participated in this study, assigned to four different proficiency levels (ranging from high-beginner to advanced). Subjects were assigned either to a comparison group (11 subjects) or to an experimental group (four subjects). The experimental group included at least one subject from each proficiency level and included only subjects who accepted or had produced indirect passives. They received formal instruction on the English passive one week after a pre-test, while the comparison group did not. Subjects performed a translation task, a picture-cued production task, and an acceptability judgement task. Instruction sessions were held on two days for about an hour and a half. On the first day, instruction consisted of systematic contrastive information including normal passive stimuli (likely to be construed as positive evidence as to the grammaticality of the strings) and information about what is not possible in English. This instruction was explicit and designed to draw the learners’ attention to the ungrammaticality of the indirect passive in English. On the second day, subjects reviewed the first day’s information and then practised making passive sentences. Five days later, both groups were tested again. Results of the pre-tests showed that subjects in both the experimental and comparison groups produced and accepted the indirect passive in English. Results of the post-tests revealed significant differences between the instructed and the comparison group in favour of the former. The authors conclude that the metalinguistic instruction was highly effective in leading the subjects to understand that the indirect passive is not possible in English, and to modify their behaviour accordingly.
However, given the size of the sample, caution is required before generalising to other learners.
ix. Experiments in the learning of artificial languages
In Chapter 5, I discussed research dealing with the learning of artificial languages in the context of a discussion of distributional analysis. Some research using artificial languages as the target language has focused on the differences between explicit and implicit learning (Reber 1976, 1989; Reber, Kassin, Lewis and Cantor 1980; Mathews, Buss, Stanley, Blanchard-Fields, Cho, and Druhan 1989; see also Carr and Curran 1993). Following in this tradition in particular is research by DeKeyser (1995), Hulstijn (1989, 1992), de Graaf (1997a, b), and Robinson (1997a, b). Some of these studies have focused on implicit and explicit learning; see also the other papers in Hulstijn and DeKeyser (1997), in particular Hulstijn (1997) which lays out the logic of experimentation with artificial
languages, DeKeyser (1997) which focuses on the nature of automatisation of learning, and Yang and Givón (1997) which contrasts implicit learning of pidgin stimuli vs. more complex “fully grammatical” stimuli. DeKeyser (1995) investigated the ability of learners to induce or deduce the rules of an invented “natural” language, Implexan. This is an agglutinative SVO language with number and case marking on the noun. Number marking is categorical; case marking, in contrast, is prototypical. Verbs are marked categorically for gender and prototypically for number. It was hypothesised that prototypes would be harder to learn than categorical rules, and that the type of rule would interact with the type of learning. Thus, explicit learning (from explicit rule to instances) would be more effective for learning categorical rules and less effective for learning prototypes. In the pilot study reported on, six learners learned Implexan over the course of 20 learning sessions during which they saw pictures on a computer screen along with a corresponding sentence. Subjects were to infer some sort of meaning for the graphic stimulus on the screen. The treatment group were taught grammatical rules of Implexan before doing the first, third and tenth sessions. The comparison group received no metalinguistic instruction. Subjects were tested on 20 sentences out of 124 stimuli with a judgement task at random times during each learning session. Two weeks after the last training session all subjects were given a judgement and a production task of the same 44 sentences. Initial responses to the judgement tasks suggest that subjects were responding randomly. Protocol analyses were then carried out. The author argues that the study provides limited support for the hypotheses that prototypical rules are harder to learn than categorical rules, that explicit learning (i.e.
learning based on grammar instruction) works better than implicit learning for categorical rules, and that both work equally well for prototypical rules. The study is, however, a pilot study and needs to be replicated.
de Graaf (1997a, b) investigated the role of explicit instruction in the learning of inflectional morphology (singular vs. plural noun marking, formal vs. informal imperative mode) and word order (position of the negator, position of the object) of an artificial language eXperanto, a modified version of Esperanto. Fifty-six Dutch-speaking participants comprised the subject pool and were divided into two groups. One group (the “explicit” group) received explanation on the grammatical structures after exposure to dialogues and comprehension activities. The other group (the “implicit” group) were exposed to repetitions of the materials in the dialogues and comprehension activities to ensure equal amounts of exposure but were not otherwise instructed. Subjects were tested three times during the experiment on four tasks: a 60-item sentence judgement task, a 60-item gap-filling task, a 30-item Dutch-eXperanto translation task, a 45-item
judgement and correction task. Results showed that the explicit group outperformed the implicit group on all 4 tasks over all 3 test periods. In addition, an effect for the complexity of the learning problem was found with scores on simple items significantly higher than scores on complex items, due to effects of instruction. However, no interaction between complexity and instruction was found, nor was there a specific effect for type of learning (morphological vs. syntactic), so the relationship between specific types of acquisition and instruction (or focus on form more generally) remains obscure. This study raises in an acute way the issue of the level of analysis that the learner is carrying out in that it failed to provide any evidence that learners were in fact able to use instructional information to differentiate between morphological and syntactic information.
x. Summary of the instruction studies
Although there are problems in the interpretation of each of the studies surveyed, taken together, they provide some evidence that metalinguistic instruction has a definite effect on learner behaviour with respect to a small number of linguistic phenomena which, nonetheless, cover a broad range of types of linguistic knowledge (direction of semantic role assignment, semantic sub-classes of verbs, various aspects of word order, verbal morphology, consonantal alternations). The studies by White, or Spada and Lightbown are probably the most complex since they deal with the true nature of instruction in the classroom. However, their results are not clear since they fail to show a systematic relationship between the stimuli in the classroom and what the learners come to know.
Much clearer results come from the VanPatten and Cadierno studies which have shown that certain kinds of metalinguistic instruction can work (instruction focusing on cues relevant for parsing grammatical roles and linear order, cues for tense marking and the expression of time relations). They have also suggested how it might work. The most intriguing studies may well be those involving the learning of artificial languages for these permit one to control most carefully the nature of the input for the learning of specific types of formal structures, or for investigating the relative efficacity of various kinds of evidence. The Robinson studies offer clear support for the utility of metalinguistic instruction provided as explicit pedagogical rules. Some support is provided for implicit and enhanced forms of focus on form by both the Robinson studies and the Doughty study, although how these types of stimuli work remains to be worked out in detail. Some might argue that a certain scepticism as to whether these studies show that instruction has altered the interlanguage grammars of the learners is necessary. Opponents of the hypothesis that metalinguistic instruction can indeed cause
learning in autonomous cognitive systems would insist on such scepticism in the absence of explicit models of how the instruction in question could play its putative causal role. As we have seen, most of the studies actually provide the learners with lots of stimuli from which they could i-learn the relevant properties of the target language. In that case, it may be that instruction assists the learner in developing greater control over certain aspects of his speech production, or may assist in performing essentially metalinguistic tasks, such as making acceptability judgements. Moreover, as the de Graaf studies show, these types of experiments also do not resolve the issue of the relevant level of analysis at which learning is occurring. We might think we are instructing on “syntax” or “phonology” but what the learner actually represents is a separate matter. It is early days yet for this type of research and we must await further studies before we can safely assert that explicit instruction clearly can lead the learner to restructure her mental grammar, thereby representing distinctions which had not been encoded previously. Demonstrating that instruction is necessary requires yet another research step.
2.3 The feedback and correction studies
i. The Tomasello and Herron studies
Tomasello and Herron conducted three studies looking at the role of different kinds of feedback on the learning of aspects of French grammar by adult native speakers of English. The subjects were students in a French-language classroom. These studies tie in with those just reviewed in that, in all cases, the subjects were given grammatical instruction about French. Tomasello and Herron (1988) investigated the instruction of 8 targetted forms of French. These are listed in (8.10).
(8.10) French forms under investigation
du (contraction of de + article le, when used as a preposition)
au (contraction of à + article le, when used as a preposition)
mon (possessive adjective used before a feminine noun beginning with a vowel in apparent violation of agreement constraints because the form is the same as that used with masculine nouns)
ne (verb) pas de (une is replaced by de after certain negated verbs)
dites (irregular form of the verb)
cet (masculine demonstrative specifier used before nouns beginning with a vowel, varies within the paradigm with ce)
meilleur (irregular form of the comparative of bon)
-er imperative (irregular form of the second person, familiar imperative)
It should be noted that these forms constitute exceptions to generalisations of one sort or another. The targetted forms were characterised in terms of their degree of difficulty and then randomly assigned to one of two treatment conditions taught in one term. In the subsequent term, the assignment was reversed so that each structure was taught in each condition. Forms were introduced illustrating relevant generalisations via oral pattern drill. This initial instruction did not include the forms in (8.10). The targetted exceptions were then introduced under a “Garden Path condition” or a comparison condition. First of all, students were asked to examine fill-in-the-blanks strings on the blackboard which included stimuli where the exception would have to be used. In the comparison condition, the teacher pointed to the novel stimulus and explained to the students in the L2 that the form used in the relevant stimulus was exceptional. She then illustrated by providing a model and an explanation of its nature. In the Garden Path condition, the teacher led the students to actually make errors by overgeneralising the instructed pattern and then corrected the error. The forms were tested three times using fill-in-the-blanks tests like those used during the instruction. Results were very variable across the structures tested with some forms being reproduced completely correctly on the final test and close to completely correctly as early as the first test (e.g. meilleur, mon, dites, and cet), while others were much harder to master. The results were significantly higher under the Garden Path condition. Although the authors insist that the mechanism explaining these differences is not known, they speculate that the forms investigated and the correction provided lead to the perception of a contrast between the form generated by the learner’s rule and the form actually used. 
It is unlikely that this is the correct explanation because under both conditions we may assume that learners have represented in their conceptual system the generalisation induced during the instruction. The explicit provision of metalinguistic instruction would provide a contrast just as much as the correction of errors. The difference is, of course, that correction provides a model which the learner can encode and store in long-term memory, and which might be rehearsed and strengthened each time the learner makes a mistake. Learners who got models introduced at the moment of instruction of the forms may have formed only weak representations of the correct forms.
Herron and Tomasello (1988) looked at similar phenomena but investigated the relative efficacity of modelling the correct forms versus feedback which helped the students to identify and locate their own errors. One set of French structures was corrected, while another was explicitly modelled. Results showed significant differences between the two treatment contexts in favour, for most structures, of the feedback contexts.
Tomasello and Herron (1989) examined the effects of correcting errors resulting from transfer from the L1 to the L2 interlanguage grammar. The phenomena examined are listed in (8.11).
(8.11) a. the elimination of the article in a predication stating a profession:
          Je suis professeure.
          I am professor
          ‘I am a professor’
       b. écouter ‘to listen to’ c-selects a direct object:
          J’écoute la radio.
          ‘I am listening to the radio’
       c. the use of avoir to express one’s age:
          J’ai 43 ans
          I have 43 years
          ‘I am 43 years old’
       d. the use of aller to express destination:
          Je vais à Potsdam chaque semaine.
          ‘I go to Potsdam each week’
       e. the use of prendre, emporter and emmener to render the English “to take”:
          Je prends un taxi.
          ‘I will take a taxi’
          J’emporte mon passeport quand je quitte l’Allemagne.
          ‘I take my passport when I leave Germany’
          J’emmène mon chien avec moi quand je voyage.
          ‘I take my dog with me when I travel’
       f. the use of two verbs to express “to visit”: visiter NP and rendre visite à NP
       g. the use of two verbs to express “to know”: connaître and savoir.
       h. the use of aller to render transportation by car when destination is mentioned:
          Je vais nulle part en voiture.
          ‘I don’t drive anywhere’
          vs. Je prends toujours la voiture quand je fais le marché
          ‘I always drive when I go shopping’
Structures were randomly assigned to either the Garden Path or the comparison condition during one term and assigned to the other condition in the following term. Instruction began with the teacher teaching a French expression and then asking students to translate four English sentences written on the blackboard. These sentences were designed to encourage transfer errors. In the comparison
condition, the problematic correspondences were explained, modelled, and the students were warned to pay attention. In the Garden Path condition, the students were led to make errors and these were corrected. Forms were tested three times. Once again, there was considerable variability across the targetted forms with some eliciting perfect responses by the final test as well as on the initial test while other forms elicited few correct responses initially and did not reach 80% correct by the final test. The authors again speculate that the correction technique may work well when learners must withdraw from an overgeneral generalisation. These studies offer more support that feedback, and in particular correction, can help learners learn the correct information needed to perform on production tasks. In each of the studies examined, correction appeared to help the learner restructure his grammatical knowledge in the direction of the L2 target, learning exceptions to generalisations involving lexical substitutes for expected strings of lexical items (du, au), exceptions to verbal paradigms, alternate forms for specific words, learning the form-meaning correspondences of certain predicates in French which deviate from those of English by expressing slightly different semantic features, or learning alternate subcategorisation patterns. These studies show that correction can have a positive effect particularly on elements likely to cause problems for learners precisely because they are exceptional. In a previous review of these studies, Carroll, Roberge, and Swain (1992) and Carroll and Swain (1993) observe that the Tomasello and Herron studies do not address the one critical question for negative evidence within learning theory: Can learners induce grammatical generalisations on the basis of feedback and correction? 
Unless it can be shown that negative evidence can lead learners to induce novel categories, structures or patterns in the L2, it will be clear that its role in a theory of language learning must be quite limited. The burden of learning the principal properties of a grammar would then fall on the observation of properties of the stimuli, that is, on induction without feedback or correction. Moreover, it might be argued that feedback and correction, if useful only for limiting too general generalisations, are only enhancing an input which would signal the learner’s problem anyway. In other words, normal stimuli would in the course of time provide exactly the same information as the feedback and correction, making them a possibly helpful but unnecessary part of the learning process.
ii. The Carroll and Swain studies
Carroll et al. (1992) attempted to address precisely this issue. We examined the relative efficacity of explicit correction in learning the selectional properties of two noun-forming suffixes in French. We chose to focus first on this feedback type because correction is often taken as the most “natural” and frequent form
of feedback. In fact, it turns out to make complex cognitive demands of learners if they are to assign an interpretation to the correction rather than rejecting it as irrelevant “noise.” Correction, in particular, is an indirect or tacit form of negative feedback, although it does provide a model of the desired form. I return to this issue in Chapters 9 and 10. The subjects (79 native speakers of English) consisted of two groups of adult intermediate and advanced learners of French who were trained to draw an explicit relationship between French verbs and corresponding nouns. Stimuli were of the form shown in (8.12) and were designed to draw the subjects’ attention to the formal and semantic relatedness of the verb and noun forms.
(8.12) In French, one can say Marie a bien attelé les chevaux (‘Marie has harnessed the horses well’). Once again, Marie a bien attelé les chevaux. One can also say, Marie a fait un bon attelage des chevaux (‘Marie did a good harnessing of the horses’). Once again, Marie a fait un bon attelage des chevaux. The word attelé resembles the word attelage, and they have a similar meaning.
In the case of the -age nouns, the stimuli all involved sentences in which the verb was immediately followed by a direct object. After training on this relationship, subjects were trained in a similar manner to draw a relationship between verbs and -ment nouns. In this case, the verbal stimuli were contained in sentences where there was either no complement at all, or else the complement involved an adjunct PP. The stimuli were thus designed to illustrate the generalisation shown in (8.13).
(8.13) Within the stimulus set, a transitive verb forms its deverbal nouns by suffixing -age to the verb stem.
       Within the stimulus set, an intransitive verb forms its deverbal nouns by suffixing -ment to the verb stem.
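The generalisation in (8.13) amounts to a single conditional on the transitivity of the verb. The following Python sketch is offered only as an illustration of that rule as stated; the function name and its parameters are hypothetical, the verb stem is supplied by hand rather than derived, and nothing here is claimed to hold of French outside the stimulus set.

```python
def deverbal_noun(stem: str, verb_is_transitive: bool) -> str:
    """Form a deverbal noun following the generalisation in (8.13).

    Assumption: the caller supplies the verb stem directly; the sketch
    makes no attempt to compute stems from inflected verb forms.
    """
    # Transitive verbs take -age, intransitive verbs take -ment.
    suffix = "age" if verb_is_transitive else "ment"
    return stem + suffix


# The example from (8.12): atteler is used transitively, stem attel-.
print(deverbal_noun("attel", verb_is_transitive=True))  # attelage
```

A learner who has induced this rule should produce the appropriate noun for novel verbs, which is what the test items without feedback were designed to probe.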
After the training sessions, all subjects saw cards on which were written French sentences accompanied by an English translation. The French verbs on the cards were graphically highlighted. Subjects were asked to read the sentences aloud and then to state the appropriate noun corresponding to the highlighted verb. The experimental groups were given feedback or corrected whenever they made an error. A comparison group was not corrected. Following the feedback sessions, all subjects saw new items and were again asked to provide the appropriate noun corresponding to the verb stimulus. No feedback was provided at this point. Results of ANOVAs showed that the groups which received correction performed significantly better than the groups which received no feedback but t-tests done comparing differences between the experimental and the comparison groups for each of the non-feedback sessions revealed no significant differences
between the groups. Carroll et al. (1992) interpreted these results in terms of four distinct questions.
(8.14) a. Does error correction have an effect?
       b. Can error correction help adult learners construct morphological generalisations?
       c. Can error correction help adult learners restrict the domain of application of generalisations in a principled way?
       d. Assuming that it has some effect, does error correction have the same effect on adult learners regardless of how much of the language they already know?
The response to question (8.14a) was “Yes, error correction had an effect”, namely that the experimental groups guessed the correct form of the noun significantly more often than the comparison group. However, the response to question (8.14b) was in the negative since there were no significant differences between the groups on novel items suggesting “that the experimental subjects did not extract the expected generalisations from the feedback.” Instead, the authors suggested that experimental subjects were being helped to memorise words by the feedback. This is not an insignificant result if it shows that lexical learning is aided by feedback, but it hardly leads one to predict a major role for feedback in the development of grammatical knowledge.
Carroll and Swain (1993) used the same methodology to investigate whether different types of feedback and correction could assist Spanish learners of English in inducing the correct phonological and semantic constraints on double object or “dative” verbs. As is well-known, a subset of English verbs alternates between an NP PP subcategorisation and an NP NP subcategorisation.
(8.15) a. John gave an apple to Mary.
       b. John gave Mary an apple.
Equally well known (see Pinker 1989, for details) is that the alternation is subject to certain limitations. On the one hand, verbs must consist of a phonological foot. This means essentially that the verb stem consists of a single syllable or of two syllables with stress on the first. The verb give can alternate but the verb donate cannot; the verb tell can alternate but the verb explain cannot.
(8.16) a. *John donated the museum a painting.
       b. *John explained me the story.
This constraint appears to be very difficult to learn from primary data alone, since it leads to incomplete learning or fossilisation on the part of many learners of English (R. Hawkins 1987; Mazurkewich 1984a, b, 1985). French, Dutch, and
German learners of English, for example, who produce otherwise native-like speech still will make errors like those in (8.16). The second constraint is semantic; the object of the preposition can be either a recipient, hence [+animate], or a goal [±animate], but the alternating first object in a double object construction must be a recipient.
(8.17) a. John sent a picture to the border.
       b. *John sent the border a picture.
       c. John sent the boarder a picture.
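The two constraints just described can be stated together as a small decision procedure. The Python sketch below is only an illustration of the description given here, not a model taken from Pinker (1989) or from the experiment: the `Verb` record, its fields, and the function names are hypothetical, and the syllable and stress values are entered by hand as assumptions rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    """Hand-coded lexical entry (hypothetical structure, for illustration)."""
    stem: str
    syllables: int          # number of syllables in the stem
    stressed_syllable: int  # 1-based position of the main stress


def is_single_foot(verb: Verb) -> bool:
    """Phonological constraint as described in the text: one syllable,
    or two syllables with stress on the first."""
    if verb.syllables == 1:
        return True
    return verb.syllables == 2 and verb.stressed_syllable == 1


def allows_double_object(verb: Verb, first_object_is_recipient: bool) -> bool:
    """The NP NP frame is predicted to be acceptable only if the
    phonological and the semantic constraint both hold."""
    return is_single_foot(verb) and first_object_is_recipient


# Assumed values; donate is coded with final stress, as the text's
# classification of it as non-alternating requires.
GIVE = Verb("give", 1, 1)
SEND = Verb("send", 1, 1)
DONATE = Verb("donate", 2, 2)
EXPLAIN = Verb("explain", 2, 2)

# (8.15b): recipient first object, single-foot verb.
print(allows_double_object(GIVE, first_object_is_recipient=True))
# (8.16b): recipient first object, but the verb is not a single foot.
print(allows_double_object(EXPLAIN, first_object_is_recipient=True))
# (8.17b): single-foot verb, but "the border" is a goal, not a recipient.
print(allows_double_object(SEND, first_object_is_recipient=False))
```

The sketch covers only the two limitations discussed here; it presupposes that the verb already belongs to the class of verbs taking an NP PP frame.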
In the experiment reported on, adult Spanish speakers saw sentences of English showing the alternation, along with translations. The stimuli in the training sessions looked like (8.18).
(8.18) Training example (on card)
       Peter wrote a letter to Theresa.
       Peter wrote Theresa a letter.
       Verbal instructions (read out loud by the experimenter as the subjects examined a card containing the stimulus shown above)
       We are doing a study concerned with English as a second language. I will give you a sentence and I would like you to think of a different way of saying the same thing. For example, in English you can say Peter wrote a letter to Theresa. Once again, Peter wrote a letter to Theresa. But you can also say Peter wrote Theresa a letter. I repeat: Peter wrote Theresa a letter. These two sentences, Peter wrote a letter to Theresa and Peter wrote Theresa a letter, have the same meaning; they alternate.
We did not teach the subjects any metalinguistic information or explicit rules about the alternation. We gave subjects models of the relationship, drew their attention explicitly to the information (independently inferrable) that the stimuli meant the same thing, and taught them one bit of metalinguistic vocabulary (the word alternate) to facilitate the running of the rest of the experiment. As in the previous experiment, subjects were divided into an experimental and a comparison group. In the experimental sessions we were able to get the subjects to make errors. This is because a strictly formal relationship, induced from the stimuli in the training session, is overgeneral. It fails to exclude the phonological and semantic exceptions. It is interesting in itself to observe that we had no difficulty in getting subjects to make a formal generalisation of the relevant sort. They readily cognised the alternating relationship in the eight items of the training session even in the absence of explicit metalinguistic training. Subjects were divided into five subgroups. We corrected errors made by the
experimental group. The comparison group was not corrected. In this experiment we actually added a wrinkle by offering different kinds of feedback to subgroups of experimental subjects. See Carroll and Swain (1993) for details. The feedback sessions were then followed by sessions when subjects saw new material and were asked to guess if particular stimuli alternated or not. Analysis of results showed significant differences between the experimental subjects and the comparison subjects in favour of the experimental subjects. As in the previous experiment, the provision of feedback helped the experimental subjects learn the particular stimuli. This time, however, it also helped them zero in on the correct generalisation. How this might have happened, I will take up in Chapter 9. In Section 3, I will report on the last experiment in this series which also provides evidence that negative evidence can lead learners to induce linguistic generalisations.
iii. The Tanaka (1999) study
Tanaka (1999) investigated the acquisition of the Japanese subject (ga)/topic (wa) distinction in a computerised study involving 52 adult learners with various mother tongues. 25 of the subjects were native speakers of English, and all of the rest were learners of Japanese who also spoke English, a fact which is of importance since the materials included information presented in English. The experiment was conducted in three sessions over several days. Subjects saw 58 cartoons on a computer screen which involved a discourse context established in English by the first three boxes of the cartoon. They were required to provide the text for the fourth box in Japanese by selecting one of four options from a multiple-choice test.
The list of options included a focus NP marked by ga (followed by a non-focus NP + copula), a non-focus NP marked by wa (followed by a focus NP + copula), a focus NP marked by wa (followed by a non-focus NP + copula), and a non-focus NP marked by ga (followed by a focus NP + copula). Subjects typed in a number corresponding to their choice of response and were then asked to indicate how certain they were of its correctness. Subjects were divided into several treatment groups. Two groups (MO and M) received feedback if they made incorrect choices during the experiment. Two groups (MO and O) were required to produce vocal responses (which were recorded). Two groups (M and C) did a reading task during the time that the output group were producing output. The feedback provided was metalinguistic in nature and consisted of detailed information about the use of wa and ga as topic and focus markers. The testing phase then consisted of novel items. Results indicate that feedback groups scored significantly better than nonfeedback groups. Tanaka also reports that subjects who received feedback rated their
certainty of response more accurately than those who did not. She suggested that this is because they have greater knowledge of the domain of knowledge in question. iv. Summary of the correction and feedback studies In this sub-section, I have reviewed several studies which involve explicit and implicit kinds of feedback to learners and/or correction (i.e., a model), all of which indicate that learners can modify their behaviour on that basis. Once again, the various studies involve a limited number of linguistic phenomena covering a broad range of types (exceptions to morphosyntactic generalisations, restrictions of derivational suffix attachment, semantic and phonological constraints on the double object construction, pragmatic function markers). In several of these studies, learners were asked to apply what they had learned to novel items and were successful in doing so, more successful than learners who had received no comparable feedback or correction. These studies provide pretty solid evidence that learners can learn information associated with individual lexical items based on feedback and correction. More importantly, they show that learners can learn abstract properties about the target language on the same basis. Overall, these studies provide considerable support for the hypothesis that L2 learners can restructure their grammars on the basis of correction and feedback. In the next section, I will provide some new data which point to the same conclusion.
3. The -ing affixation/morphological conversion study
This study was done as the third in the series reported on above.16 3.1 The subjects The experiment was conducted with 100 adult ESL learners whose primary language was Spanish. These subjects had all previously participated in the study reported on in Carroll and Swain (1993). They were all literate, and the vast majority had had some post-secondary education. At the time of the experiments, they were enrolled in various low-intermediate ESL classes in the Toronto (Canada) area from which they had been recruited. Participation was voluntary and all subjects were paid. To ensure uniformity of language learning level beyond that provided by the subjects’ grouping in their ESL programs, we had previously administered an acceptability judgement test to everyone. This tested various types of grammatical problems exemplified in a set of 50 sentences recorded on tape and presented orally. See Carroll and Swain for further details.
Subjects were required to identify each sentence as correct or incorrect. Minimum and maximum scores were established.17 All of our subjects had scores falling between the limits selected.
3.2 Design and methodology
The design and methodology followed that of the two previous experiments.
i. The sessions
The acceptability judgement task had been administered to all subjects in an initial meeting preceding the double object experiment. Several months later, the current experiment was conducted with each subject individually on two separate occasions. The first meeting consisted in giving the experimental sessions and our first test (Test 1). One week later we administered the second test (Test 2). All sessions were tape-recorded, transcribed and scored by the same individual. The complete set of items can be found in Appendix 1.
ii. The learning problem
We wanted our subjects to learn a particular contrast involving noun formation from verb stems. The verb stems in question can be simple or can be based on the participial V+ing. Thus, one can relate sentences like those in (8.19) and (8.20) through the contrast in question.
(8.19) a. Can you help Margarita with her homework?
       b. I think she needs some help.
(8.20) a. Many people help me at odd jobs.
       b. I am grateful for this helping out.
The forms in (8.19) involve a contrast between a simple verb form and a simple noun form with no morphology marking the shift in syntactic category. This relationship is sometimes referred to as category conversion. The relationship can be expressed as in (8.21) which indicates formally that conversion involves no addition of an affix or stem to a base root or stem.
(8.21) a. N → V
       b. N
          |
          V
What then constitutes the cues that conversion has occurred? It must be the case that the morphosyntactic distribution of the forms in the two distinct sentences is critical. In other words, the appearance of the form help in a position reserved
for verbs in (8.19a), and its appearance in (8.19b) in a position reserved for nouns, serve as the cues that the same form is a verb in (8.19a) but is a noun in (8.19b). Notice, however, that this explains very little. In an inductive learning theory without recourse to language universals one can legitimately ask the question: What prevents the learner from assuming that (8.19) simply indicates the existence of a single category, a prototype with both nominal and verbal properties but something that is neither [+N, −V] nor [−N, +V] as current category theory requires? Why must the learner learn a distinct and separate category for these sentences? In addition, what prevents the learner from collapsing the classes illustrated in (8.19) and (8.20) with the class of adjectives? It is known that participles and gerunds manifest mixed properties cross-linguistically (Spencer 1991: 27–30). One might respond that extended experience with the form help tells the learner that there is a version which can take verbal morphology, and a second form which can take nominal morphology. Unfortunately, in contrast to many languages, English has no unique inflectional morphology which will serve to tell the learner that a given instantiation of the word must belong to a specific lexical category; therefore, distributional cues must be the main cues. Derivational affixes will not necessarily help either since it is now known that Aronoff’s (1976: 47) Unitary Base Hypothesis, which states that the base to all morphological operations must be syntactically specified, or at least be specifiable as a natural class, is not correct (Drapeau 1980; Aronoff 1994). In any event, it will become obvious below that our subjects needed no such extensive experience; they understood the contrast readily. Two explanations are possible for the ease with which learners grasp this categorial distinction.
One is that they are transferring categorial properties from Spanish to their interlanguage grammars based on a translation of the English words into the corresponding Spanish lexical items. A simple rendition of the stimulus in (8.19) could be expressed in Spanish as (8.22).
(8.22) a. ¿Puedes ayudar a Margarita con sus lecciones?
          ¿can+2 help+. Margarita with her+ homework+
       b. Creo que necesita ayuda.
          believe+1+that need+3 help
It will be seen that there is no identity of form between the Spanish noun and the Spanish verb since the verb must always be inflected for tense, person and number. One can, of course, argue that the stem is identical but this is irrelevant to the claim that learners are transferring the syntactic class from Spanish. A second hypothesis is that learners deploy representational systems incorporating a set of morphosyntactic features, minimally [±N, ±V], and that UG
excludes the possibility that a single morphosyntactic word can be simultaneously specified as [+N, −V] and [−N, +V]. Learners, on seeing the contrast in (8.19), and on identifying these contexts as precisely verbal and nominal contexts, would then be forced to analyse help as two separate words, one specified as a noun and bearing the features [+N, −V], the other specified as a verb and bearing the features [−N, +V]. This is, I believe, the correct explanation and demonstrates what I mean when I say that induction must “respect” the properties of UG. Induction cannot lead the learner to specify a single form as having contradictory formal featural specifications; moreover, it cannot leave elements appearing in morphosyntactic structures underspecified for syntactic category. The forms in (8.20) also illustrate conversion and can be described in terms of (8.21). The formal difference between help and helping is in fact predicted by (8.21) since V can be realised as a verb stem, or as a participle consisting of the verb stem plus the suffix -ing. The formal difference can be expressed as in (8.23).18
(8.23) a. [[help]V]N
       b. [[[help]V ing]V]N
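The two bracketings in (8.23) differ only in depth of embedding, which is easy to see when they are written as nested data. The sketch below is purely illustrative and not part of the original analysis: the (category, children) tuple encoding and the function name bracket are assumptions made here.

```python
def bracket(node):
    """Render a (category, children) tree as labelled bracketing.

    A node is either a plain string (a root or affix) or a tuple
    (category, [children]).
    """
    if isinstance(node, str):
        return node
    category, children = node
    inner = " ".join(bracket(child) for child in children)
    return "[" + inner + "]" + category


help_verb = ("V", ["help"])
help_noun = ("N", [help_verb])                     # conversion of the bare stem
helping_noun = ("N", [("V", [help_verb, "ing"])])  # conversion of the participle

print(bracket(help_noun))     # [[help]V]N
print(bracket(helping_noun))  # [[[help]V ing]V]N
```

In both cases the outermost N node adds no affix of its own, which is the sense in which both nouns are instances of conversion.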
The formal difference representing different internal structures of the two nouns corresponds to a semantic difference. Quirk, Svartvik, Greenbaum, and Leech (1991: 1064) characterise the -ing forms in nominal -ing clauses as able to refer to a fact or to an action. In fact, the semantics of the noun phrases containing derived -ing nouns is a complex, combinatorial computation based on the meaning of the verb stem, the process semantics of the -ing suffix, and the semantics of noun complements.19 There are clear contrasts expressed in (8.24–8.25).
(8.24) a. *The singing was difficult to compose.
       b. The song was difficult to compose.
(8.25) a. The inspirational singing lasted for hours.
       b. The inspirational song lasted for hours.
       c. The inspirational singing lasted for hours, and the choir sang many different ditties.
       d. #The inspirational song lasted for hours, and the choir sang many different ditties.
The examples in (8.24) establish a contrast between the action associated with the verb sing, and the product or thing which is both input and output of the singing event.20 The activity of composing is precisely the activity which creates the product. The action of singing is conceptually not equivalent to this event, although someone might spontaneously sing and compose a song. The examples in (8.25) are designed to show that both the musical event and the musical
product can be characterised as lasting for a period of time, but only the -ing noun can refer to repeated actions. Sentence (8.25b) must refer to a single event. The action interpretation is also illustrated in the common injunctions in (8.26).
(8.26) a. No smoking!
       b. No spitting!
       c. No loitering!
We wanted our subjects to associate an event interpretation with the -ing form, and to recognise that the form derived by conversion from a verb root or stem has no systematic meaning beyond denoting a THING concept. Beyond this, I need to point out that conversion word formation rules interact with the phonology of English in different ways. Certain lexical forms are subject to word stress, ablaut and rules like trisyllabic laxing; others are not. Consequently, many linguists (e.g. Siegel 1974; Kiparsky 1982; Mohanan 1986) have proposed that there are levels of word formation which interact with the phonology in an ordered fashion. The figure below is based on Jensen (1990: 92).
Dictionary of morphemes
        ↓
Level 1 morphology                               → ←   Level 1 phonology
(+ boundary affixes, e.g., +ian, +al;                  (e.g., word stress, trisyllabic laxing)
irregular plurals)
        ↓                                                      ↓
Level 2 morphology                               → ←   Level 2 phonology
(# boundary affixes, e.g., #ism)
        ↓                                                      ↓
Level 3 morphology                               → ←   Level 3 phonology
(e.g., regular inflection)
                                                               ↓
Syntax                                            →    Surface structure
                                                               ↓
                                                       Postlexical phonology
                                                       (vowel insertion, devoicing, phrasal stress)

Figure 8.1. A model of Lexical Phonology showing the interaction of word-building processes and phonological rules (based on Jensen 1990: 92)
According to Figure 8.1, conversion rules are represented at different levels depending upon whether conversion acts on a verb to make a noun or acts on a noun to make a verb. Phonological distinctions, as well as certain semantic distinctions, provide cues as to which process is involved in relating a given noun–verb pair. This has certain consequences for our methodology. Our stimuli presented all of the input first in the form of a verb, and then elicited a noun form. Thus, we presented the data in such a way as to gloss over the fact that the verb–noun relationship illustrated over the sentence pairs could be derived in principle via either a rule converting a verb into a noun, or via a rule converting a noun into a verb. The methodology favours interpreting all input as if derived via (8.21). Since certain phonological processes in English differentiate verb-to-noun conversion and noun-to-verb conversion, we must exclude the possibility that our data illustrated noun-to-verb conversion. It should be noted, first of all, that our stimuli generally involved simple one or two syllable words, thus excluding any possible interaction of the conversion processes and either word stress or trisyllabic laxing. Secondly, we did not restrict ourselves to verbs with only regular past tense forms; several verbs used (e.g. fly, read, choose, give, etc.) have ablaut forms. However, none of these ablaut forms appeared in our stimuli. Moreover, even if they had, it would not have been problematic since the ablaut forms are in fact instances of verb-to-noun conversion. Thus, despite the fact that English provides phonological cues for determining the order of derivation (verb-to-noun or noun-to-verb), our data were consistent with the intended analysis verb-to-noun. 
To sum up, we presented our subjects with instances of verbs embedded in a syntactic context, and asked them to derive one of two types of noun: a noun derived by conversion from an uninflected verb root or stem, or a noun derived by conversion from a present participle. In the training sessions, the -ing forms appeared in noun phrases which could be given an event interpretation (action, activity, accomplishment), while the simple root/stem conversions appeared in noun phrases which lent themselves to a THING or INDIVIDUAL interpretation.21
iii. The procedures
The procedures began with an 8-item training session. Subjects saw texts like the one in (8.26) and simultaneously heard the texts in (8.27).
(8.26) Training example (on a card)
       Call me tomorrow morning!
       Give me a call in the morning!
(8.27) Verbal instructions
This is also a study for learning English as a Second Language, somewhat
different from the one we did together before. This time, I will give you a sentence with a special word, a verb, and I would like you to think up a related word, a noun, in a second sentence. Here is an example: Call me tomorrow morning! Again: Call me tomorrow morning! You can also say: Give me a call in the morning! I repeat: Give me a call in the morning! The verb call in the first sentence Call me in the morning! has a similar meaning and form to the noun call in the second sentence Give me a call in the morning!
Subjects saw and simultaneously heard a second pair of sentences relating a simple verb to a simple noun form. Then, they were presented one-by-one with two cards that had a stimulus sentence and a response with a blank to be filled in. (8.28) Verbal instructions Now I will give you more sentences. I would like you to read aloud the first sentence, then read the second sentence and fill in the blank with a noun with a meaning similar to the verb underlined in the first sentence. (8.29) Elicitation stimulus cards A. Can you answer this question? Sorry, I don’t know the __________. B. Steve likes to fish. Yesterday, he caught a large __________.
The A and B cards were followed by two more examples illustrating for the subjects the formal identity and semantic relatedness of the verb and noun pairs. These were followed by two more response elicitation cards. If any of the subjects’ responses were incorrect during the elicitation part of the training session, it was repeated from beginning to end once. Immediately following the training session on the bare-stem conversion, subjects were trained on the -ing conversion. The training format was the same, with alternating pairs of complete illustrations and elicited response stimuli. Once again, if subjects erred on any of the four elicited response stimuli, the complete training session was repeated once. Each subject was told that sometimes the noun in the second sentence (the one with the blank) would be identical to the verb in the first sentence, and sometimes it would not. They were to decide when the noun–verb pairs were identical and when they were not and to furnish an appropriate form. The experimental groups were then given information about the feedback they would receive. Prior to the experimental session, each subject received information tailored to his or her feedback group. Since our subjects had participated in a previous experiment using the same methodology, we assigned them to the same
treatment group they had previously been assigned to. We were not worried about subjects being familiar with the design; on the contrary, if it made the procedures understandable it was a good thing.22 In contrast, we were worried that assigning subjects to a different treatment group might confuse them. The experimental session consisted of four parts: an initial feedback session and a guessing session, followed by a second feedback session and a second guessing session. There were 12 cards in each part so that subjects saw and heard 48 distinct stimuli (see Appendix 2).
(8.30) Experimental design
Training session:
    8 items, V to N conversion (possibly twice)
    8 items, V to N+ing conversion (possibly twice)
Experimental session:
    12 feedback items
    12 guessing items
    12 feedback items
    12 guessing items
    48 total
Test 1: 48 items (randomised)
Test 2: 48 items (Test 1 repeated one week later)
During the feedback sessions, experimental subjects received feedback whenever they erred in responding. During the guessing sessions, no one received feedback. Subjects were divided into five treatment groups of 20 each according to the type of feedback they received. Subjects in Group A were told they were wrong whenever they made a mistake, and were given a metalinguistic explanation about why their response was wrong. Subjects in Group A were told that they would receive such an explanation if they made a mistake. Subjects in Group B were told that they would simply be informed if they made a mistake so they got negative feedback of the “That’s wrong” sort only. We called this explicit utterance rejection. Subjects in Group C were told that if they made a mistake, the sentence would be repeated with the correct form of the noun. This is modelling of the correct form and is comparable to that studied by Tomasello and Herron in their experiments. Subjects in Group D were told that if they made a mistake, they would be asked if they were sure that their response was correct. This type of indirect feedback is comparable to that which the interactionists previously reviewed have suggested can cause grammatical restructuring. The groupings thus correspond to a hierarchy of explicitness in the feedback received,
with Group A receiving the most explicit form of negative feedback and Group D receiving the most implicit form of negative feedback.
(8.31) Subject groupings
Group A (20 subjects): explicit feedback plus a metalinguistic explanation
Group B (20 subjects): explicit utterance rejection (e.g., “No, that’s wrong”)
Group C (20 subjects): modelling (correction)
Group D (20 subjects): indirect metalinguistic feedback (e.g. “Are you sure?”)
Comparison Group (20 subjects): no feedback.
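The feedback protocol described above amounts to a simple mapping from treatment group to experimenter response. The following Python sketch is offered only as an illustration of that mapping; the names Group and experimenter_response are hypothetical, and the wording of the metalinguistic explanation given to Group A is not reported in the text, so a placeholder stands in for it.

```python
from enum import Enum
from typing import Optional


class Group(Enum):
    A = "explicit feedback plus a metalinguistic explanation"
    B = "explicit utterance rejection"
    C = "modelling (correction)"
    D = "indirect metalinguistic feedback"
    COMPARISON = "no feedback"


def experimenter_response(group: Group, response: str, target: str,
                          corrected_sentence: str,
                          feedback_session: bool = True) -> Optional[str]:
    """Return what the experimenter says after a response, or None for silence."""
    # No feedback in guessing sessions, after correct responses,
    # or for the Comparison Group.
    if not feedback_session or response == target or group is Group.COMPARISON:
        return None
    if group is Group.A:
        # Placeholder: the actual explanation is not given in the text.
        return "No, that's wrong. [metalinguistic explanation]"
    if group is Group.B:
        return "No, that's wrong."
    if group is Group.C:
        # The sentence is repeated with the correct form of the noun.
        return corrected_sentence
    return "Are you sure?"  # Group D


print(experimenter_response(Group.C, "helping", "help",
                            "I think she needs some help."))
```

Read from Group A down to Group D, the returned strings carry progressively less explicit information about the error, which is the hierarchy of explicitness referred to above.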
3.3 Predictions
The null hypothesis was that there would be no significant differences among the different groups. If, on the contrary, feedback did help learners learn the semantic constraints differentiating conversion and -ing suffixation, then Groups A-D ought to have performed significantly better than the Comparison Group. Should it be the case that some but not all forms of feedback assist learning, then variation among the treatment groups should be observed. If explicitness of feedback is what assists learners, then Groups A and B should do better than Groups C or D. If it is the actual content of the information which helps learners to locate errors, then Group A should do better than all of the rest. If modelling a correct form is what helps, then Group C should do better than the others. If, à la Craik and Tulving (1975), it is degree of processing which aids in acquisition and recall, then Group D should do best.
3.4 Major findings
Figure 8.2 shows the mean percent correct responses for the feedback items by group and by session. These along with standard deviations are shown in tabular form in Table 8.1.
[Figure 8.2. % Correct: Feedback items. Series: Group A, Group B, Group C, Group D, Comparison; sessions: Initial, Test 1, Test 2; vertical axis 0% to 90%.]

Table 8.1. Means and standard deviations by group and session for the feedback items

Initial Feedback Session
Group*        N     Mean    Standard Deviation
A             20    .74     .08
B             20    .71     .11
C             20    .68     .11
D             20    .77     .11
Comparison    20    .54     .12

Test 1
Group*        N     Mean    Standard Deviation
A             20    .85     .11
B             20    .79     .12
C             20    .84     .11
D             20    .81     .10
Comparison    20    .53     .10

Test 2
Group*        N     Mean    Standard Deviation
A             20    .77     .11
B             20    .74     .11
C             20    .77     .10
D             20    .81     .10
Comparison    20    .55     .12

*Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
Table 8.2 shows the result of a two-way repeated measures ANOVA run on these data. It indicates a main effect for between-subjects differences based on group. This is significant (p < .002). It also reveals a main effect for session. Within-group differences based on session are also significant (p = .000). There is a significant group by session interaction (p < .004).

Table 8.2. Results of repeated measures ANOVA for the feedback items

Source                       SS      DF     MS     F        F prob.
Between Subjects:            2.97     99
  Groups                     0.49      4    .12     4.71    .002
  Subjects within groups     2.48     95    .03
Within Subjects:             1.35    200
  Session                    0.41      2    .20    46.44    .000
  Groups by session          0.10      8    .01     2.96    .004
  Residual                   0.84    190    .00
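The entries of Table 8.2 are related by simple arithmetic: each mean square is a sum of squares divided by its degrees of freedom, and each F is a ratio of mean squares. The sketch below, which is not part of the original analysis, recomputes these quantities from the reported values. The pairing of each SS value with its source row is my reading of the table, and the variable names are illustrative. Because the sums of squares are reported to only two decimals, the recomputed ratios approximate rather than reproduce the published F values.

```python
# (SS, DF) pairs as read from Table 8.2.
table_8_2 = {
    "groups": (0.49, 4),
    "subjects_within_groups": (2.48, 95),
    "session": (0.41, 2),
    "groups_by_session": (0.10, 8),
    "residual": (0.84, 190),
}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss / df for source, (ss, df) in table_8_2.items()}

# The between-subjects effect is tested against subjects within groups;
# the within-subjects effects are tested against the residual.
f_groups = ms["groups"] / ms["subjects_within_groups"]
f_session = ms["session"] / ms["residual"]
f_interaction = ms["groups_by_session"] / ms["residual"]

# The component sums of squares should add up to the reported totals.
between_total = table_8_2["groups"][0] + table_8_2["subjects_within_groups"][0]
within_total = (table_8_2["session"][0] + table_8_2["groups_by_session"][0]
                + table_8_2["residual"][0])

print(f"F(groups)      = {f_groups:.2f}  (reported 4.71)")
print(f"F(session)     = {f_session:.2f}  (reported 46.44)")
print(f"F(interaction) = {f_interaction:.2f}  (reported 2.96)")
print(f"between-subjects SS = {between_total:.2f}  (reported 2.97)")
print(f"within-subjects SS  = {within_total:.2f}  (reported 1.35)")
```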
Multiple between-group comparisons of means (using least square differences) were made to determine which groups were significantly different from each other. Results are shown in Tables 8.3, 8.4, and 8.5.
Table 8.3. Between-group comparison of means of feedback items on the initial feedback session

Mean    Group§        Comparison    A    B    C    D
.54     Comparison
.68     C             *             –    –    –    –
.71     B             *             –    –    –    –
.74     A             *             –    –    –    –
.77     D             *             –    –    *    –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
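The logic of a least significant difference comparison can be illustrated with the means and standard deviations reported in Table 8.1 for the initial feedback session. The sketch below is an approximation under stated assumptions and is not the original computation: it pools the five group variances to estimate the error term, and it hard-codes an approximate two-tailed .05 critical value of t for 95 degrees of freedom. The text does not say how the error term was actually obtained.

```python
import math
from itertools import combinations

N_PER_GROUP = 20
T_CRITICAL = 1.985  # assumed: approximate two-tailed .05 value of t, 95 df

# Initial feedback session, from Table 8.1.
means = {"Comparison": .54, "C": .68, "B": .71, "A": .74, "D": .77}
sds = {"Comparison": .12, "C": .11, "B": .11, "A": .08, "D": .11}

# Assumption: error variance estimated by pooling equal-sized groups.
pooled_variance = sum(sd ** 2 for sd in sds.values()) / len(sds)

# Smallest difference between two group means that counts as significant.
lsd = T_CRITICAL * math.sqrt(2 * pooled_variance / N_PER_GROUP)


def differs(g1: str, g2: str) -> bool:
    """True if the two group means differ by more than the LSD."""
    return abs(means[g1] - means[g2]) > lsd


significant_pairs = [pair for pair in combinations(means, 2) if differs(*pair)]

print(f"LSD = {lsd:.3f}")
for g1, g2 in significant_pairs:
    print(g1, "vs", g2)
```

Under these assumptions the pairs flagged are the four comparisons with the Comparison Group plus Group C versus Group D, the same pattern as the asterisks in Table 8.3.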
On the initial feedback session, all groups were already performing significantly better than the Comparison Group. In addition, Group D (the group receiving the most implicit form of feedback) performed significantly better than Group C (the corrected group). On the first test following the feedback session (Test 1), all of the experimental groups performed significantly better than the Comparison Group.

Table 8.4. Between-group comparison of means of feedback items on Test 1

Mean    Group§        Comparison    A    B    C    D
.53     Comparison
.79     B             *             –    –    –    –
.81     D             *             –    –    –    –
.84     C             *             –    –    –    –
.85     A             *             –    –    –    –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
At Test 2 (one week later), all groups are still performing significantly better than the Comparison Group. All the experimental groups except Group D, however, decline in performance. At this point, the difference between Group D and Group B becomes significant.
Table 8.5. Between-group comparison of means of feedback items on Test 2

Mean    Group§        Comparison    A    B    C    D
.55     Comparison
.74     B             *             –    –    –    –
.77     A             *             –    –    –    –
.77     C             *             –    –    –    –
.81     D             *             –    *    –    –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
When we consider the items which are repeated across feedback and test session (the feedback items), results once again indicate that feedback helps adult SLA learners learn and recall the forms of individual words (see Carroll et al. 1992; Carroll and Swain 1993), a now unsurprising result. Let us therefore turn to the discussion of the guessing items which alone indicate if learners have extracted a generalisation about conversion and the interpretation of the -ing forms.

[Figure 8.3. % Correct: Guessing items. Series: Group A, Group B, Group C, Group D, Comparison; sessions: Initial Guessing Session, Test 1, Test 2; vertical axis 0% to 70%.]
Figure 8.3 shows the mean percent correct responses for the guessing items by group and by session. These along with standard deviations are shown in tabular form in Table 8.6.

Table 8.6. Means and standard deviations by group and session for the guessing items

Initial Guessing Session
Group*        N     Mean    Standard Deviation
A             20    .60     .12
B             20    .55     .11
C             20    .59     .11
D             20    .62     .13
Comparison    20    .52     .10

Test 1
Group*        N     Mean    Standard Deviation
A             20    .61     .11
B             20    .59     .13
C             20    .58     .11
D             20    .64     .14
Comparison    20    .53     .14

Test 2
Group*        N     Mean    Standard Deviation
A             20    .57     .13
B             20    .60     .14
C             20    .58     .10
D             20    .67     .13
Comparison    20    .56     .10

*Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
A two-way repeated measures ANOVA was run on these data. There is a main effect for group; between-subjects group differences just reach significance (p < .05). This time there is no within-subjects effect for session but a group by session interaction appears, just reaching significance at the .05 level. See Table 8.7.

Table 8.7. Results of repeated measures ANOVA for the guessing items

Source                       SS      DF     MS     F       F prob.
Between Subjects:            3.77     99
  Groups                     0.36      4    .09    2.50    .048
  Subjects within groups     3.41     95    .04
Within Subjects:             0.86    200
  Session                    0.02      2    .01    2.03    .135
  Groups by session          0.07      8    .01    2.05    .043
  Residual                   0.77    190    .00
Multiple between-group comparisons of means were made to determine which groups were significantly different from the others. Results are shown in Tables 8.8, 8.9, and 8.10.

Table 8.8. Between-group comparison of means of guessing items on the initial guessing session

Mean    Group§        Comparison    A    B    C    D
.52     Comparison    –             –    –    –    –
.55     B             –             –    –    –    –
.59     C             –             –    –    –    –
.60     A             *             –    –    –    –
.62     D             *             –    –    –    –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
Groups A and D again perform significantly better than the Comparison Group on the initial guessing test. Now Groups B and C do not perform significantly better than the Comparison Group.
Table 8.9. Between-group comparison of means of guessing items on Test 1

Mean    Group§        Comparison    A    B    C    D
.53     Comparison    –             –    –    –    –
.58     C             –             –    –    –    –
.59     B             –             –    –    –    –
.61     A             *             –    –    –    –
.64     D             *             –    –    –    –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
Table 8.9 shows how well the groups did relative to one another on the first test of the guessing items. Once again, Groups A and D outperform the Comparison Group. In contrast, Groups B and C do not perform significantly better than the Comparison Group. One week later, this test was repeated. Results are shown in Table 8.10.

Table 8.10. Between-group comparison of means of guessing items on Test 2

Mean   Group§        Comparison   A   B   C   D
.56    Comparison        –        –   –   –   –
.57    A                 –        –   –   –   –
.58    C                 –        –   –   –   –
.60    B                 –        –   –   –   –
.67    D                 *        *   –   *   –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
Most groups either maintain their performance or decline somewhat. In particular, Group A is performing less well so that only Group D (whose performance actually improves slightly at Test 2) is still performing significantly better than the Comparison Group. At this point, because of the declines in the other experimental group results, Group D is also performing better than Group A and Group C.
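The movement of each group between sessions can be read off the three tables of means. The sketch below, in Python, simply tabulates the change from Test 1 to Test 2 and each treatment group's lead over the Comparison Group at Test 2. It is descriptive arithmetic on the rounded means reported in Tables 8.8 to 8.10, not a significance test, and the variable names are illustrative assumptions.

```python
# Group means on the guessing items, copied from Tables 8.8-8.10.
# Each tuple holds (initial guessing session, Test 1, Test 2).
means = {
    "Comparison": (0.52, 0.53, 0.56),
    "A": (0.60, 0.61, 0.57),
    "B": (0.55, 0.59, 0.60),
    "C": (0.59, 0.58, 0.58),
    "D": (0.62, 0.64, 0.67),
}

# Change in mean score from Test 1 to Test 2 for every group.
changes = {
    group: round(test2 - test1, 2)
    for group, (_initial, test1, test2) in means.items()
}

# Lead of each treatment group over the Comparison Group at Test 2.
comparison_test2 = means["Comparison"][2]
lead_at_test2 = {
    group: round(scores[2] - comparison_test2, 2)
    for group, scores in means.items()
    if group != "Comparison"
}

for group in means:
    print(group, "change Test 1 -> Test 2:", changes[group])
print("Lead over Comparison at Test 2:", lead_at_test2)
```

On these figures Group A drops by .04 and ends only .01 above the Comparison Group, whereas Group D gains .03 and ends .11 above it, which is the pattern described in the text.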
3.5 Discussion
These results clearly show a learning effect for the feedback items. As noted, this was an expected result based on previous findings. The results for the guessing items were less impressive but far more interesting. The between-groups difference reaches significance at the .05 level, from which we conclude that our learners are learning a generalisation on the basis of the feedback given.23 Moreover, these effects are variable according to the type of feedback given. While all types of feedback helped learners to learn the feedback items, only the explicit metalinguistic explanation (feedback type A) and the indirect prompting (feedback type D) helped learners towards a generalisation. This result does not literally run counter to my predictions, but is difficult to interpret. The fact that Groups A and B did not both do significantly better on the guessing items suggests that the explicitness of the feedback is not helpful in this instance. Rather, it would appear that it is the metalinguistic information itself which is leading Group A to perform in a superior way. But if that is true, then Group D’s results are really surprising since they have been given no explicit information at all. The fact that Group D continues to improve might mean that we are observing an effect of depth of processing, but then why do we not observe a gradation from most inferencing to least inferencing?24 Such a gradation would be expected if the degree of processing were leading to the learning of the relevant generalisation. Finally, it is noteworthy that modelling/correction (type C feedback) does not help the learners towards a generalisation.
At this point, it is perhaps useful to contrast these results with those from the previous experiment with the same subjects learning the constraints on the English double object alternation. In that experiment, there were also significant between-groups differences on the guessing items (p = .000).
The effects of feedback were again variable but differed from the results shown here. In the previous experiment, Group A (the group getting an explicit metalinguistic explanation) outperformed the Comparison Group and Groups B and D at the initial guessing session. Group C (the corrected group) also performed significantly better than the Comparison Group. Group D did not. At Test 1, all of the treatment groups did significantly better than the Comparison Group, a result maintained one week later (see Tables 8.11, 8.12, and 8.13).
Table 8.11. Between-group comparison of means of guessing items on initial guessing session (from Carroll and Swain, 1993)

Mean   Group§        Comparison   A   B   C   D
.42    Comparison        –        –   –   –   –
.55    B                 *        –   –   –   –
.55    D                 *        –   –   –   –
.60    C                 *        –   –   –   –
.64    A                 *        –   *   –   *

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
Table 8.12. Between-group comparison of means of guessing items on Test 1 (from Carroll and Swain, 1993)

Mean   Group§        Comparison   A   B   C   D
.39    Comparison        –        –   –   –   –
.55    D                 *        –   –   –   –
.55    B                 *        –   –   –   –
.60    C                 *        –   –   –   –
.61    A                 *        –   –   –   –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
Table 8.13. Between-group comparison of means of guessing items on Test 2 (from Carroll and Swain, 1993)

Mean   Group§        Comparison   A   B   C   D
.34    Comparison        –        –   –   –   –
.52    B                 *        –   –   –   –
.55    C                 *        –   –   –   –
.55    D                 *        –   –   –   –
.63    A                 *        –   *   –   –

“*” denotes pairs of groups significantly different at the .05 level
§ Group A = group getting explicit feedback plus a metalinguistic explanation; Group B = explicit utterance rejection group; Group C = modelling (correction); Group D = indirect metalinguistic feedback.
There are various ways to interpret these differences in the results by our subjects across the two experiments. One might hypothesise that the conversion contrast was much harder to learn and, therefore, that the feedback was also much harder to exploit. A careful comparison of the mean scores of each group on the guessing items across the two experiments, however, reveals comparable scores. In addition, one can see a small range of scores on the conversion contrast (range .52 to .67) vs. a much larger range (.34 to .64) for the double object alternation results. In other words, performance on the conversion was less variable and the lowest mean scores achieved were much higher. Performance could thus be said to be “better” on conversion.25 Means on the feedback items show a similar behaviour (ranging from .65 to .84 for the conversion experiment vs. .35 to .60 for the double object alternation experiment). The scores for the conversion feedback items are much higher than the scores for the double object items, but this might reflect the fact that in learning the double object construction one is learning the subcategorisation properties of the verb, which require that one be attentive to and process the syntactic context in which the verb occurs. Learning to make the correct selection between the bare root conversion item and the -ing variant does not hinge on the syntactic representation of the stimulus sentence since both are nouns and the syntactic information in the stimuli is the same for both. Alternatively, the improvement might be attributable to familiarity with the design of the experiment, and have nothing to do with the learning of the double object construction. In any event, the affixation/conversion contrast was not harder to learn (even if we assume a compensatory effect for familiarity with the design of the experiment) so we can reject the idea that the feedback on the second experiment was harder to exploit.
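The two ranges cited for the guessing items can be recovered directly from the group means in Tables 8.8 to 8.13. The following Python sketch does no more than that; the dictionary layout and the function name are assumptions made for illustration, and the figures are the rounded means from the tables.

```python
# Group means on the guessing items: (initial session, Test 1, Test 2).
# Conversion experiment: Tables 8.8-8.10.
conversion = {
    "Comparison": (0.52, 0.53, 0.56),
    "A": (0.60, 0.61, 0.57),
    "B": (0.55, 0.59, 0.60),
    "C": (0.59, 0.58, 0.58),
    "D": (0.62, 0.64, 0.67),
}
# Double object experiment: Tables 8.11-8.13.
double_object = {
    "Comparison": (0.42, 0.39, 0.34),
    "A": (0.64, 0.61, 0.63),
    "B": (0.55, 0.55, 0.52),
    "C": (0.60, 0.60, 0.55),
    "D": (0.55, 0.55, 0.55),
}

def score_range(experiment):
    """Return (lowest mean, highest mean, spread) over all groups and sessions."""
    scores = [m for session_means in experiment.values() for m in session_means]
    lowest, highest = min(scores), max(scores)
    return lowest, highest, round(highest - lowest, 2)

print("conversion:   ", score_range(conversion))
print("double object:", score_range(double_object))
```

The spread is .15 for conversion against .30 for the double object alternation, and the lowest conversion mean (.52) lies well above the lowest double object mean (.34).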
Another interpretation is that feedback regarding the constraints on the double object alternation was more usable than feedback about conversion independently of the relative difficulty of learning one or the other phenomenon. The problem with this hypothesis is to locate the difficulty in using feedback. In both experiments, subjects had to acquire and manipulate semantic information (the semantic constraint limiting double object alternation to verbs expressing transfer of possession in the one case and the interpretation of the -ing suffix as expressing events in the other case). It therefore cannot simply be true that feedback is easy or hard to use when it is directing the learners’ attention to semantic distinctions. At this stage of our knowledge, the differences in the effectiveness of the different types of negative evidence are quite mysterious and must remain unexplained. I must be content merely to have shown that negative evidence can lead adult L2 learners to draw generalisations about the relationship between morphological form (or, alternatively, word-building processes) and word meaning.
4. Summary
In this chapter, I have reviewed a number of studies examining the ability of different types of metalinguistic information (grammatical instruction, input processing, feedback and correction) to cause the restructuring of learners’ psychogrammars. The research on metalinguistic instruction provides mixed evidence that such instruction can work, and can work with some fairly abstract linguistic generalisations. Results are far from conclusive and we still need some plausible psycholinguistic models of how instruction functions. Much more research in this area needs to be done. The studies on input processing involve factors which are easier to control and provide us with clearer results, namely that being forced to process stimuli that have previously not been parsed leads to grammatical restructuring. Nonetheless, here too more experimentation is needed. I have also reviewed research which demonstrates that learners can learn exceptions to generalisations, and indeed induce the generalisations themselves on the basis of both direct and indirect forms of feedback. This conclusion is on firmer empirical grounds, although further studies with other language pairs would be useful. This caveat in place, it seems safe to conclude that adults can learn exceptions to generalisations on the basis of negative evidence. I noted, however, that positive evidence could in principle also produce the same result, meaning that if this were all that feedback and correction were accomplishing, their role in learning theory would be minimal. Results from the learning of artificial languages, however, suggest that negative evidence can lead to the implicit learning of various types of abstract structural information. My research
has attempted to show the same thing, using real languages. Finally, I presented new experimental results which confirm the results of my earlier studies. Adult learners can learn abstract linguistic generalisations on the basis of various types of explicit and implicit feedback and are not restricted to instance-based learning or modelling. How do we relate this to the broader issues discussed in previous chapters? I have concluded from these various studies that they provide an empirical basis for rejecting the Schwartz vision of modularity insofar as it makes claims relevant to second language acquisition. This empirical basis can be added to the arguments and facts reviewed in Chapter 7. However, this research is still quite limited in the range of linguistic phenomena investigated. We have no idea as yet if any of the phenomena such as the distribution and constraints on null subjects, the distribution of reflexives or reciprocals and the constraints on the domains of their binding, or the relationship between finiteness, tense, agreement and V2 position, which L2 research tells us develop differently from patterns of L1 acquisition, can be influenced in the right direction by feedback and correction. Both the Schwartz model and the Autonomous Induction Theory predict that the basic properties of these phenomena will not need to be acquired because they are already represented in the learners’ representational systems. However, the Autonomous Induction Theory predicts that to the extent that these phenomena involve inductive learning, then their acquisition can also be facilitated by the provision of feedback and correction. Only further research will show if that assumption is correct. My theory, moreover, predicts that a number of phenomena connected to categorisation could be learned on the basis of feedback since categorisation involves the acquisition of preference rules. 
Recall that the theory also places certain restrictions on where feedback and correction can possibly have effects: (i) it cannot cause the learner to learn basic features of any level of grammatical representation. It therefore cannot cause learners to learn, say, an articulatory feature or a morphosyntactic feature not present in the L1. Abstract systemic properties of V2 phenomena, such as the putative triggering of verb-raising by the features of the inflectional system, will not be learnable from feedback and correction. I predict that no systemic properties of reflexivisation will be learnable on this basis either. (ii) Feedback and correction cannot cause learners to learn distinctions “too far removed” from the conceptual system in the sense that there is no possible correspondence between the conceptual units and the grammatical categories. I have suggested that phonetic categories are “too far removed” from the conceptual system in just this way. This places some pretty severe limitations on the role that feedback and correction can play in a theory of i-learning. Other restrictions will be discussed in the next chapter.
Notes
1. That explicit correction and feedback could play a major role in acquisition based on conversations between NSs and NNSs has already been questioned. Chun, Day, Chenoweth, and Luppescu (1982) found little direct or explicit feedback in conversations between English-speaking NSs and NNSs. Day, Chenoweth, Chun, and Luppescu (1984) confirmed this result. It is possible that the provision of explicit feedback is regulated by cultural or sociolinguistic norms. This question has not, to my knowledge, been investigated and should be. It is possible that some cultures may be relatively more indulgent of learners’ errors and therefore less likely to provide correction or feedback. Of course, if the provision of feedback is not universal, it could hardly explain the representational problem of SLA.
2. But not necessarily on-line. Loschky’s study did not use on-line measures of comprehension.
3. Presumably, this is also where Hamlet goes if you’re Kenneth Branagh. Given the Intermediate Level of Awareness Hypothesis, Hamlet gets stored as sequences of phonological forms, not as sequences of morphosyntactic representations, since the parts in Hamlet get learned consciously.
4. As far as I can determine the proponents of negotiation are committed to non-UG assumptions. Needless to say, negotiation of meaning will not solve the representational problem of SLA. My comments here have focused therefore exclusively on how it might explain developmental change, assuming the relevant properties of the representational systems are given.
5. The picture is actually more complicated than this. As Cadierno (1995: 179–80) points out, the relevant research asks much more specific questions, such as: Does grammatical instruction alter the developmental paths learners take? Does it lead to greater grammatical accuracy on various tasks? Does it speed up the developmental processes? Does it ultimately lead to greater knowledge? Individual researchers have responded positively to one of these questions and negatively to others.
6. In contrast to most of the other studies reported on below, this study examined child L2 learners (approximately 10–11 years of age).
7. Notice that this claim is different from the claim that the learners transfer the acceptability patterns of (8.6) to English. Transferring the parameter-setting leaves open the status of (8.5c); learners might produce and accept such sentences or they might not. Transferring the acceptability patterns of (8.6) entails that francophone learners should reject (8.5c).
8. White (1991a: 140) notes that she had assumed that this group would get positive evidence about adverb placement in English. She discovered, however, that adverbs occur rather infrequently in classroom discourse.
9. Sentences of the sort John fell slowly himself/John walked late nights demonstrate conclusively that the constraint on the distribution of adverbs is truly one on the type of complement receiving Case rather than on the adjacency of an NP. Adjunct NPs can be separated from the verb by adverbs. (That nights is truly an NP is shown by its plural marking.)
10. How does the Autonomous Induction Model account for this data? Unconscious i-learning processes would allow learners to extract information about the ordering of all units at each level of analysis. In addition, learners can extract information about the hierarchical structure of adverb phrases and their place in the morphosyntactic representation to the extent that expressions can be put into correspondence with relevant units represented in Conceptual Structures. In the theory of correspondence adopted here, arguments map onto specific positions in surface structure, as do adjuncts. Thus, Agents, Experiencers, Themes preferentially map onto NP subjects (e.g., daughter of IP). Patients preferentially map onto NP direct objects (e.g. daughter of V′). Functions expressing manner of action or other similar adverbial concepts must map onto a position in the morphosyntactic representation whereby the adverb will have the VP in its scope. This will preferentially be a Spec of VP position. Thus, learners can, given these mapping preferences, get from a Conceptual Structure representation to a minimal morphosyntactic representation on the basis of UG-consistent correspondences. Finally, francophones initially seem to be mapping semantic roles and functions onto surface structure positions based on the properties of French verbs and their Conceptual Structures and placing adverb phrases in terms of French surface patterns. In the constraints-based theory of grammar adopted in the Autonomous Induction Theory, differences in surface structure cross-linguistically will be accounted for in terms of differences in the well-formedness constraints of the grammars of the native French and the target language English. In the example under consideration, the relevant constraints hold of the morphosyntax and regulate the ways in which AdvPs can be attached through the operation of unification to higher nodes.
11. Unfortunately, VanPatten and Cadierno are very vague about what the processes involved are, contenting themselves with talk of “strategies and mechanisms that promote form-meaning connections during comprehension” (VanPatten and Cadierno 1993a: 226).
    For us, intake is that subset of the input [understand here “stimuli”, SEC] that a learner comprehends and from which grammatical information can be made available to the developing system. (VanPatten and Cadierno 1993a: 227).
    This definition presupposes that information is represented in the conceptual system first (as a consequence of parsing) and then can be derived from it and made relevant for the grammar, which entails at least a partially interactive functional architecture.
12. The authors suggest that the traditional grammar instruction led to the development of Krashenian “learned competence” or Schwartz’ Learned Linguistic Knowledge. Given the nature of the instruction in this study, it is indeed possible that the learners derived encyclopaedic knowledge of a linguistic sort. It is, however, inaccurate to characterise the information derived from input processing instruction as part of linguistic competence. For Schwartz at least, any metalinguistic information is part of Learned Linguistic Knowledge.
13. I prefer to characterise this tendency as a preference for expressing temporal notions with prosodic words, as opposed to morphosyntactic words which are prosodically cliticised. I hypothesise that the preference in fact reflects the greater ease in extracting prosodic words from the speech stream.
14. None of the post-tests were, however, significantly different from one another, the main effect for test being due to the three post-tests being significantly different from the pre-tests (Cadierno 1993: 186).
15. I do wish to take issue with one interpretation of the results which is that they can be said to provide support for the distinction between Linguistic Competence and Learned Linguistic Knowledge à la Schwartz. This is clearly not so since it is obvious from the characterisation of these experiments that all tasks are off-line. They therefore do not address the question of whether grammar learning is modular or interactive. In particular, they must not be interpreted in such a way as to equate the “input” group with bottom-up processing and the “explanation” group with top-down processing. This construal would be completely unmotivated and a gross distortion of the way in which the experiments were conducted. It cannot be excluded that in processing the structured input semantic information was interacting with syntactic processing. See Cheng (1995), VanPatten and Sanz (1995) and VanPatten (1996) for further discussion.
16. Given the emphasis on UG in SLA research, our choice of linguistic problems may seem odd. They focus on different aspects of lexical organisation, as opposed to phenomena associated with properties like structure-dependency, c-command, or parameters. The original motivation in selecting our problems was empirical rather than theoretical. We wanted to attempt to fill a very large empirical gap; at the time we knew almost nothing in SLA about the acquisition of structural aspects of the lexicon (derivational morphology, argument structure, lexical constraints on syntactic operations, etc). In the time since we began our experiments, other studies have also addressed morphological acquisition, in particular Lardiere’s (1995) work on derivational morphology and Juffs’ (1996) work on argument structure.
17. Our objective was merely to exclude subjects whose knowledge of English was too rudimentary to do the test, or very advanced. The minimum and maximum scores used for subject selection were 14 and 34 correct responses out of a possible 50 sentences, the exact cutoff points being arbitrary. At the beginning of the study, we had selected a much narrower range of possible scores. The range actually adopted was greater because our experimenter’s subjective evaluation of the subjects during preliminary discussions suggested to her that there were mismatches between the learners’ auditory comprehension and verbal abilities, on the one hand, and their scores on our written test, on the other. In other words, she felt that some of the subjects getting higher scores were not “advanced” learners of English, while some of the subjects with lower scores were well beyond “beginner” status. In the end, the group turned out to be fairly homogeneous, at least as far as their performance on the selection test is concerned. The mean score was 23.5 and the median was 24. Standard deviation was calculated: Sx = 3.86. Obviously, with such a low standard deviation, most subjects’ scores fall within 1 s.d. of the group mean. Not all scores however do. Subjects whose scores exceed 1 s.d. from the mean are grouped in the following way: Group A, 6 subjects’ scores are less than 2 s.d. from the group mean, 3 of these are relatively high scores on the pre-test. One subject had a score less than 3 s.d. from the group mean. Group B, 4 subjects’ scores are less than 2 s.d. from the group mean, 2 of these are relatively high. One subject’s score is less than 3 s.d. from the group mean. It was the maximum of 34. Group C, 5 subjects’ scores are less than 2 s.d. from the group mean, 4 of these scores are relatively high. Group D, 5 subjects’ scores are less than 2 s.d. from the group mean, 2 are high. One subject’s score was less than 3 s.d. from the group mean. Group Z, 5 subjects’ scores are less than 2 s.d. from the group mean, 2 of these are relatively high scores. These data show that the scores that are relatively high and relatively low are more or less evenly distributed across the five groups.
18. There are also -ing nouns which appear to have a noun base. Quirk, Svartvik, Greenbaum, and Leech (1991: 1548) claim there are two principal interpretations for such nouns (i) noncount concrete aggregates referring to the material from which an object is made, e.g. tubing, panelling, carpeting, etc., (ii) the “activity connected with”, e.g. cricketing, farming, blackberrying. These cases may turn out to be a subset of the cases we wished our learners to learn, since in most instances conversion can turn the noun stem into a verb, which can then undergo -ing affixation. Hence to panel a room, to carpet the floors, to farm 100 hectares. I do not recognise to tube a construction site, but it does not seem so bad. On the other hand, to cricket and to blackberry, also not part of my idiolect, seem much worse and would always be replaced by the idioms to play cricket (compare *to hockey, *to baseball, *to football, *to basketball) and to pick blackberries/to go blackberry picking. to bowl/bowling, to hurdle/hurdling, to hi-jump/hi-jumping, etc. fit the characterisation given in the text.
19. Whether the resulting noun denotes an action, activity, accomplishment, or state event depends on the semantics of the noun phrase in which it occurs, and this depends in turn in part on the semantics of the base verb plus the theta role it assigns to a complement. writing can be an action as in his writing of the message took hours but it can also be an accomplishment as in his writing of the novel took years. I can find no obvious semantic constraints on -ing formation since there are instances involving the full typology of events: his knowing the facts/his sitting on the chair (a static state), his sitting onto the chair (a dynamic action), his singing (a process), his dying (a momentary event), his writing a novel (both a protracted event and an accomplishment). These examples also illustrate the complexity in the lexicon since there are product or result nouns knowledge, song, and death, but the product or result noun for write is also writing, as in He gave all of his writings to the university, and the state resulting from the process of sitting is rendered by the same form in -ing.
20. I should take the opportunity to clearly distinguish deverbal nouns of the sort we are interested in from the gerundive, e.g. John’s chainsmoking cigarettes drove us all crazy. Chomsky (1972: 16) observes that gerundives are fairly freely formed from subject-predicate type propositions, that the semantics of the gerundive is transparently related to that of the corresponding sentence, that John’s cannot be replaced by determiners (*the chainsmoking cigarettes drove us all crazy), and that adjectives cannot appear in the nominal (*John’s sickening chainsmoking cigarettes drove us all crazy). In addition to the gerundives, which Chomsky derives transformationally, i.e. via a syntactic rather than a lexical process, he discusses derived nominals, such as arrangement, transformation, referral which are created by lexical processes, and a third mixed type, e.g. John’s refusing of the offer, the growing of tomatoes. This third class can co-occur with determiners. Chomsky rejects adjectives in these cases, e.g. *John’s apparent refusing of the offer or the eccentric growing of tomatoes by northern gardeners. I, on the other hand, find these okay, and far more productive than Chomsky was willing to allow.
21. Some of the guessing sessions involved ambiguous stimuli or stimuli which contradict this claim. Since these were not corrected, they could not confuse the subjects, and provided essential evidence as to whether or not a generalisation had been made.
22. It should also be kept in mind that several months passed between running the first experiment and the second, and that the types of learning problems investigated were quite different. Consequently, there is no reason to suppose that having done the first experiment would lead to demonstrably better results on the second. Furthermore, there is no reason to assume that even if there were a learning effect, it wouldn’t be the same for all groups.
23. The means for the Comparison Group are .52, .53 and .56 on the initial guessing, Test 1 and Test 2 sessions respectively. The means for the experimental groups are not a great deal higher than the .56 result. If any of the experimental subjects had been using an arbitrary response pattern, such as giving only -ing or only bare stem responses, their mean scores on all sessions would have been .50. If subjects had chosen an equally arbitrary strategy such as alternating -ing/0-stem responses, their mean scores on the initial session and test sessions would have been around .79 and .50 respectively. An arbitrary response strategy of 0-stem/-ing would have led to mean scores of around .41 and .50 respectively. To preclude the possibility that this was in fact what our experimental groups were doing, we conducted a protocol analysis of all subject responses. No such response patterns were observed. Three subjects in the experimental groups, however, exhibited a marked tendency to respond with a given type of noun on the guessing sections of the feedback session. One subject in Group A had only -ing responses (= 12) in the first guessing section of the experiment, another subject in Group A had only one 0-stem response in response to the same stimuli (11 -ing responses), a third subject (this time from Group D) had 11 -ing responses on the first guessing session and 12 -ing responses on the second guessing session. These three subjects, then, exhibited a behaviour pattern probably attributable to the experimental design: “Carry on with a given response until someone corrects you.” Why the -ing form predominated is not obvious at the moment.
24. One possible response is that our typology does not, in fact, correspond to a typology of least inferencing (Type A feedback) to most inferencing (Type D). I think this is quite plausible, but have no ideas at the moment how to measure the extent of inferencing involved in the interpretation of feedback types.
25. Note that the highest scores are less than 70%, meaning that subjects still have considerable room for improvement. Consequently, we do not want to argue that our conversion subjects were performing so well on the distinction, independently of the provision of feedback, that it could not have made any difference.
Chapter 9
Feedback in the Autonomous Induction Theory

1. Introduction
In the last chapter, we saw some limited forms of evidence that metalinguistic instruction, and both direct and indirect forms of feedback and correction can cause restructuring of the learners’ psychogrammar. In particular, we saw that explicitly drawing the learners’ attention to cues to grammatical function and linear order can cause them to reanalyse stimuli in new, more native-like ways. We also saw that feedback and correction can lead to encoding of lexical forms, and form-meaning associations, to learning that certain forms are exceptions to learned generalisations, that learners can learn generalisations themselves and constraints on generalisations on the basis of both direct and indirect forms of feedback. In earlier chapters, I noted some of the sorts of restrictions that the theory of Representational Modularity places on a theory of input. These restrictions are an important part of the constraints on i-learning adopted in the Autonomous Induction Theory. In this chapter, I want to develop in more detail the answer to the question: “How might it work?” To take seriously the empirical studies of native-speaker/learner interaction, discussed in Chapter 8, as well as the other types of evidence on behalf of an explanatory role for feedback and correction, this question must be answered. In this chapter, I will rework this as the question: “Can feedback and correction lead the learner to detect the fact that an error has been made?” I also discuss how they might help the learner to locate errors. I then go on to discuss these matters using the data from the double object study discussed in the preceding chapter. This analysis will lead us in Chapter 10 to a discussion of the problems of interpreting feedback and correction. 
There we will see that even more restrictions will have to be placed on the theory of feedback and correction, limiting its range of functions in a theory of i-learning but increasing thereby, in my view, the explanatory value of the Autonomous Induction Theory.
2. Focused attention and detectable errors
2.1 Focused attention

Feedback, correction, and negative evidence are value-laden labels for particular sorts of utterances that a learner must interpret. In ways that I will make precise in Chapter 10, an utterance can count as feedback or correction only if a learner is willing to construe some bit of language as expressing a corrective intention on the part of some speaker. We as researchers may choose to identify certain utterances in specific contexts of communication as feedback but we must not lose sight of the fact that no utterance is intrinsically corrective in nature. This means that we must categorise the potential functions of feedback and correction in terms of the relative information value of an utterance in a given context for a given learner. We must relativise feedback in terms of the consequences some utterance has on the reorganisation of information in the learner’s grammar.1 In Chapter 1, I distinguished between the negative evidence and modelling functions of correction. The learner can infer from correction that a particular form is not part of the L2 grammar but she also gets some stimulus to be analysed bottom-up in normal fashion by the linguistic processors. From this stimulus, the processors can derive important new structural information. How does this happen? The weakest effect would merely be to draw the learner’s attention away from the content of her own utterance (her communicative intention and/or her conceptual representation), after producing it, to its form. While the exact function of attention in second language acquisition is still unclear (Schmidt 1990, 1993, 1994; Tomlin and Villa 1994; for a broader focus see Jackendoff 1987: Ch. 13; Velmans 1991), I will assume here that the acquisition of formal properties of language from feedback does require attention.
Recall from Chapter 4 that the model of the language faculty adopted here assumes the Hypothesis of Levels, namely that each faculty of mind processes information from the environment via a chain of levels of representations. Working memory can contain multiple sets of representations, which arise during the processing of stimuli when there are “indeterminacies” in the analysis of the input (Jackendoff 1987: 279). Working memory, recall, also contains a selection function.2 This function … at each moment designates one of the sets as most coherent or salient… it is involved constantly throughout processing in resolving low-level nitty-gritty details of linguistic structure (Jackendoff 1987: 279).
Jackendoff (1987) therefore describes attention as more detailed processing at a particular level of analysis. What this means is that at that particular level of
representation, more richly specified structures are built. If one function of feedback and correction is to shift attention from conceptual processing to form, it would entail a shift in the level of detail of processing from conceptual representations to one of the levels of form. In principle, this shift might permit more detailed processing at that level of representation where encoding is inadequate. However, I have also adopted the Intermediate Level Theory of Awareness, which has as a consequence that the shifts in question would be from the conceptual level to phonological form.3 The theory thus precludes a shift to a more detailed analysis of s-structures or phonetic representations insofar as feedback and correction raise the learner’s conscious awareness of an error. The Intermediate Level Theory of Awareness would lead us to hypothesise that feedback and correction could only draw the learner’s attention to distinctions encoded or encodable in prosodic structures. This would include stress shifts, order of syllables, rhymes and differences in the tones of intonation contours or tonated words. I noted previously that all of the literature on the topic of interaction and focus on form (e.g. Sharwood Smith 1991, 1993) presupposes that all levels of representation relevant for the learning of a grammatical distinction are accessible on the basis of activities which focus on form. This presupposition has not been investigated and merits empirical confirmation. Certainly the theory elaborated here is far more constrained than, e.g. Sharwood Smith’s, and significant differences are predicted in what empirical phenomena we should expect. If attention leads to more detailed processing of an input, at some given level of analysis where ambiguity arises among possible analyses, then correction involving repetitions of a given stimulus may permit just this kind of more detailed processing. 
Consider in this regard the observation that correction leads to the establishment of form-function pairs. Part of that learning involves the detailed encoding of a phonological representation. We might hypothesise that initial exposure to a word leads to the encoding of acoustic representations, indeed, we should hypothesise that the learner is encoding multiple acoustic representations very close to the actual acoustic properties of the stimuli. This is necessary because of the nature of co-articulation effects. The acoustic properties of skimmia in The skimmia in the garden blooms as early as December are different from the acoustic properties of I bought a healthy skimmia plant four years ago. The encoding of acoustic properties while permitting discrimination and possibly even word recognition on the parsing side of things will not permit the encoding of a prosodic word on the production side of things. For that, the learner needs a phonological encoding of the word. We might therefore hypothesise that correction involving the repetition of a word under conditions likely
to draw the learner’s attention to its prosodic form leads to more detailed analysis of the phonological properties of the word: fixing strong and weak syllables in a foot structure (for stress encoding), encoding the internal structure of syllables as onsets, nuclei and coda, and the internal structure of segments in terms of a phonetic feature analysis. This kind of detailed structural representation of the word might then permit the articulation of the word since it would serve as an input to motor-articulatory processes. A second benefit of correction in the form of modelling is that repeated exposure to a stimulus can strengthen the parsing procedures needed to analyse it, making comprehension faster and more reliable. Strengthening parsing procedures includes faster activation of specific schemata, and the integration of separate schemata into larger ones as learned representations are recombined in novel ways. To sum up, correction involving modelling of forms could lead to repeated exposure to stimuli under conditions likely to permit the more detailed analysis of the stimuli. This leads to the re-representation of lower-level representations into higher order, more abstract representations. Correction with modelling can also lead to improved processing of a given stimulus. So much for correction. Other forms of negative evidence do not model the language. The question thus arises as to the evidence for a particular grammatical analysis that they could in principle provide. Two questions must be answered by any theory of feedback and correction: (i) What kinds of evidence can feedback without modelling provide? (ii) What processes actually affect the restructuring of the learner’s grammar on the basis of these other forms of feedback?

2.2 Detectable errors

Let me deal first with the question: What kinds of evidence can feedback provide?
There are various ways in which this shift from attention to meaning to attention to form might be initiated in discourse, ranging from explicit comments on the form of the learner’s speech, e.g. You can’t say THAT in English to the indirect forms of feedback discussed in Chapter 8. These are indirect because they do not explicitly state the information that some utterance is defective, rather the learner must infer this information from the fact that there has been an interruption in the normal flow of information. The on-going debate in the first language acquisition literature about the potential role of indirect negative evidence turns precisely on the ability and necessity of learners making the intended inferences (Gleitman, et al. 1984; Hirsh-Pasek, Treiman and Schneiderman 1984; Demetras, Post, and Snow 1986; Bohannon and Stanowicz 1988;
Pinker 1989: 9–14). I have discussed in several places in this book the idea that learning is initiated when some difference arises between input to be encoded and what the system can currently analyse. If we refer to unanalysable input as errors or system discrepancies, then we can say that learning begins when the system detects an error. Let us call this the Detectable Error Hypothesis (Wexler and Culicover 1980: 122, Culicover and Wilkins 1984; Wilkins 1993/1994). Standard learnability assumptions are that systems detect discrepancies based on input arising bottom-up. This means that an input at a given level of representation cannot be analysed with extant procedures at the next level up. This is because either the right correspondence rules are not in place mapping, e.g. from a phonetic cue to a prosodic category, or else because the right integration procedures are missing at a single level of analysis. In this theory, we are assuming that information relevant to solving a parse problem might come from a higher level up. The same point can be made: detectable errors arise because the procedures are missing to map a cue or constituent at a higher level up onto the relevant constituent at the next lowest level down, or the appropriate integration procedures at that level are missing. The Detectable Error Hypothesis can now be formulated:

(9.1) The Detectable Error Hypothesis: Learning requires detectable errors.
The Detectable Error Hypothesis is a central part of an account of why i-learning starts, and why it stops in the face of infinitely variable stimuli. The basic idea is simple: when parsing speech, the learner is not merely constructing representations; the system is also constantly checking the computed representations for their adequacy in analysing the novel input. When there is a “good fit” between an analysis and the information stored in the system, everything is fine and the input is identified. When the fit is not good, learning is initiated. The point with respect to feedback can now be rephrased as: Does the provision of feedback necessarily lead to the detection of errors? While the claims inherent in (9.1) seem to be straightforward, the notion of what counts as an error for the learner must be clarified. In particular, we have to differentiate between detectable errors defined on stimuli and detectable errors defined on the learner’s own speech production.

(9.2) The Detectable Error Hypothesis (parsing)
Learning requires detectable errors which arise when the parsing systems cannot parse an input. In the absence of discrepancies between input to the parsers and the parsing procedures, the learner’s system reaches stasis.
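The logic of (9.2) can be caricatured in a few lines of code. What follows is only a minimal sketch, under the strong simplifying assumption that a learner's parsing procedures can be modelled as a set of stimuli it already knows how to analyse. The names `ToyParser`, `process` and `run` are hypothetical and belong to the illustration, not to the theory. The sketch shows just the two properties the hypothesis states: learning is initiated only when an input cannot be parsed, and the system reaches stasis once every input is a good fit.

```python
from dataclasses import dataclass, field


@dataclass
class ToyParser:
    """Hypothetical learner state: the stimuli it can already analyse."""
    procedures: set = field(default_factory=set)
    learning_events: int = 0

    def can_parse(self, stimulus: str) -> bool:
        # A "good fit": some extant procedure analyses the input.
        return stimulus in self.procedures

    def process(self, stimulus: str) -> bool:
        """Return True if a detectable error arose and learning was initiated."""
        if self.can_parse(stimulus):
            return False  # input identified; nothing is restructured
        # Detectable error: no extant procedure analyses the input,
        # so a new procedure is created (an assumption of this toy model).
        self.procedures.add(stimulus)
        self.learning_events += 1
        return True


def run(stimuli):
    """Feed stimuli to a fresh learner; report per-stimulus errors and total learning events."""
    learner = ToyParser()
    flags = [learner.process(s) for s in stimuli]
    return flags, learner.learning_events
```

On this toy model, feeding the learner a stimulus it has already analysed changes nothing, which mirrors the claim that feedback is redundant once the relevant procedures are in place.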
It is this sense of detectable errors which has led me to say, simplifying somewhat, that learning takes place on the parsing side of performance, when the procedures in place (which in turn reflect the learner’s grammatical competence) cannot analyse a stimulus. Feedback will have no effect on the learner’s grammatical competence if he has already constructed procedures for analysing an input. In this case, feedback will be redundant. The greater proficiency that many learners have in parsing and comprehending speech over producing speech shows that detectable errors in the sense of (9.2) are not based on what the learner can produce himself. For an explanation of control in production, we need a different notion of the Detectable Error Hypothesis, relativized this time to production.

(9.3) The Detectable Error Hypothesis (speech production)
Gaining control of the grammar in speech production requires detectable errors for these indicate differences between the learner’s knowledge state and/or proficiency and that of others the learner wishes to emulate. In the absence of detectable errors, the system reaches stasis.
The Detectable Error Hypothesis in (9.3) states that the learner requires evidence of detectable errors in her speech production to be induced to re-encode production schemata. I postulate this as a first stab in the direction of explaining why “practice makes perfect.” But notice that it cannot be the case that hearing just any stimulus deviating from some internalised tacit set of norms (in the unconscious “feedback loop” assumed to regulate speech production) could serve as the basis for the detection of errors in production. If that were the case, hearing speakers of another dialect would cause all of us to categorise our own production as erroneous. Flege (1984) and Flege and Munro (1994) have shown that listeners can detect small variations from L1 norms in foreign-accented speech. This ability is what lies behind our capacity to recognise “foreign” and “dialect” accents. Our perceptual and linguistic systems remain extraordinarily sensitive to deviations from the central tendencies of our own community’s system. This sensitivity, however, does not trigger endless readjustment of those norms. This fact is quite mysterious. It has been clearly demonstrated, for example, that bilinguals living in an environment where the L2 is normally used for communication purposes (e.g. as immigrants) can develop intermediate phonetic categories for particular segment types of the L1 (Caramazza, Yeni-Komshian, Zurif, and Carbone 1973; Flege 1987), showing that adjustment within the L1 grammar can continue well after the language is normally said to have been learned. The idea, therefore, that children reach a steady state after which no more restructuring of the L1 system occurs is clearly an idealisation that must be relaxed to account
for the facts. There is no sensitive period for the acquisition of new phonetic categories in perception. It remains nevertheless true that no wholescale restructuring occurs; the acquisition of a second language or of a second dialect does not inevitably lead to the attrition of the first.4 Recall from earlier chapters that this was one of the criticisms of the hypothesis that parameter-setting has a strict neurological basis. We can maintain our native variety more or less intact during second language acquisition. It follows that we do not necessarily treat the differences between our own speech and that of a host group as errors. It must therefore be true that each speaker has a tacit theory or mental model of the speech community he wishes to belong to and takes the speech patterns of this group as his norm. This mental model can change if the speaker/hearer becomes a member of a new speech community. It is for this reason that the definition in (9.3) includes some specific references to the speaker’s belief systems and attitudes. The L1 speaker believes that the patterns of his L1 idiolect constitute an acceptable version of the native language, acceptable with respect to the tacit norm. Input deviating from this norm will be attended to, processed, and classified as structured variation — as a different dialect, or as a foreign accent. It will not, however, be used to redefine the speaker’s current state of knowledge of the L1. In the case of the L1, the speaker/hearer appears to function with the motto: “I’m okay; you’re okay!” In the case of second language acquisition, however, the speaker believes that differences between her output and the input reflect her own lack of knowledge. 
Her mental model appears to function with the motto “You’re okay; I’m not so okay!” In this case alone, observed differences could lead to detectable errors likely to lead to the restructuring of the grammar.5 According to the Detectable Error Hypothesis (9.2), learning is initiated when the listener cannot parse a given input at a given level of parsing. Learning ceases when the combined operations of the processing system permit the learner to arrive at an interpretation of a stimulus. If we can judge from the numerous comments about adult migrants and immigrants “failing” to acquire the L2 (and I’m not sure that we can), then it would appear that the adult can cease to detect errors of type (9.2) even when his production is severely flawed. The Detectable Error Hypothesis (9.3) states that learning is initiated when errors are detected by the learner in her own speech production. Learning ceases when the speaker/ hearer’s output is compatible with her reference set or norms, encoded in some mental model of the speech community.6 Let us return to the issue of feedback: Does the mere provision of feedback guarantee that errors, in this technical sense, will be detected? No, for there are many conditions on the provision of feedback which must be met for it to have effects on cognitive systems. Watson
(1984: 153), for example, pointed out that in experimental studies of the perception of causality, a delay of three seconds in the provision of behaviour-reward contingencies is enough to prohibit learning in infants younger than six months old. He goes on to point out the significance of this for learning theory, which is that infants rarely experience causal structures fitting the appropriate time frame. Studies of mother-child interactions reveal that if this temporal constraint is imposed on learning, a child would perceive that only one quarter of his vocalisations were being responded to. This type of research has important implications for investigations of feedback in general, and SLA in particular. To date, studies of discourse and interaction have assumed that the mere presence of feedback somewhere in a discourse is sufficient to explain learning. But we can see that this is clearly false. The provision of feedback must fall within a given time span to be processed with respect to some linguistic phenomena which constitute errors. As Watson (1984: 155) puts it: “Noting occasional contingencies is not enough.” The exact time frame needed for adults to learn from linguistic feedback must be determined experimentally. Such research will tell us what the minimal attentional constraints are limiting the detection of errors.

2.3 Error location, or the blame assignment problem

Consider now (9.4).

(9.4) (conversation is about orthodontists in Toronto)
A: ..aber sehr oft sind diese Leute Juden und sie leben in der Nord der Stadt
   ..but very often are these people Jews and they live in the north of the city
   ‘but very often these people are Jewish and they live in the north part of the city’
B: im Norden
   in the+ north
   ‘in the north’
A: der Norden
   the+ north
B: der Norden im Norden It’s masculine Der Norden
(August 7th, 1993)7
The initial correction by native speaker B fits a common pattern, namely the reformulation or recasting of an error with a lexical item or phrase. The learner’s response to the feedback is also commonplace; she repeats it in part. Now this exchange is revealing because the learner does not repeat the feedback verbatim
but rather uses the nominative determiner, which in German noun paradigms is a clear indicator of gender. She might have construed the feedback as simply a correction of her pronunciation of the word meaning NORTH. However, the form der Norden is itself ambiguous since it could also fit the paradigm for a genitive or dative feminine noun, e.g. in der Norden. Speaker B either construes this alteration in the form as a kind of request for additional information about the gender of the noun, and provides it in the final utterance, or else as a new error. He apparently wants to hear im Norden der Stadt and not *in der Norden der Stadt. The focus of the feedback thus becomes an explicit discussion of the gender of the noun, in addition to the modelling of the correct pronunciation. This case of feedback thus nicely illustrates the location of a problem and the explicit provision of information deemed by the native speaker to be relevant for the learner to revise her grammar. Many instances of feedback do not. In particular, most of the indirect forms of feedback cited in the literature (failures to understand, repetitions of erroneous utterances and even recastings of errors) do not locate the error. It is therefore not obvious that they can or do lead to the detection of errors on the part of the learner. In Chapter 10, I will present several examples of such cases. This failure is what leads Pinker (1989: 12) to question the usefulness of correction and feedback in first language acquisition where explicit forms of feedback and metalinguistic information are virtually absent.

The usefulness of the information that a kind of sentence is ungrammatical is highly questionable too. Sentences are generated by large numbers of rules and principles that vary cross-linguistically, not just one.
So even a child who is able to make a binary good/bad decision faces a formidable example of what artificial intelligence researchers call the “blame assignment” problem: figuring out which rule to single out for change or abandonment… in practice the problem is even worse because the child may have no way of distinguishing “errors” that are due to syntax from those due to defective word meanings, bad pronunciation, or conversational maladroitness.
Discovering detectable errors is not, in short, limited to finding a problematic form in one’s own speech on the basis of feedback or correction. The blame assignment problem, namely deciding which rule to single out for change or abandonment, has to be dealt with by “localising” errors. In the theory of feedback proposed here, locality is defined in terms of the point of interruption of the on-going discourse. As Carroll (1995) has pointed out, and as we shall see in some detail in Chapter 10, feedback and correction are metalinguistic commentaries on speech which has normal communicative goals. The learner in (9.4) is trying to communicate about the non-linguistic world; the corrector is trying to communicate about the learner’s utterance. Correction and feedback are thus
always a sort of interruption of the normal flow of information. Within the theory, the fact that feedback is a form of interruption is thus very important. It defines a “mental space” to which the learner’s attention can be drawn and within which additional processing can occur. This is not sufficient, however, because detected errors could arise from a large number of causes. It is essential therefore that a theory of feedback state clearly how feedback helps learners to locate the problem. Minimally, this means that the learner must conduct reanalysis at the right level. Is the problem phonological? Morphosyntactic? Articulatory? Pinker is quite right to stress that arbitrary restructuring of the grammar would make the learner’s task harder, not simpler. In my view, this important function should not be attributed to feedback; it should follow from the organisation of the language faculty and from the way in which attention interacts with the autonomous representation systems. In the example discussed here, the rules and forms the learner might restructure on the basis of feedback are going to be limited to what the learner can attend to and become aware of — namely the rules and forms generating in production the string in der Nord, i.e. these three lexical items and their features as realised in an intonated phonological phrase. The on-going feedback localises the error in the phonological forms of the Preposition + Determiner + Noun string. While this still leaves a number of options open for revision, it greatly reduces the blame assignment problem. There can be no question, therefore, of the feedback being used to revise the learner’s rules of question formation or passive morphology here. Restructuring is thus not random. On the contrary, it is focused by attention, by the point of interruption of the discourse, and by constraints on the levels of representation which can be the object of attention. 
In addition, since conscious awareness is linked to projected phonological representations, the learner could only repair the phonological properties of in der Nord, in particular, the articulation of the string. It should also be pointed out that learners can develop “theories” of likely problems. In other words, learners will develop mental models of typical errors they make. Gender and case errors (again as realised in phonological forms for the relevant gender- and case-bearing words) and errors in the location of verbs are typical problems for L2 learners of German (Clahsen et al. 1981), and typical problems are more likely to become the focus of corrective efforts. Thus, previous corrections might already have sensitised the learner in (9.4) to her difficulties in selecting the right form of the determiner to use in a given construction. The development of tacit or explicit mental models of one’s own difficulties may further help to locate errors by directing the learner’s attention to particular aspects of her production known to be likely trouble spots. However, Pinker’s problem of locating the relevant level of analysis remains. The learner cannot know simply from the provision of a form or even of a phrase if his pronunciation is flawed, or if it’s his morphology, his syntax or his semantics which is causing the trouble.8 But then, neither can the corrector. To sum up, the claim is that information provided by feedback and represented ultimately in the learner’s conceptual system (either directly via decoding or indirectly through inferencing) can have an effect through focused attention on lexical selection (of phonological forms) for insertion in a phonologically realised s-structure, on the ordering of lexical items in phonologically realised s-structures, or on the phonological and semantic contents of lexical entries. The Intermediate Level Theory of Awareness hypothesises that only prosodic representations can project into awareness. To the extent that feedback is designed to draw the learner’s attention to some error in her speech, it follows that we ought to find effects of feedback only at these levels of representation.

2.4 Categorisation, i-learning, and feedback

I have now introduced a model of feedback and explicit negative evidence which restricts its activity to the prosodic level of representation. This still leaves open the possibility that feedback in conjunction with instruction and correction (modelling) can influence the creation of categories in the various representational systems of the processing systems and of the grammar. I have already mentioned the puzzle of why native speakers of English find it so difficult to acquire and control the gender system of languages like French or German in the face of plenty of positive and negative evidence about the regularities and irregularities in the system. Although the puzzle itself is well known, certain rather obvious aspects of the problem are seldom discussed and remain unexplained.
The first is that anglophones can readily recognise what they cannot always readily reproduce. When an anglophone who is, say, an intermediate or advanced learner of French, hears, for example, the sentence in (9.5), he can assign a correct interpretation to it.

(9.5) Maryse a proposé qu’on fasse un tour de la ville.
      ‘Maryse has suggested that we do a walkabout of the city’
Assigning a correct interpretation to (9.5) involves minimally recognising the words and activating their particular conceptual representations, integrating these lexical conceptual representations into a morphosyntactic representation of the sentence and deriving some sort of propositional representation, and integrating this conceptual representation of the sentence into an ongoing mental model of
the discourse. This means that the anglophone must assign a representation to the DetP un tour de la ville and, moreover, represent the semantic contributions of the Det un. This apparently is not a problem for anglophone learners of French. Extensive studies of anglophones in French immersion programs in Canada, learners who achieve advanced levels of knowledge and considerable proficiency in the L2, suggest that their ability to comprehend speech can attain levels where it is indistinguishable from that of native speakers (Swain and Lapkin 1982; Harley and Swain 1984). This ability requires the acquisition of the distinction between definite and indefinite readings of DetPs. What does this prove? It shows that learners have representations of determiners, one of the most important cues to gender attribution. The same research shows, however, that proficiency in the use of the gender system is anything but native-like (Harley 1979). Years after beginning their immersion programs (at approximately age 4–6), anglophones continue to make gender errors in production with even common and frequently heard nouns. Learners who begin learning French as adults report similar difficulties. For example, when composing the example in (9.5), my first reflex was to write une tour de la ville, although I have read and heard the expression dozens of times.9 Why is recalling the forms which mark gender such a problem? Or, to put the matter more perspicaciously, why is it more of a problem to recall that tour de la ville goes with un than it is to remember what tour de la ville means, or how it is pronounced? Or indeed that it is tour de la ville and not tour à une ville? If all that is at stake is the recall of associated material, or memorised information, the fact that recalling the correct determiner is more difficult than recalling the noun itself is quite mysterious. 
One answer to the question is simply that the French gender system is more than recalling strings of associated items. One can argue instead, as I have done elsewhere (Carroll 1989, 1995) that the acquisition and deployment of the system is quite complex. While we can treat the gender feature as a basic attribute of a noun, just like its conceptual representation and its phonological form, the cues for the correct classification are to be found on specifiers of the noun located elsewhere in the sentence.10 In the words of Kail (1989), the cues are typological and not local. Another way to express the same idea is to say that learning that specifiers are cues for a formal relationship like gender must be derived from the effects of sentence processing and not merely from memorising and analysing the internal forms and functions of lexical items. This means that learning that French has a gender system involves the operations of lexical segmentation, and parsing, in addition to whatever perceptual operations are required to discern formal features of words. These operations also interact with constraints on working memory. We might therefore anticipate that learning a gender feature
for a given noun would be more complex than learning its meaning or its phonological form. Moreover, the cues for gender in French seem to be largely arbitrary. It is an arbitrary fact about French that there are two forms tour one of which is masculine (meaning only that it co-occurs with words like le, mon, petit or blanc) and refers to a motion or path, while the other is feminine (meaning that it co-occurs with words like la, ma, petite or blanche) and refers to a particular type of building. By arbitrary I mean that the facts could be just the opposite and nothing else in the language would be affected. However, the complexity of the system does not explain of itself why learners have difficulty learning even with feedback and negative evidence. After all, French contains any number of complexities for anglophones, including the placement of clitics, the tense and aspect system, the expression of causation, auxiliary selection and so on. Both immersion students and adult learners of French do make observable progress in these areas, with or without negative evidence, and can reach nativelike proficiency in these areas of the grammar. In addition, despite the arbitrariness of particular gender assignments, there are obvious patterns within the system. Once one represents the suffix -tion as feminine, one tacitly knows the gender of every noun of which this form is the lexical head. See Carroll (1989) for more detail. It has also been claimed many times that gender systems display partial generalisations within sub-groups of the morphologically underived items (see Müller 1990; Surridge 1985, 1986 with respect to French; Köpcke and Zubin 1983, 1984; and Zubin and Köpcke 1981, 1984, 1986 on gender in German). While I do not wish to question that one can find interesting correlations of the sort examined in these papers, their existence makes the problem more mysterious. 
Why do English speakers not find it easy to “see”, represent and recall the relevant subgeneralisations? A second possibility, raised by Harley (personal communication 1984), is that anglophones do not perceive the acoustic-phonetic differences between the gender markers, namely le/la. It might be the case that anglophones can represent the differences between definite and indefinite determiners with a phonologically underspecified representation of each, perhaps something along the lines of (9.6).
(9.6) [Det [σ X[+lateral] Y[+vocalic]]]
Harley’s claim is consistent with the idea that processing is constrained “from below” by the relative perceivability of a stimulus. Forms cannot mark functions
within the learner’s system if they cannot “break through” lower levels of processing. Discrimination studies of immersion students (Taylor-Browne 1984), however, suggest that they can in fact make the relevant distinctions. Harley’s suggestion would also be explanatory only if other markers of gender, e.g. mon/ma, ton/ta, petit/petite, were not available to signal gender attribution. Even if learners do not find salient the difference between le and la or even between un and une, which in Canadian varieties both tend towards [˜7], there still are other gender markers which are available, are frequent and potentially salient. Why can’t an accurate system and proficient mastery be rapidly derived on the basis of these cues? I have suggested (Carroll 1989) that anglophones, even young children in immersion programs, exhibit a kind of disassociation between their receptive and production abilities which reflects the fact that their psychogrammars of English and French contain no specification for gender features. These learners, it is hypothesised, perform at the morphosyntactic level a kind of equivalence classification, mapping the relevant categories Noun and Det from English onto the relevant phonological forms in the input, and storing these categories in the interlanguage lexicon. I hypothesised, in conjunction with standard treatments of morphosyntactic categories, that the features for the English noun and determiner are at least [+N, −V] but specifically do not include any specification for gender. In contrast, the features of the noun in the target francophone system are [+N, −V, +fem.], or [+N, −V] with a default rule filling in a [−fem.] feature specification.11,12 As noted above, the cues for gender attribution are exhibited in the agreeing categories, but the gender attribution involves categorising the noun. In formal terms, this is expressed in terms of features borne by the noun in long-term memory.
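The feature analysis just described can be made concrete with a small sketch. The Python below is illustrative only: the class `Noun`, the functions `resolve_gender` and `definite_np`, and the one-row determiner table are hypothetical names introduced for this example, not part of the analysis in Carroll (1989). It assumes the underspecification variant, in which only feminine nouns carry an explicit value and a default rule supplies [−fem.] otherwise.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Noun:
    form: str
    # None models the unspecified entry [ ]; True models [+fem.].
    fem: Optional[bool] = None


def resolve_gender(noun: Noun) -> bool:
    """Default rule: an unspecified entry [ ] is spelled out as [-fem.]."""
    return bool(noun.fem)


# Hypothetical spell-out table, covering the definite determiner only.
DEFINITE = {False: "le", True: "la"}


def definite_np(noun: Noun) -> str:
    """Agreement: the determiner form is chosen by the noun's resolved gender."""
    return f"{DEFINITE[resolve_gender(noun)]} {noun.form}"


tour_path = Noun("tour")            # masculine, left unspecified
tour_building = Noun("tour", True)  # feminine, [+fem.]

print(definite_np(tour_path))
print(definite_np(tour_building))
```

In terms of this sketch, the transfer hypothesis would say that the learner's entries lack the `fem` slot altogether, so activating the noun gives agreement nothing to consult.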
Anglophones who make an equivalence classification at the morphosyntactic level transfer into the interlanguage grammar, however, the features of the relevant corresponding English lexical category. Only these are available for parsing decisions. Since gender agreement is not marked in English, there are no decision-procedures based on attending to formal variation available to be transferred from the English parser to the developing interlanguage parser. Moreover, there are no formal gender features to be transferred to the interlanguage grammar. Gender agreement is instead treated by the parser as irrelevant variation — as “noise”. To reiterate: expression of gender attribution must be made on the noun itself and this attribution is not made by the parser. At this point one can argue that feedback, instruction, and negative evidence about gender attribution become essential for learning the system. Without them, the learner would never arrive at the correct categorial classification. Let us grant that this is so. What, then, does current evidence, admittedly largely anecdotal, suggest?
The evidence suggests that instruction and feedback create awareness of the existence of a gender system, that they draw the learner’s attention to the commission of detectable errors, but they do not lead to the development of native-like control of the expression of gender in production. My account of this is that neither instruction nor feedback results in changing the specification of the noun categories in the lexicon. The idea is that if gender were represented in the learner’s lexical entries, it could simply be activated during production and agreement processes would ensure that the spellout of the determiners and other specifiers took place. If, in contrast, a gender specification is not part of the lexical entry of a noun, it can never simply be activated by activating the relevant noun. Since the claim is that instruction and feedback do not achieve dramatic improvements in production proficiency in the case of gender, the conclusion must be that they do not alter the feature specifications for gender in the lexicon.13 Do instruction and feedback have no effect? I cannot exclude, under this analysis, that they might lead to the piecemeal association of phonological forms, e.g. ton tour, mon tour, un tour, le tour, etc. It might be the case that these strings are stored in lexical entries as ready-made expressions to be activated when needed. Notice that this does not entail learning gender attribution if we define that as learning the feature of the noun. Rather, the learner has an emergent gender which comes from having stored all of the correct strings and being able to activate the right one (something which could be done on the basis of the semantic features of the determiner). I suspect that this is precisely what happens as anglophone learners develop greater control over the vocabulary that they have mastered.
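The "ready-made expression" alternative can be sketched in the same spirit. The following Python is a minimal illustration under stated assumptions: the dictionary `PREFAB`, its key labels, and the function `produce` are hypothetical, and the stored strings are simply the examples cited in the text. The point of the sketch is that the right form is retrieved as a whole string, selected by the meaning of the determiner, with no gender feature recorded on the noun.

```python
from typing import Optional

# Hypothetical learner lexicon: whole determiner+noun strings stored as
# ready-made expressions, keyed by (determiner meaning, noun meaning).
PREFAB = {
    ("DEF", "TOUR_PATH"): "le tour",
    ("INDEF", "TOUR_PATH"): "un tour",
    ("POSS_1SG", "TOUR_PATH"): "mon tour",
    ("POSS_2SG", "TOUR_PATH"): "ton tour",
}


def produce(det_meaning: str, noun_meaning: str) -> Optional[str]:
    """Retrieve a stored string, or None when no such string has been stored."""
    return PREFAB.get((det_meaning, noun_meaning))


print(produce("POSS_1SG", "TOUR_PATH"))  # a stored string is activated
print(produce("DEF", "TOUR_BUILDING"))   # nothing stored, nothing to activate
```

Under this reading, "gender" is only as good as the inventory of stored strings: a combination that has never been stored cannot be produced by generalising from the noun.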
The difference between people like me and those adult anglophone learners who actually achieve nativelike proficiency in producing gender-marking (and they do exist) would then be that they are better at storing and activating such multiple sets of strings. It follows that the contents of lexical entries would vary considerably among proficient and unproficient users of gender, not in terms of the specification of a gender feature, but in terms of the number and types of these “pre-fab” strings. This is a testable claim. Although gender acquisition is a relatively well-studied phenomenon, my analysis shows that even more empirical investigation is required to attempt to falsify the claim that the gender learning problem exhibited by anglophone learners is commonplace and derives from a dissociation of parsing and production processes which leaves the noun specification transferred from English intact. Investigation should focus on four claims. The first claim has been that even advanced anglophone learners can remain unsure about the correct classification of given nouns in production, although they will readily recognise that a
string is correct when produced by someone else. If additional studies confirm this, we can take it as evidence of a disassociation between the L2 learners’ abilities on the recognition side and their abilities on the production side.14 The second is that this disassociation is based on a lexical equivalence classification, a form of correspondence mapping. It should be possible to find evidence for or against this.15 A third claim was that the effects of feedback and instruction in leading anglophones to the correct performance of gender marking appear to be variable. Many learners of French, the author included, feel they have not been greatly aided by being told “It’s LE tour de la ville, not LA tour”. We keep making the same mistakes year in, year out. We keep hearing the same corrections year in, year out. On the other hand, I have been told on several occasions that certain anglophone learners do not share this problem.16 Taking this claim at face value, it points to significant differences in the processes involved in producing gender. Some anglophone learners may find learning and producing gender unproblematic, others clearly find it a travail. This also points to significant differences between L1 and L2 acquisition, for the literature suggests that gender acquisition is relatively unproblematic for infants. Gender attribution patterns are acquired between the ages of two and three, gender agreement involving determiners and attributive adjectives (specifiers within NPs) is acquired rapidly, and is based on formal properties of the system rather than evolving from semantic or functional properties of the nouns (Karmiloff-Smith 1979; Levy 1983; Müller 1990).17 The fourth claim is that whatever feedback and correction are doing, they are not normally helping learners build associations between determiners and nouns. We saw in Chapter 8 that feedback can function to create associations between recognition representations and lexical conceptual representations.
If one presents learners with either an acoustic-phonetic or graphemic form of a word and a translation of that word in the L2, and provides feedback about some aspect of the form or the meaning, then learners will readily learn those specific lexical items. Our experimental studies did not reveal what kind of mental representations learners have stored but minimally it is an association between the stimulus and a concept as expressed through the L1 translation, e.g. blanchissage–bleaching/BLEACHING. Even if we assume that our studies show no more than this minimal level of association, we can nonetheless conclude that feedback facilitates associative learning.18 The question then arises as to why feedback and correction are not functioning to encourage associative learning between determiners and nouns in a stimulus. At this stage of the game, I am reluctant to claim that it never does, so I will go out on a limb and suggest that this is precisely what happens among exceptional learners. Exceptional learners, then, are storing sets of strings of the phonological exponents of determiners and
the phonological exponents of nouns, perhaps based on exceptional memory capacity. The normal function of the parsers, however, is to identify and classify lexical categories in the speech stream and to store them separately in the lexicon. Feedback appears not to be able to interfere in this process.19
3. Learning the double object construction
Let us now return to the empirical evidence in Chapter 8. We saw there that feedback and correction helped our Spanish learners learn the phonological and semantic constraints on the double object construction. How might feedback have helped? To answer this question, we must first ask what kind of representations the learners might have constructed on the basis of the training sessions. Recall the nature of the stimuli.
(9.7) Training example (on card)
Peter wrote a letter to Theresa.
Peter wrote Theresa a letter.
Subjects were asked to look at these cards and note any differences between them. There are obviously several. First of all, there are the differences in the sequencing of the complements; secondly, there is the presence of the prepositions to or for in the first sentence and their absence in the second. Analysis of the errors made by individual subjects clearly suggests that the first property was more readily noticeable than the second, and was detected independently of it. Many subjects made errors of the following sort.
(9.8) a. Stimulus: The students pronounce a new word for their teacher.
Response: The students pronounce for their teacher new word.
b. Stimulus: You always say the same thing to everybody.
Response: You always say to everybody the same thing.
We can hypothesise that these errors arose because subjects noticed the difference in the order of the complements and simply reproduced a sentence with a “reversed” sequence of phonological phrases following the verb. Feedback and correction could then draw the learner’s attention to the absence of the preposition when the order is reversed. This information, while it leads to a better output with verbs that do alternate, does not tell the learner which verbs these are. Recall that this was the purpose of the experiment — to lead the learners to classify the verbs correctly into subsets — those which alternate and those which do not.
3.1 The metalinguistic feedback
In the theory adopted here, this classification will involve developing preference rules (default hierarchies). A verb will be more likely to alternate when it meets certain conditions. The conditions were provided by the feedback in the case of the explicit metalinguistic form, and were apparently inferable under the other experimental conditions. When experimental subjects in the metalinguistic feedback group made errors on verbs that violated the phonological constraints on the double object alternation, they were told something like the following.
(9.9) Student says: The students pronounce their teacher a new word.
The student is then told: You cannot say The students pronounce their teacher a new word. Pronounce does not alternate; this is because it is a long verb that is louder at the end and such verbs don’t alternate. Pronounce is louder at the end than at the beginning.
Now it is important to understand that this account of the stress shift is highly inaccurate. First of all, while there is some correlation between the presence of stress and articulatory force, perceived loudness is not what stress marking consists of. Rather, it consists of a variety of cues including pitch, vowel quality, and vowel duration (Clark and Yallop 1990: 287). So the information provided in the feedback is not, strictly speaking, true: pronounce is not necessarily (or not only) louder at the end than at the beginning and neither are the other verbs which do not alternate. Nevertheless, loudness is an aspect of sound stimuli (both linguistic and non-linguistic) which people have awareness of, and it does correspond phenomenologically with the degree of perceived loudness of syllables (Trager and Smith 1951). Our subjects were apparently readily able to put this labelling into correspondence with the actual stress properties of the stimuli. Secondly, the terms “end”, meaning the right end of the word, and “beginning”, meaning the left end, are extremely vague. Taken literally, they could refer to any feature or unit located before the cessation of phonation or just after the beginning of phonation. What is relevant for learning the constraint on the double object construction is not the “ends” of words, but rather the location of stress on a given syllable. The feedback was successful in directing the learners’ attention to a phonological property of the relevant verb pairs: stress on the first syllable vs. stress on the second. In the theory elaborated here, I assume that this means that the learners are capable of putting a conceptual representation in correspondence with phonological
properties of the verbs. We might propose therefore rules of the sort in (9.10) as a first model of what takes place.
(9.10) If a [STIMULUS VERB IS LOUD AT THE BEGINNING] → the phonological representation corresponding to the [STIMULUS VERB] bears stress.
If the phonological representation bears stress → the first syllable of a binary branching phonological foot must be strong.
The first rule is an interface rule and involves on the left a conceptual representation of the actual feedback given. The concept [LOUD] is put into correspondence at the level of phonological representations with the phonological construct stress. The second rule is internal to the phonological representation; if the word has one syllable, that syllable will, of course, bear stress but if the word has two syllables, then the first syllable will bear stress as a result of that syllable being marked strong. On the basis of these rules, a learner could hear the word donate, which consists of two feet with primary stress on the final syllable, and decide by inference that it does not meet the criterion. Making this inference presupposes that correspondences work “in the other direction”, i.e. from a phonological representation of donate to a conceptual representation. The theory in fact permits this. The second aspect of the learning problem involved learning the semantic constraint on the alternation. Verbs of motion which select a PP complement headed by to alternate in the double object construction only when the object of the preposition to expresses a recipient and not merely a goal. Recall that (9.11) is not possible.
(9.11) *John sent the border the package.
The alternation also appears with verbs that select a PP complement headed by for. The sentence in (9.12) is unacceptable.
(9.12) *Close me the windows.
The feedback the learners heard for these two distinct cases was combined and followed the model in (9.13).
(9.13) Student says: *She sent that address the letters.
Student hears: You cannot say She sent that address the letters because with alternating verbs there is always an exchange of possession. Here the letters go to the address but they do not belong to the address.
The critical information here is exchange of possession. This was how we chose to render the “possessor effect” illustrated above (Pinker 1989: 48). Once again, it must be stated that this is hardly an easy concept to grasp, since it involves a metaphorical extension of the motion of two items along a path (Jackendoff 1983, 1990b). Nonetheless, it presented no difficulties for our learners. I hypothesise that they constructed rules something like (9.14).
(9.14) If a [STIMULUS VERB IS TO ALTERNATE] → then it must express [EXCHANGE OF POSSESSION]
In this case the relevant level of representation involves only the conceptual structure. The learner can compare her representation of the meaning of the stimuli with rule (9.14) and base her categorisation judgement on that.
3.2 The other forms of negative evidence
The second form of feedback that we used in our experiments with the Spanish learners of English is something we have called explicit utterance rejection. This refers to any statement which says that the utterance the learner has produced is wrong. It could be instantiated in a variety of ways, by saying “That’s wrong”, or “You can’t say that”, or “That is not good English”, or by putting red crosses on an answer sheet, etc. Explicit utterance rejection is perhaps the most common form of negative evidence. It tells the learner that some string is not a part of the L2. Since it provides no additional information, it cannot be used as the basis for a direct inference needed to restructure the learner’s linguistic knowledge. At best it can force the learner to re-pronounce (out loud or sub-vocally) his own production looking for detectable errors. Discussion of the nature of the inferencing involved will be deferred until Chapter 10 where I will also talk about modelling again and the fourth form of negative evidence (the query) we used in more detail.
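Taken together, rules (9.10) and (9.14) from Section 3.1 amount to a two-part check on a verb. The Python sketch below is only an illustration: the `Verb` record, its two fields, and the hand-coded feature values are assumptions made for the example, not data from the experiment. It also simplifies in two ways. The theory treats these conditions as preference rules rather than a hard yes/no decision, and the possession condition really depends on whether the indirect object is a recipient, which is collapsed here into a property of the verb.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Verb:
    form: str
    stress_initial: bool        # stress falls on the first (or only) syllable
    transfers_possession: bool  # the meaning can express exchange of possession


def may_alternate(verb: Verb) -> bool:
    """Combine the phonological (9.10) and semantic (9.14) conditions."""
    return verb.stress_initial and verb.transfers_possession


# Hand-coded illustrative entries; the feature values are assumptions.
VERBS = [
    Verb("write", True, True),
    Verb("pronounce", False, False),
    Verb("donate", False, True),
    Verb("close", True, False),
]

for verb in VERBS:
    print(verb.form, may_alternate(verb))
```

On these assumed values only write passes both conditions, which matches the pattern in (9.7), (9.9) and (9.12).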
4. Summary
In this chapter, I have attempted to delineate a function for feedback and correction in terms of directing the learner’s attention to detectable errors. In the Autonomous Induction Theory, attending to some aspect of an utterance involves developing more detailed representations of its structure. This is obviously the first step in restructuring the psychogrammar. I noted, however, that the provision of feedback does not alone resolve the blame assignment problem —
finding the place in the grammar where it fails to accurately represent the phenomenon in question. I have argued here that blame can only be assigned, on the basis of feedback given, at points in the language faculty where the conceptual representations make contact with the others. In this theory, that means at the level of phonological representations since I have adopted the Intermediate-Level Theory of Awareness of Jackendoff (1987) and am assuming that the deployment of feedback and correction presupposes awareness on the part of the learner. I then tried to show, on the basis of an analysis of gender learning, that evidence can be found that feedback does not affect basic features of representations (in this instance the basic features of the noun category), a finding which can be readily explained by the model. Finally, I presented a detailed analysis of the provision of feedback given in the double object experiment of Carroll and Swain (1993), an instance where feedback was shown to work. In ending this chapter I would like to return to some of the ideas discussed at the beginning of the book. It should now be apparent that the evidence for learning the phenomena discussed does not come via the processing of the stimuli alone, but rather through the elaboration of complex input representations. The critical input in some of the cases discussed comes from conceptual structures. The interpretation of the feedback requires that the learner do two things: analyse what is not possible and then attempt to figure out what is. Figuring out what is possible requires making use of the categories, rules and constraints of the relevant autonomous representational system. In the first instance we are dealing with negative evidence and in the second, with positive evidence.
It could be argued that the information that the learners make use of is “analysed knowledge” in the sense of Bialystok but obviously the constraints in the system come precisely from the constraints on the interaction of the “analysed” and the “unanalysed” representations. Not all types of information represented in one representational format can be “redescribed” in other representational formats. There is clearly loss of information when representational systems interact. The notion of control has been important too since the theory makes the claim that explicit feedback will only be effective if the learner can be aware of some feature of language. Finally, I have assumed throughout that the behaviour of our subjects reflects reorganisation of their psychogrammars, but I have not actually demonstrated this. The basic data used reflects only a single sort of performance. To make a convincing case it would be desirable to have other types of performance data from the same subjects, all of it converging. I have also used group data, which, as is well known, can disguise important differences among the subjects, some of whom may have acquired the relevant distinctions while others did not. The demonstration must await an analysis of
the individual results, just one of many future studies which seem necessary to me as a consequence of elaborating this theory.
Notes
1. Although feedback and correction undoubtedly play a role in the reorganisation of conceptual representations (see Marshall 1965, and Barringer and Gholson 1979, for literature reviews on the subject of feedback and its effects on concept formation and concept identification; see Kulhavy 1977, and Lysakowski and Walberg 1982, for literature reviews on the subject of feedback and its effects on memory for concepts), in what follows I will limit myself to a discussion of feedback and grammatical representations encoded in autonomous systems. In the next chapter, I will address the issue of conceptual representations of grammatical concepts and how they might be mapped onto autonomous representations.
2. Jackendoff (1989: 279) argues that there is a phenomenological consequence of the operation of the selection function, namely that we are aware of only one interpretation for an ambiguous input. “One notices ambiguity only through the appearance of successive interpretations over time.” He then goes on to reject the usual analysis of this consequence as being that there are limitations on attention or that it is an executive organiser in the mind. See Jackendoff (1989: Sections 1.1, 2.2 and 4.4) for more discussion.
3. The relationship between attention and awareness is characterised in the following way. The distinctions of form present in each modality of awareness are caused by/supported by/projected from a structure of intermediate level for that modality that is part of the matched set of short-term memory representations designated by the selection function, and enriched by attentional processing. Specifically, linguistic awareness is caused by/supported by/projected from phonological structure (Jackendoff 1989: 298).
4. Another way to phrase the matter is to require that our theory of second language acquisition clearly differentiate cases where the L1 attrites from cases where it does not.
5. The situation is more complex in that constant exposure to stimuli of a certain kind, say a spouse with a foreign accent, might lead a native speaker to adjust in the direction of the phonetic categories of that accent. Presumably, this does not happen to native speakers exposed to the foreign accents of other L2 learners because the contact is too casual, too infrequent, and too unsustained. Note too that the example in the text is not meant to cover instances of dialect shift which occur when, e.g. Americans live for many years in London. Adjustments in the direction of somewhere in the “mid Atlantic” can be covered by a theory of SLA quite nicely.
6. This is a weaker requirement than saying that the speaker’s speech production is identical to the input.
7. This example is taken from a corpus of spontaneously provided instances of correction and feedback that I have been collecting since 1992.
8. Sharwood Smith (1991) discusses this point in the context of the salience of input, the detection of anomaly, and “enhancing the input” for SLA in the classroom.
9. I should perhaps add that I speak and write French fluently, am extremely comfortable using the language in a wide variety of situations, am occasionally mistaken for a native-speaker, and
otherwise can demonstrate considerable knowledge of the language. My own view is that these personal attributes are irrelevant to understanding the issue which is the relative difficulty of learning gender attribution in the face of large amounts of stimuli and correction, i.e. why gender should be harder to learn than word order.
10. It has occasionally been claimed that gender features are determinable on the basis of properties of the nouns themselves, e.g. low-level perceptual properties (Sokolik and Smith 1992), phonological properties of nouns located at their “ends” (Tucker, et al. 1977), or semantic properties (Grévisse 1980; Surridge 1985, 1986). However, none of these claims can be substantiated for more than a small proportion of the nouns of the language. To the extent that one can state generalisations, these are incomplete and have many exceptions. See Carroll (1989) and Morin (1982) for further discussion.
11. In Carroll (1989), I assumed that masculine was the unmarked specification for gender. The arguments for this are that conjunctions of masculine and feminine words agree in the masculine, and nonce forms take the masculine. An argument against that analysis is the “feminisation” of ambiguous forms beginning with vowels in Canadian French (Barbeau, Ducharme and Valois 1981). Markedness can be rendered in terms of underspecification theory for morphosyntactic features (Lumsden 1992). By this approach, the specification for masculine gender would appear in lexical entries as [ ], the specification for feminine would be [+feminine], and the grammar would include a spell out rule as in (i).
(i) [ ] → [−feminine]
12. Under this analysis, French Canadian speakers would, in the face of ambiguous stimuli, be encoding vowel initial nouns for a gender feature, rather than supplying gender by default rule. Note that nothing in the explanation of learner behaviour turns on these differences in the expression of the grammatical distinctions; one can be more modern or more traditional as one wishes. I would like to state, however, that one essential motivation for underspecification theory is the psycholinguistically dubious assumption that redundant specification of information is “bad” and a stripped down lexicon is “good”. I know of no independent (theory external) evidence for this assumption and the claim that learners would complexify their grammars in the face of ambiguous input looks suspicious.
13. Expressed in these terms, it becomes clear that an alternative analysis is possible, namely that the gender specification of nouns can be affected but is never visible because correction and feedback do not affect the processes which ensure that specifiers do not exhibit features clashing with those of the nouns they agree with. Thus, a [+fem] adjective could co-occur with a [ ] noun within the same NP. It isn’t obvious to me at the moment how to distinguish empirically between these two analyses. I can only point out that correction and feedback do permit learners to correct erroneous strings. On being told LE tour de la ville, I can repeat the whole string correctly, suggesting that feedback allows me to activate and select the correct phonological (agreeing) form of the specifier.
14. This cannot be attributed to problems that anglophones have in utilising distinct cues for gender. In Carroll (1999a), I showed that rank beginners can quickly learn not only to correctly classify stimuli according to gender class but also to extract relevant cues and generalise them to novel stimuli. In other words, when stimuli present consistent and valid cues, the patterns are readily discernable.
15. One bit of anecdotal evidence I brought to bear on behalf of my analysis was that young children (4–5 years old) beginning their immersion programs have been observed to spontaneously correct adults attempting to prime them with the correct gender cues. Thus, Keren Rice (personal communication) has reported that her children, when told, e.g. that a table is /yntabl/,
would respond with remarks like “No [tejbl] is [tabl].” This kind of metalinguistic commentary indicates that the children have grasped the determiner function of the forms un/une/des and that they are not prepared to map the concept TABLE onto a phonological form including the determiner. Rather, they are mapping an N onto an N.
16. In fact, what I have been told is that particular individuals “never” make gender errors. If this is an accurate observation, these individuals are simply irrelevant to a consideration of whether feedback can assist those large numbers of people who do. Gender marking is, however, so arbitrary that I suspect that individuals who have this impression about their own production are (i) simply not good observers of their own production, or (ii) are simply not using French on a daily basis or in a wide variety of communicative situations, or (iii) are truly exceptional language learners. While systematic comparisons of exceptional and average learners should be done, I do not feel obliged to construct a general model of second language learning around the abilities of the truly exceptional learner.
17. It is important to recognise, however, that particular aspects of the agreement system can show real developmental stages. Fink (1985) has noted that agreement of predicative adjectives across copular verbs, as in La tour est grise ‘The tower is grey’, is mastered only after several years. While the cues for this agreement pattern are reliable, common, and often occur in a position where saliency is guaranteed (e.g., the end of the utterance), they constitute a tiny minority of the adjectives of French, the vast majority of which do not alternate phonologically in this position, e.g. Le plancher est sale/La porte est sale.
18. This result is consistent with studies of learning outside of the language domain. See Carroll et al. (1991) for discussion.
19. In earlier work I stated that association was not effective, and was rightly taken to task by Dan Slobin, who insisted on the robustness of the research showing associations between sound forms and conceptual representations. I felt at the time that the objection, although correct, was missing the point that associations could take place between nouns and specifiers but typically don’t outside of instances we call idioms and treat as lexicalised strings. The problem is actually more mysterious when one considers that associations must occur between heads and their complements in order for subcategorisation to be learned. Why then don’t we all readily memorise mon-livre, ma-voisine, or le-tour-de-la-ville as complex “compound” words which can be inserted as is unproblematically into a string?
Chapter 10 The interpretation of verbal feedback
1. Introduction
In this final chapter, I intend to elaborate further how feedback and correction can become forms of evidence for SLA. I will develop and extend ideas appearing in Carroll (1995), which was, to my knowledge, the first attempt to point out some of the specific problems underlying the interpretation of direct (or explicit) and/or indirect (or implicit) verbal feedback of the sorts comprising negative evidence in SLA. Although essential to any theory of language acquisition which relies on the provision of feedback and correction to solve either the representational problem or the developmental problem of language acquisition, none of the literature proposing that feedback and correction can “fix” things, or that interaction will guide the learner to the right grammatical generalisations has attempted to show how learners interpret what they hear.1 I have adopted so far various assumptions to make the link between intake and meaning. The Uniform Parsers Hypothesis is essential to explaining various forms of transfer, and to making the claim that speech processing will be based on L1 parsing procedures until new ones are created in the face of parse failures. That parse failures are essential to i-learning results from the Detectable Errors Hypothesis. It is essential to an account of why learning begins and why it ends. I have adopted the Representational Autonomy Hypothesis which states that speech will be represented in autonomous representational systems during parsing and in LTM. The Representational Autonomy Hypothesis is essential to explaining grammatical universals. Induction may create new categories, themselves complex representations, but it cannot create new representational primitives. They are not learned. They are, rather, the basic symbols of the various “mental languages” used by both parsers and the LAD. 
These various assumptions have been embedded within a theory of language with autonomous representational systems; this has meant the various representational systems are “modular.” In particular, I have claimed, on the basis of an Interface Hypothesis, that i-learning processes which deploy conceptual information cannot operate directly on phonetic representations but must operate on s-structures and phonological representations, or else on phonological representations of words stored in the mental lexicon. I have also adopted the Intermediate Level theory of Awareness which states that the aspects of language that we can be aware of are represented in phonological representations. Since the deployment of feedback and correction is tied to awareness, this means that feedback and correction can only affect information stored in phonological representations. These are perhaps not accessible in parsing except for lexical information, in the form of syllables, feet and prosodic words. This leaves us with one element of language processing to deal with: how are conceptual structures actually put to use in inferencing? In what follows, I will adopt a version of Relevance Theory (Sperber and Wilson 1986/1995). I will argue that correction is a type of speech act, and that the deployment of correction and feedback depends crucially on the recognition of a corrective intention on the part of the interlocutor, whom I will refer to as the corrector. I will attempt to show that feedback and correction must be Irrelevant to the ongoing discourse in order for the corrective intention of the corrector to be recognised.2 I will make the case that the ability to recognise the corrector’s corrective intention requires complex mental models of discourse interaction and communicative behaviour, not to mention abilities to parse the language which go well beyond the capabilities of beginning L2 learners. I will conclude that the very complexity of the parsing and inferencing leading to an interpretation raises serious difficulties for the proposal that the provision of feedback (either direct or indirect) could constitute a central solution to SLA.
Rather, feedback and correction will only play a role in SLA to the extent that they are embedded within something like the Autonomous Induction Theory. I will also demonstrate that feedback and correction are Irrelevant to SLA to the extent that they lead the learner to make inferences about information already encoded in his system. In other words, feedback and correction matter for SLA only to the extent that they cause restructuring in the learner’s grammar. If the learner already knows what the corrector is pointing out, no internal changes will take place. This further reduces the role for feedback and correction in explaining acquisition. On the one hand, beginners do not know enough of the L2 to interpret feedback and correction offered in the L2; on the other hand, advanced learners are very likely to know the information the corrector is attempting to transmit. This leaves us with a period in between where feedback and correction can, in principle, play a role in explaining grammatical restructuring.
2. How could feedback and correction initiate restructuring in the learner’s grammar?
2.1 Feedback and correction are types of speech acts
In Chapter 1 I elaborated a typology of input forms, distinguishing between positive and negative evidence, and positive and negative feedback. In the course of that discussion, I distinguished between utterances which model some aspect of the language, and utterances which inform the learner that something is or is not allowed in the L2. Feedback and correction can contain models of the correct way of saying something, but this is not the essential part of their function. That function, the negative evidence function, relies crucially on the nature of discourse in interaction, and on the learner’s ability to relate what is said to what is intended. When the corrector provides a direct type of feedback, e.g. That’s wrong, or We don’t say it that way, she is expressing her attitudes about the form of something that has been said in a given context to express some specific idea. Feedback is therefore quintessentially metalinguistic in nature (Birdsong 1989); in order for the learner to understand the corrector to be correcting, he must construe the corrector’s utterance to be a metalinguistic commentary, or construe the corrector’s behaviour as pointing to a metalinguistic commentary. That means that the learner must understand the corrector to be talking about the constituents and structures of language and not about the extra-linguistic world.3 The utterance That’s wrong, for example, is multiply ambiguous. The deictic pronoun could refer to an infinite number of objects in the world. To count as feedback and correction of the sort pertaining to grammar learning, the learner must eliminate all of those references which are irrelevant to the form-meaning pairs of his own talk. That cannot refer to the truth value of the utterance (e.g. NOT [JOHN BUY A CAR]), or to the reference of one of the determiner phrases of the sentence (e.g. MARY vs. JOHN: it was Mary who bought a car and not John), or to one of the predications made (e.g., John bought a car and did not spend his money on a boathouse). It follows that there is nothing intrinsic to such utterances which makes them forms of verbal feedback and correction. They acquire this function in particular discourse contexts which encourage the learner to make a metalinguistic construal. It also follows that there is nothing intrinsic to the corrector’s utterances which makes them relevant to language acquisition. This leads to the First Principle of Correction Interpretation. (10.1) The First Principle of Correction Interpretation In order to become a form of feedback and correction, the learner must construe an utterance as indicating a corrective intention on the part of the corrector.
In other words, the learner will understand the corrector as attempting to do something with his utterance. This clearly makes feedback and correction a type of speech act. It is a basic assumption of the analysis of speech in discourse (Bach and Harnish 1982; Grice 1975; Searle 1969, 1979, 1983; Sperber and Wilson 1986/1995, among others) that discourse is conventional and arises from communicative intentions on the part of the participants. In the unmarked case, interlocutors agree to move a conversation towards some mutually-defined goal. They cooperate and contribute by making each contribution relevant to the conversation. If correction is actually to be informative, it must be true that the learner can in fact make the right construal. The basic thesis of this chapter is that direct and indirect forms of verbal feedback are largely Irrelevant to the ongoing communicative event in which they may occur. By the use of the capital letter “I” here, I mean that they violate Grice’s (1975) Relevance Principle: (10.2) Relevance Principle The speaker makes her contribution relevant to the purpose of the talk at that point in the exchange.
It is because the exchange is Irrelevant to the ongoing talk that the hearer searches for another interpretation and a metalinguistic interpretation is one possibility. Sperber and Wilson (1986/1995) have reworked Grice’s Relevance Principle and developed a theory of Relevance as a form of inferential communication which is designed to explain how hearers recognise the interpretation of an utterance intended by the speaker. Their theory is explicitly cognitive in orientation. They provide the Cognitive Principle of Relevance shown in (10.3) (10.3) The Cognitive Principle of Relevance Human cognition tends to be geared to the maximisation of Relevance. (Sperber and Wilson 1995: 260–78)
When a hearer attempts to interpret an utterance which has been addressed to her, she is entitled to expect that this utterance will be relevant for her at that particular moment. There is a second Communicative Principle of Relevance which states that every act of ostensive communication creates a presumption of Relevance. (10.4) The Communicative Principle of Relevance Every act of overt communication communicates a presumption of its own optimal Relevance. (Sperber and Wilson 1995: 266–78)
Note that (10.4) is formulated in terms of optimality rather than in terms of maximal amounts of Relevance. This is because Relevance is constrained by the
cognitive effect of communication and the processing effort required to parse an utterance and interpret it. The processing effort made by the hearer has to be low enough for her to want to make it (thus setting a lower limit on the expected degree of Relevance for the hearer), while the cognitive effect has to be great enough for the communication to be worth the hearer’s attention. Sperber and Wilson, therefore, argue for a relative notion of Relevance, which they call Optimal Relevance. (10.5) Optimal Relevance An utterance, on a given interpretation, is optimally Relevant iff: a. it is Relevant enough for it to be worth the addressee’s effort to process it; b. it is the most Relevant one compatible with the communicator’s abilities and preferences. (Sperber and Wilson 1986/1995: 260–78)
Sperber and Wilson note that there are many circumstances where the speaker might be unwilling or unable to produce the most Relevant utterance for the ongoing communicative purpose, or be able to formulate it in the most economical way from the hearer’s perspective. In the case in hand, namely metalinguistic feedback, the speaker can assume with some degree of probability that the hearer will be interested in formulating her utterances correctly in accordance with the rules of the language and the conventions of the sociolinguistic community in which speaker and hearer find themselves. In addition, the corrector is likely to assume that locating the error that a learner makes will be easier if the error is identified shortly after it has been made (rather than, say, a half an hour later). But to identify the error forces the corrector to temporarily abandon the current goal of communication and shift to a metalinguistic intent. This leads us to a certain tension: Because it is Irrelevant to the ongoing purpose of the talk at that point (which will normally be about the entity or event in the world), metalinguistic feedback and correction represent a rupture in the discourse. This rupture can only be resolved if the learner is capable of attributing some alternate purpose to it. She will search for an alternate interpretation because of the Communicative Principle of Relevance. The resolution of the tension is itself a complex matter with no guarantee that the correct inference will be drawn. This is because the interpretation of the feedback often requires that the learner apply a first-order interpretation to the corrector’s utterance — and then reject that interpretation. He must then assign a second-order interpretation to the corrector’s utterance, in so doing attributing a particular informative intention to the corrector (i.e. that he is correcting).
2.2 To count as feedback an utterance must be parsed and interpreted
Pinker (1989: 10–14) observes that for feedback and correction to play any role in language learning at least three things must be true of them: they must be available in the learner’s environment; they must be usable; and they must be used. I will limit myself to some observations of what makes feedback and correction usable. I take the usability of feedback and correction to mean that they must be interpretable. In other words, the learner must be able to parse the stimulus (both phonologically and syntactically) and assign it an interpretation. It is not necessarily true that the learner must already know a lot of a given language to interpret direct or indirect feedback. In other words, the learner’s parse of the feedback-containing stimulus might be incomplete or inaccurate. It is true, however, that the more limited the learner’s knowledge of a given language, the more limited the information that can be conveyed to him in that language. The less information that can be conveyed directly in the L2, the more feedback and correction must be interpreted through inductive inferencing based on a perceptually-based interpretation of the environment, through recall of previously stored information, or (in the case of L2 learners) through communication in the L1. My point is that to the extent that the learner cannot decode and interpret the stimulus itself, then it becomes difficult to claim that it is the feedback which is causally related to changes in the learner’s grammar. It follows that feedback and correction provided in a given L2 will play only a limited role during the earliest stages of acquisition when the learner’s ability to parse and interpret utterances in that language will be the most limited. If the learner cannot parse and comprehend simple L2 sentences, the corrector will not be able to convey information about the L2 grammar expressed in complex L2 sentences.
If the learner has difficulty “hearing” nouns referring to objects present in the discourse context, he will likely have extraordinary difficulty locating the referents to elements of his own speech.4 This constraint — the Second Principle of Correction Interpretation — is expressed in (10.6). (10.6) The Second Principle of Correction Interpretation To convey metalinguistic information about the L2, the utterances expressing the corrector’s corrective intention must be parsable.
A corollary is given in (10.7). (10.7) The Third Principle of Correction Interpretation
To convey metalinguistic information about the L2, the utterances expressing the corrector’s corrective intention must be interpreted in such a way that they represent information at the right level of analysis.
These two principles will exclude feedback and correction as a central mechanism in certain learning situations: in the earliest stages of L1 acquisition where there can be no recourse to a “shared” medium of communication between corrector and learner (since the infant is simultaneously learning both L1 forms and semantic/pragmatic systems expressed via the L1), and in L2 acquisition contexts where recourse to the L1 is severely limited, e.g. pidgin contact situations. In other L2 acquisition contexts, we can assume that the corrector and the learner will make extensive use of the learner’s L1 to convey feedback and correction. Speaking anecdotally, my own experiences as a language learner in both tutored and untutored contexts suggest that this hypothesis is on the right track.
2.3 Irrelevance of linguistic feedback
(10.8) Scenario 1: a German grocery store. An American (male) approaches a German (male) who is carrying several bottles of wine.
a. American: Ah! Wo haben Sie das Wein gefunden?
   where have you the- wine found
   ‘Where have you found the wine?’
b. German: DEN!
   THE-
c. American: … Danke, danke.
   Thank you, thank you
End of interaction.5
What makes this feedback interpretable? Normally, when one person asks another person a question, as in (10.8) above, the asker has good reason to assume (because of the Cooperative Principle) that the person addressed will indeed cooperate by providing an answer if he can. Clearly the American in (10.8), wanting some wine in the grocery store, could expect that his interlocutor, who is holding bottles of wine in his arms, would know where to find the wine section. If he knows this, by Grice’s (1975) Cooperative Principle, he ought to reply by giving the location. But he doesn’t do this. At the same time, he does not flagrantly violate the Cooperative Principle; he doesn’t look away, refuse to say anything, or shout “Yankee, go home!” He gives a verbal response, which the learner acknowledges by thanking him. One would only thank an interlocutor for information that one thought was helpful. But it is perfectly clear that the
Respondent’s response is not a response to the question asked. It is a metalinguistic comment which is clearly Irrelevant to the objectives of the ongoing conversation. The construal of verbal feedback must therefore be based on something like (10.9): (10.9) How to construe feedback from an utterance a. The corrector does not violate the Cooperative Principle, and is not perceived by the learner to have violated it. b. The corrector’s contribution is Irrelevant to the on-going discourse. c. The learner assumes that the corrector has a corrective intention, i.e. that he wants to say something about the form of the learner’s previous utterance. d. The attribution of a corrective intention to the speaker is the optimally relevant interpretation of the utterance for the learner.
In interpreting the response to his question, the learner in (10.8) has two choices: he can assume that the interlocutor is not being cooperative and draw inferences about why that might be, or he can assume that the interlocutor is indeed being cooperative. Since the German native speaker’s response is not literally a response to the question asked, this fact and the assumption of cooperation oblige the learner to search for an alternative interpretation. Given a set of assumptions held by the learner about his own lack of proficiency and/or competence in the L2 (coupled with other assumptions, e.g. whether or not he looks like a German), one alternative analysis is that the response is not a response to the content of the question but to its form. Given this inference, the learner can then draw others about the source of the error, here, in the form of the determiner chosen to refer to wine. The invocation of Relevance Theory allows us to construct a plausible theory of the interpretation of correction and feedback based on the perceived Irrelevance of the corrector’s utterance to the discourse. I want to make a stronger claim: not only is the corrector’s utterance Irrelevant to the ongoing discourse, it must be Irrelevant in order to be construed as metalinguistic commentary. Consider in this regard a second example where an interpretive problem arises precisely because the learner construes the corrector’s contribution to the discourse as Relevant. In each example, the learner is an adult, learning the L2 largely in an untutored fashion, in surroundings where the L2 is the medium of common communication. Feedback comes from native speakers of the learner’s L2.
(10.10) Scenario 2: (A = Native Speaker, B = Learner)
a. A: Ich überlege seit drei Stunde was ich anziehen werde.
   I think-over for 3 hours what I put-on will
   ‘I’ve been thinking for 3 hours about what I will put on’
b. B: … ich verstehe nicht
   ‘I don’t understand’
c. A: Was verstehst du nicht?
   what understand you not?
   ‘What don’t you understand?’
d. B: ummmmmmm ich… der Substantiv
   um I… the noun
e. A: Es gibt kein Substantiv
   there is no noun
f. B: well then I don’t understand the verb
g. A: réfléchis
   think
h. B: (… is thinking hard) ummmmmmm… ich… ich… (unable to formulate a statement in German switches to French too) je ne comprends pas
   ‘I don’t understand’
i. A: REFLECHIS!
j. B: (thinking harder) I can’t guess, what is it?
k. A: REFLECHIS!
l. B: oh! ça veut dire “réfléchir” (bursts out laughing)
   ‘oh that means “to think” ’
   J’ai pensé que tu aies voulu que je devine le sens.
   ‘I thought that you wanted me to guess the meaning’
Certain details are necessary to understand what is going on here in order to understand why the corrector corrects the way he does and why ultimately the intended inferences are drawn. First of all, it is important to note that Speakers A and B use both French and English on a regular basis to communicate. Secondly, both switch in and out of these two languages, so there would be nothing unusual about A’s switch from German to French although B had previously been using English. Speaker B has said explicitly that he does not understand the verb in Speaker A’s original utterance. Speaker A chooses to provide him with a translation equivalent, in other words his utterance could be construed as an abbreviated version of the proposition THE VERB MEANS “THINK ABOUT.” But Speaker A is even more helpful. He provides the translation in the first person singular corresponding to his original utterance, in a form which just happens to be homophonous with the imperative second person
singular. This means that the form of his response is ambiguous between several interpretations. This discourse thus nicely illustrates the potential ambiguity of feedback, ambiguity which arises whenever the hearer can assign a literal interpretation to the utterance. Speaker B chooses to interpret the utterance réfléchis as a command, i.e. as a non-metalinguistic instruction, even though this involves him making the assumption that Speaker A is being less than direct. I would like to suggest that the impetus to interpret stimuli in a non-metalinguistic way is so strong that it could lead Speaker B to assume that Speaker A is “bending” the Cooperative Principle and wants to play a little game. One thus moves to a metalinguistic construal only when none other is possible. Since a non-metalinguistic construal is available here the learner simply fails to understand the utterance as feedback and correction. Because of the ambiguity of feedback utterances, there is no guarantee that the speaker’s informative intention (the corrective intention) will be manifest to the hearer/learner. The example in (10.10) illustrates nicely that the usefulness of feedback and correction depends first and foremost on the learner and his inferencing capacities, rather than on the corrector and his communicative intentions. A corrector may at any time attempt to provide feedback but only the learner’s construal can make an utterance a successful provision of feedback. On what basis would Speaker B finally come to reject his initial assumption about Speaker A’s informative intention in the above example? How does he come to make the right assumptions? There are a number of additional assumptions which he must make to get there. The first is Grice’s Cooperative Principle; the corrector must be seen as attempting to communicate something to the hearer. 
The Communicative Principle of Relevance leads the hearer to construe réfléchis in the most efficient way possible, namely as an instruction to guess what has been intended. But Speaker B cannot guess the meaning of überlege and says so. Now the corrector repeats the same utterance réfléchis, which is not particularly helpful. Speaker B might assume at this point that Speaker A cannot provide more information, but since he is a native speaker of German and speaks fluent French, this interpretation is implausible. An alternative interpretation is that Speaker A is being intentionally uncooperative, an interpretation which B also rejects. So B searches for yet another intended meaning and ultimately lands on the right one. Speaker B concludes that the utterance réfléchis, on a non-metalinguistic interpretation, is Irrelevant to the on-going discourse but could be Relevant on a metalinguistic interpretation. Consider another example which illustrates the same point.
(10.11) Scenario 3: Speaker A is reading from a German guidebook on Portugal and translating as he reads into English.
a. A: … he had a daughter which was married to/
b. B: /WHO
c. A: Denis
d. B: WHO! He had a daughter WHO
e. A: DENIS!
f. B: No! I’m correcting you. “He had a daughter WHO.” Not WHICH! It’s animate.
As in the example in (10.10), this example shows quite clearly that when the learner can interpret the discourse as relevant, he does. In the example, this means construing who as a question, not as a correction. Shifting to a metalinguistic interpretation can be quite difficult since it involves more processing effort.6 Next we need what we might call the Felicity Condition for the provision of feedback and correction. Speaker B must assume that Speaker A possesses metalinguistic commentary to pass on. Attempts to provide feedback and correction will fail if the learner assumes that the corrector does not know what he is talking about, or is in no position to provide feedback, for example, if both are L2 learners at the same stage of acquisition. It follows from this that the learner’s mental models as to who is an “expert” in the language and who is not will play a role in determining whether the information inferred from a corrector’s utterance is utilised. This means too that certain environments will be more conducive for the interpretation of feedback and correction because those environments lead to the creation of mental models where experts correct the inexpert. Obviously, the language classroom is one such environment. It has often been remarked on that classroom discourse is “unnatural” because students and teachers talk a good deal about the language. While one can legitimately ask what the properties of such discourse are, and how they compare with discourses in other environments, it should be kept in mind that the mental models that are relevant to the tutored environment are propitious to the interpretation of metalinguistic talk. (10.12) The Felicity Condition on the provision of feedback The learner must assume that the corrector indeed has metalinguistic information to communicate. If the corrector is not considered to be “expert” enough, the learner will disregard the feedback and correction.
One final assumption relevant to the account of (10.10) and (10.11) is that there is a shared code which can serve as an appropriate medium for the transmission of information about the non-shared code. This assumption is implicit in the two
Principles of Correction Interpretation. Speaker B must assume that Speaker A assumes that he (B) can work back from an interpretation of a French form to the meaning of the German verb. One possible construal of the above dialogue, given these assumptions, is that réfléchis is a translation-equivalent of überlege. That Speaker B actually makes this inference is revealed by his comment in (10.10). It is worth noting that information which is relevant to the discourse will be new information, which when combined with old information gives rise to further new information (Sperber and Wilson 1986: 48). To correctly interpret the feedback, therefore, the learner must select the appropriate new information to add to current assumptions. Explicit feedback depends on the perceptions of the corrector but she will often be in a difficult position to determine if her intended correction constitutes new information for the learner. Note too that nothing in what the corrector says can guarantee that the correct new information will be identified, as example (10.13) below shows. The inferencing is non-demonstrative (Sperber and Wilson 1986: 65); it’s an iffy sort of thing at best.
2.4 The Relevance of feedback depends on its informativeness
(10.13) Scenario 4: (A = the L2 learner, B = the native speaker)
a. A: Willst du mir die Tasche bringen?
   want you to-me the bag bring?
   ‘Do you want to bring me my bag?’
b. B: Wirst du/
   will you
c. A: Wirst du/ (a self-correction based on previous error and feedback interactions involving the use of werden and wollen)
d. B: Du hast “willst du” gesagt
   you have “willst du” said
   ‘You said “willst du” ’
e. A: Wirst du mir… die Tasche bringen?
   will you to-me… the bag bring
   ‘Will you bring me my bag?’
f. B: Bringst du meine Tasche?
   bring you my bag
   ‘Will you bring me my bag?’
g. A: Okay! bringst du meine Tasche!
h. B: Welche?
   ‘Which one?’
i. A: Die ist unten die… die… Wie sagt man “sink”?
   it is beneath the… the… How do you say “sink”?
j. B: Waschbecken
k. A: (hears Waschbecke) Die ist unten die Waschbecke
l. B: unter dem Waschbecken
m. A: (hears unten dem Waschbecke) unten dem Waschbecke
n. B: (exasperated) UNTER DEM WASCHBECKEN!
o. A: (equally exasperated) well if you’d stop eating the ends of your words!
To initiate knowledge restructuring in the learner, feedback must be informative. It is not sufficient that the learner assign an interpretation to the stimulus. It must tell the learner something that she does not already know. A stimulus which does not add to her stock of knowledge is not informative; it is merely redundant. Redundant information is Irrelevant information from the perspective of Relevance Theory. However, the information value of an utterance has to be determined on the basis of the role of the utterance in the discourse, i.e. whether the information is redundant to a literal interpretation or to some other intention such as the corrective intention. At this point, it is appropriate to introduce a modification of Relevance Theory proposed by Blakemore (1987), namely the distinction between aspects of discourse pertaining to the truth-conditional meanings of utterances and aspects of discourse pertaining to the non-truth-conditional meanings of utterances. This distinction is referred to in the Relevance Theory literature as the conceptual-procedural dichotomy. According to Blakemore, many expressions in a discourse do not contribute “conceptual meaning” to the utterance, i.e. do not contribute to the truth value of the corresponding proposition.7 These expressions, rather, constrain the computations determining utterance interpretation by signalling other, one might say, pragmatic aspects of meaning. For example, Blakemore uses the distinction to explain the difference in interpretation of the two utterances He is poor and he is honest and He is poor but he is honest. In both utterances, the truth-conditional meaning of and and but is the same; however, the presence of but signals to the hearer that she should infer a proposition logically inconsistent with a proposition derived from the interpretation of the first clause. The following example illustrates this.
(10.14) [A and B are discussing the economic situation and decide that they should consult a specialist in economics.]
A: John is not an economist. (→ We shouldn’t consult him.)
B: But he is a businessman. (→ We should consult him.)
(Yoshimura 1998: 107)
From A’s assertion that John is not an economist, one can make the inference that he is not a suitable candidate for a consultation on the topic of the current economic situation (inference: If John is not an economist then we should not consult John on economic matters). B’s assertion that John is a businessman might be construed as not optimally Relevant in this context. However, the use of the expression but provides a signal to the hearer that she should build a proposition which contradicts the proposition derived from the interpretation of John is not an economist. Since that proposition is NOT (WE SHOULD CONSULT JOHN), the hearer knows that she should construct the affirmative proposition (WE SHOULD CONSULT JOHN). Yoshimura (1998) uses this distinction to provide an analysis of metalinguistic negation in which negative markers are introduced into a discourse to signal a shift to a metalinguistic interpretation. See Horn (1985), who provides examples similar to ours above.
(10.15) A: Esker too ah coo-pay luh vee-and?
B: Non, je n’ai pas ‘coo-pay luh vee-and’ — J’ai coupé la viande.
(Horn 1985: 133, taken from Yoshimura 1998: 114)
The explanatory problem of (10.15) is to explain why it is that non is not construed as a propositional negation. Yoshimura states:
we would like to propose that metalinguistic negation warns the cognitive processor that the pre-negation utterance is somehow problematic and instructs the hearer to search for a means of appropriately modifying the cognitive environment. Therefore metalinguistic negation can be formalised as in (44)
(44) Metalinguistic negation (0, {…})
The pre-negation utterance (i.e., the form the utterance would assume if it were not negated) is problematic, so search for a means of appropriately modifying the cognitive environment.
(Yoshimura 1998: 117)
Unfortunately, this is very vague, far too vague to be helpful as a solution to the problem of how the learner is to use correction and feedback as a means for altering her grammar, whatever its utility might be in dealing with the problem of metalinguistic negation. In addition, the analysis does not really reveal the basis on which the hearer makes the shift to a metalinguistic interpretation, as Yoshimura admits. In the absence of an explanation of how that shift occurs, it is not clear that metalinguistic interpretation can be reduced to the conceptual/procedural dichotomy of Blakemore. I conclude from this that my approach, namely that metalinguistic feedback is Irrelevant, is the correct one. A shift to a
metalinguistic interpretation is possible when the corrector produces an utterance which is Irrelevant to the propositional content (truth value) of the discourse. It may still be the case that the corrector can have a corrective intention but that a given correction will fail as correction because it is Irrelevant in that function as well. The first correction by B dealing with the correct choice of verb to express futurity did not provide the learner with any new information about the German verbal system since she had made this mistake before and been corrected on it several times. She was able to self-correct in (10.13c). While the corrector may intend the utterance as feedback and correction, and the learner will discern the corrector’s information intention (because one doesn’t answer a question by repeating it), its information content as feedback is zero. The utterance fails as feedback. It seems to me that the informativeness of feedback and correction for a learner is often overestimated in discussions of the development of a general learning theory. In other words, it is assumed that all attempts at feedback by the corrector contribute new information to the learner. But this is clearly false, as (10.13) shows. Indeed, the more knowledgeable the learner is, the more likely it is to be false. Once a learner has acquired a grammatical distinction and can reproduce it at least some of the time, feedback and correction about it will merely provide the learner with evidence that she has made a performance mistake. This is not trivial if it leads the learner to stabilise production schemata, but it will not lead to restructuring of the mental grammar. The second correction that B provides in (10.13f) focuses on the correctness of choosing the future tense to make a request as opposed to the imperative. Speaker B wants Speaker A to understand that expressing requests in German via transferred English-based structures with modals and infinitives is inappropriate.
Speaker A (who had had enough correction for one communication event) acknowledges that she has made an error by repeating the proffered correction in (10.13g). The third bit of feedback and correction is the important one here. During the earliest stages of learning, Speaker A had considerable difficulty representing accurately the phonological shape of certain sounds, in particular the shape of weak syllables. Because she has not heard the distinction between her pronunciation and that of B, she does not grasp that it is the focus of the correction. Rather she attends to and alters her formulation of the case-marked determiner, substituting one form (dem) for another (die). It is only because B presents the correction a second time with considerable insistence that A directs her attention to other aspects of her utterance.
2.5 The blame assignment problem
Example (10.13) illustrates well the blame assignment problem (Pinker 1989: 12) discussed in Chapter 8. As we noted there, the provision of feedback and correction indicates an error, but it doesn’t say what the error is. Even when the learner clearly recognises the corrective intention of the corrector, he must analyse or guess where he has probably made a mistake in the sentence and what level of planning or execution (speech act, lexical choice, morphosyntax, linear order, articulation) is implicated. It is an open question as to whether the learner can indeed locate the problem source. Let us consider the difficulties involved. We have assumed that working memory is severely constrained. Therefore, it is implausible that the learner has representations of his utterance (at each of the different levels of analysis) in working memory after the utterance has been articulated. The learner will undoubtedly have in LTM a conceptual structure since this is critical to understanding the discourse, and the learner will have “permanent” representations of the phonological forms of words in lexical entries. The mediating representations, however, will disappear as soon as they have done their work. In the theory presented here, the learner has access to the information stored in LTM in lexical entries. He does not access the information encoded in the parsers, nor can he access the information about the grammar stored in the modules. What, then, does the learner do with the information provided by feedback and correction? One possibility is that the learner attempts to reproduce his utterance. It is not an accident, I think, that learners so often repeat correction and feedback. By repeating what they have heard, they are in effect rehearsing strings. This means encoding the same conceptual structure anew and directing their attention to those levels of analysis which the system makes available. At what level of analysis?
Clearly, the learner who hears (10.13l) and repeats it as (10.13m) is encoding phonetic properties. Is that all? For learners with metalinguistic awareness, this seems unlikely. Rather, it seems that the learner can access the information that the acoustic phonetic information in (10.13l) is related to three distinct lexical items. The theory makes available to the conceptual system the lexical form-meaning correspondences and phonological representations. It is therefore possible to explain how the learner can hear modelled forms, infer that the modelling means that his production was not good, and “edit” the phonological representations stored in his mental lexicon. The theory will also accommodate feedback linking the form of the determiner (dem) to the form of the preposition. Other instances of feedback appear to have little effect. Consider the system underlying the use of counterfactual statements. Two clauses are constructed, one
in the form of a condition, the other the consequent (apodosis). German grammars note that two versions are possible.
(10.16) a. Wenn ich das Geld hätte, führe ich nach Toronto.
if I the money had+ travel+ I to Toronto
‘If I had the money, I would go to Toronto.’
b. Wenn ich das Geld gehabt hätte, wäre ich nach Toronto gefahren.
if I the money had+ had+ would+ I to Toronto travelled
‘If I had had the money, I would have gone to Toronto’
c. Wenn ich das Geld hätte, würde ich nach Toronto fahren.
if I the money had+ would+ I to Toronto travel
‘If I had the money, I would go to Toronto.’
d. Wenn ich das Geld gehabt hätte, würde ich nach Toronto gefahren sein.
if I the money had+ had+ would+ I to Toronto travelled be
‘If I had had the money, I would have gone to Toronto’
Although I studied German as a dead language many years ago in high school, I learned little and immediately forgot what I learned. My acquisition has therefore been largely untutored, and based on discourse that I have with my butcher Uwe, train conductors, waitresses, and the ladies who work in Hamburg’s department stores. If anyone used forms such as those in (10.16a, c) in my presence in the early years of my exposure to German, it is news to me. I have no conscious awareness of ever having heard forms of this sort, and have certainly never attempted to produce such sentences. On the other hand, I have occasionally heard constructions like those in (10.16b, d). I have only just discovered, however, in researching this problem, that the modals I heard must have been in the subjunctive. I have always “heard” them as wurde/wurdest/wurden, which is the past tense of the modal werden. In my own attempts to reproduce these sentences, I have also been corrected, by my secretary, by colleagues and by students. If I have never noticed that the correction included the subjunctive forms, it is perhaps because my attention has usually been on the order of the verbs and the form of the past tense, and I simply could not attend to both the position of the verbs and their phonetic form at the same time. Certainly in my mental model of my own German, the location of the tensed verb is often erroneous. My mental model obviously also includes a representation of the “correct” position, namely sentence final. Let’s assume that I have a condition–action empirical rule for production purposes of the sort in (10.17).
(10.17) If a VERB is TENSED and appears in an EMBEDDED CLAUSE, then it appears on the RIGHT BOUNDARY of that CLAUSE.
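Read as a production schema, the rule in (10.17) pairs a condition (tensed verb, embedded clause) with an action (place the verb at the right edge). The following is a minimal illustrative sketch only, not part of the theory as stated: the names Clause and place_tensed_verb, and the representation of a clause as a flat list of word forms, are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clause:
    """Hypothetical flat representation of a planned clause."""
    words: List[str]    # word forms in their currently planned linear order
    tensed_verb: str    # the tensed verb form of the clause
    embedded: bool      # True if the clause is an embedded clause


def place_tensed_verb(clause: Clause) -> List[str]:
    """Apply the condition-action rule in (10.17) to a flat word list."""
    # Condition: the verb is tensed and appears in an embedded clause.
    if clause.embedded and clause.tensed_verb in clause.words:
        idx = clause.words.index(clause.tensed_verb)
        rest = clause.words[:idx] + clause.words[idx + 1:]
        # Action: the tensed verb appears on the right boundary of the clause.
        return rest + [clause.tensed_verb]
    # Otherwise the rule does not apply and the order is left unchanged.
    return list(clause.words)


# An erroneous ordering of the wenn-clause, passed through the rule.
wenn_clause = Clause(["wenn", "ich", "hätte", "das", "Geld"], "hätte", embedded=True)
print(" ".join(place_tensed_verb(wenn_clause)))
```

The sketch deliberately operates over a linear string of word forms rather than over syntactic structure; whether a learner's rule has this shape is an empirical matter that the code does not settle.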
This condition–action rule will necessitate various correspondences between the concepts , , and and the corresponding elements which actually occur in the phonological structures. Tense will only be accessible by activating the lexical representation for a specific tensed verb. Clause boundaries will only be accessible to the extent that they correspond to the edges of intonation phrases. Notice that this means that the construct will not be accessible since this construct is eliminated in the transition from s-structures to phonological structures. The learner will therefore have to resort to analyses which encode linear notions like .8 Presumably focusing attention on these correspondences between the conceptual structures and phonological structures has prevented my noticing the actual phonetic forms of the verbs involved.
2.6 Metalinguistic information, grammar teaching, and information which cannot be modelled
So far I have been discussing the interpretation of forms which the learner might have occasion to hear if the circumstances were right. In other words, we have been dealing exclusively with forms which actually occur in the language. What about negative evidence? How does one interpret the information in (10.18)?
(10.18) You can’t say “He explained me the problem”.
I hypothesise that information of this sort must always be interpreted in the context of plausible alternatives. Negative evidence has an effect on the learner’s output if it leads the learner to select (10.19) and to suppress (10.18).
(10.19) He explained the problem to me.
The direct and indirect forms of feedback that were discussed in Chapter 8 apparently can result in an inference comparable to (10.18). And they also appear to be able to correctly influence the learner’s production behaviour in the desired way. What do we say about metalinguistic information encoded in grammatical rules of the sort one reads in grammar books? Can it not have a suitable effect on the learner’s psychogrammar? The uncontested failure of the grammar-translation teaching method would appear to provide ample evidence that learning “about” the language does not lead to mastery “of” the language. But we must try to understand these matters in psycholinguistic terms. If what we mean by mastery of the language is the ability to comprehend and produce
spoken language, then reading it will not lead to the requisite abilities since reading will create visual, not acoustic-phonetic, representations of the language. This has nothing directly to do, however, with the question: Can you only learn the grammar of a language by parsing stimuli bottom-up? We must also acknowledge that the grammar rules appearing in most pedagogical and descriptive grammars are incomplete and inaccurate. Consequently, we must not measure the accessibility of the information in terms of the accuracy of the learner’s output when compared to native speakers. In my view, the question really boils down to three others: (i) Can metalinguistic information in the form of taught grammar rules affect parsing procedures? (ii) Can such information affect speech production procedures? There is no empirically motivated answer to these questions at the moment, despite the speculations reviewed in earlier chapters. I suspect that the answer to the first question will prove to be “No” and the answer to the second will prove to be “Yes”. The third question: (iii) Does this constitute having an effect on the learner’s grammar? depends entirely on one’s theory of what a mental grammar is. In this book, we have regarded the mental grammar as the psychogrammar, that is, all of the grammatical information relevant for parsing and producing speech. On the basis of that definition, grammar teaching affects the learner’s grammar. Should there be some other notion of mental grammar relevant to SLA, we might be forced to reject this position. I leave it to others to motivate such a notion.
2.7 The corrective intention and indirect forms of feedback
The provision of direct feedback and correction lies in the hands (or mouth) of the corrector. In other words, she must perceive that there is a need to provide feedback and be willing to do so.
Explicit feedback and correction of the sort illustrated here thus differs from implicit or indirect forms of feedback such as recasts, repetitions of an erroneous utterance, clarification questions or indirect prompting since, outside of the classroom, the corrector usually has no intention to correct.9 The corrector is instead using the repetition or clarification question to say: “Do I understand correctly what you are trying to say?” But if the corrector has no intention to correct, then there will normally be no corrective intention to attribute to the corrector. If the learner cannot attribute a corrective intention to the corrector, there is no reason to assume that in normal discourse, he will construe repetitions or clarification requests as forms of linguistic feedback. The learner will normally construe such utterances as indicating a breakdown in communication, but a breakdown in communication does not normally lead to restructuring of the speaker’s grammar. In other words, there is a critical step
missing in claims (Schachter 1986) that indirect forms of feedback can lead to language learning. The critical step is that the learner must make the inference that his grammar is at fault, and not that, e.g., the radio was on and the corrector was inattentive. It follows that although indirect and tacit forms of negative evidence may be more frequent in native speaker/non-native speaker discourses than overt forms, they are less likely to be construed as feedback. Therefore, they are less likely to play a role in restructuring the learner’s grammar.
3. Summary
In this chapter, I have focused attention on what makes linguistic feedback usable. The interpretation of feedback and correction requires first of all a metalinguistic capacity since the correct construal of the corrective intention requires treating language itself as an object of thought. Feedback and correction would thus appear to be excluded in certain language learning situations because the learner has limited or no metalinguistic capacity (early L1 acquisition, possibly L2 acquisition in illiterates). Interpretation also requires an ability to parse and interpret the stimuli, which also precludes a central role for feedback and correction in early L1 acquisition and in those cases of L2 acquisition where there is no recourse to a common code (pidgin contact situations). I have shown, through an analysis of a corpus of spontaneously provided forms of feedback and correction to adult L2 learners, that the interpretation of linguistic feedback requires interpreting an utterance as obeying the Cooperative Principle but violating Relevance. Learners, however, will make this move only when no other interpretation is possible; the move to a metalinguistic interpretation is therefore a move of “last” and not first resort. I have also shown that assigning an utterance a feedback interpretation involves attributing a corrective intention to the corrector. Since repetitions, failures to comprehend and other commonly-cited forms of indirect feedback do not require making such an attribution, the correct interpretation for such utterance-types is less likely. The more inferencing the learner must make on the basis of (non-verbal) perceivable events or information in long-term memory, the less likely she is to identify the corrective intention. Contrary to common assertions in the interaction literature, the “best” feedback and correction is probably the most explicit, which is the least likely to occur.
Finally, it was observed that a given utterance can be doubly Irrelevant, namely when the corrector violates Relevance in order to provide feedback but the information construed by the learner is not new. In the first instance, the utterance is Irrelevant to the literal (truth functional) analysis
of the meaning of the sentence. In the second instance, it is Irrelevant in its metalinguistic function. I conclude that for this reason the provision of feedback and correction may be comparatively useless in the advanced stages of acquisition when the corrector is less able to discern errors from “slips of the tongue” made by the learner, and when feedback and correction are more likely to be redundant. In short, considerations of an interpretive nature suggest that even if feedback and correction is “in the air”, its utility will be severely restricted in both the early and most advanced stages of language acquisition, making it implausible that it could play a central role in the development of a general theory of language acquisition. We must therefore conclude that feedback and correction could not be a substitute for universals of linguistic cognition in explaining the representational problem of language acquisition. It should be apparent that their function in explaining the developmental problem of language acquisition will also be severely constrained — to particular phases of the learner’s development and to particular types of acquisition problems. Fans of Universal Grammar can take heart from this message, and may be forgiven if they tell us “Well, we told you so”. On the other hand, those of us who are interested in learning, in variable knowledge, and in the far broader types of knowledge needed to use a language than typically get discussed in the generative literature, can say exactly the same thing. The debates will continue — this book will have resolved perhaps none of them — but it does point to one pressing issue: we need to make much further headway on the question of the uniformity and variability of language acquisition. Second language learners are not always driven to unique “solutions” to their learning problems, nor are their systems always variable in every respect. 
When we know which solutions are uniquely given and which not, then we will be better able to apportion the explanatory work between a theory of linguistic universals and a theory of induction.
Notes
1. This applies, for example, to the empirical studies by Long (1980, 1981, 1983), Ellis (1985), Kasper (1985), Gass and Varonis (1985a, b, 1989, 1994), Varonis and Gass (1985a, b), Brock et al. (1986), Pica (1987, 1988a, b), Pica, Doughty, and Young (1986), Pica, et al. (1987) which show that native speakers (NS) modify their utterances, restate, repeat and elaborate them in the face of communication difficulties with non-native speakers (NNS). While these studies provide invaluable data illustrating the complexity of verbal interactions in NS–NNS discourse, in the absence of sophisticated models of parsing, inferencing, and language learning, they cannot but fail to show that there is a causal relationship between comprehension, parsing and SLA.
2. I will refer to violations of the Relevance Principle as Irrelevant, using lower case spelling for the ordinary language use of irrelevant.
3. This makes feedback and correction fundamentally different from behaviour referred to as repairs which has been investigated in the context of studies of conversational structure and turn-taking (Schegloff, Jefferson, and Sacks 1977; Jefferson 1981) in which issues of accurate reference, predication and truth values are primary.
4. Consider in this regard the critique of Pica et al. (1989: 66) of their own earlier studies of negotiated interaction between NS and NNS. They observed little NNS modification of unclear language. NSs were modelling target versions of the L2. They attribute both types of behaviour to the low proficiency level of the NNS. They also observe that the NS were ESL teachers “familiar with interlanguage productions and with classroom feedback conventions [who] provided a biassed sample of NSs uniquely adept at supplying learners with target models”. It is also possible that the NS couldn’t interpret and use the NNS modifications.
5. I owe this example to the late Fernando Tarallo.
6. Abbott and Costello fans will have got the point immediately.
7. Within the Conceptual Semantics approach to meaning, it is redundant (if not incoherent) to speak of “conceptual meanings”. Meanings are encoded in conceptual structures and experienced as an understanding of the world around us.
8. Notice that the theory elaborated here actually provides an explanation for the claims of Clahsen and Muysken (1986, 1989) and Meisel (1996) that adult learners are using linear performance strategies. I would dispute Meisel’s (1996) claim that these strategies are not structural; they are indeed structural, but only at the level of the phonology. My analysis is actually quite consistent with Meisel’s point that morphosyntactic structure is not available. I also must emphasise that unconscious induction is not restricted to operating over phonological structures. Feedback and correction are only restricted in this way because they presuppose awareness.
9. Repetitions which reformulate an erroneous utterance are different in that they provide primary linguistic data which the learner can then compare to a just-uttered error. Such repetitions fall under the category of models.
Epilogue
In this book, I have attempted to explore the logical and empirical relationship between the stimuli that learners of a second language hear and the input to learning mechanisms which appear to be required by the grammars that they ultimately learn. I have argued that these two issues, namely what is in the signal and what is in the representations which feed the learning mechanisms, cannot be the same and must be investigated separately. Input to speech processing comes from the speech signal, but the input to learning mechanisms can come from whatever sources the learning mechanisms can access. I have argued here, and provided evidence for the claim, that those sources include the conceptual representations. L2 learning is not, therefore, a matter solely of analysing the signal in a bottom-up manner. In addition, I have argued that a theory of second language acquisition must be built around some notion of Universal Grammar which provides us with a theory of the constraints defining natural language systems. However, the theory of Universal Grammar which I have been working with, and explicitly arguing for, is not that conception which emerges from the Principles and Parameters Theory. UG is not a “box” in the mind which the learner accesses whenever she starts to learn a new language; rather, it is a characterisation of the constraints which clearly hold of linguistic systems as a consequence of normal learning processes. These processes are not to be seen as part of a module unlike any other component of mind, nor are they part of some great undifferentiated general theory of learning. There are modular processes of speech processing and autonomous levels of representation, but there are also metaprocesses of category-formation which exhibit similarities across domains of knowledge (namely, the prototypicality of categories). An explanatory theory of SLA must account for all of these things, and the Autonomous Induction Theory certainly attempts to do so.
It does so by adopting a specific architecture of language in which the construct of level of language plays a critical role. The traditional levels have been reinterpreted as autonomous levels of representation which are linked to each other via correspondences, basically that a category C of level A corresponds to a category C′ at level B. Learning, in this framework, then consists of
acquiring the categories of a given level, learning the correspondences across levels, and the constructions which within-level operations build. The theory includes a variety of constraints on each of these metaprocesses, thus attempting to deal with the principal critique of general theories of learning. Finally, I have introduced a theory of feedback and correction which focuses on the interpretation of feedback, as a kind of metalinguistic commentary on the learner’s production, and what that entails in terms of feedback as input to learning problems which involve encodings in autonomous representational systems. It turns out that feedback and correction in the L2 can, in principle, play only a limited role in second language acquisition because the utterance in which the feedback is expressed must be parsable and parsed. Moreover, the learner must attribute to his interlocutor a corrective intent, which, as we have seen, is not always straightforward, given that certain forms of correction can first be given a truth-functionally Relevant interpretation. The clearest cases are those where the feedback is Irrelevant to that purpose, but to be a factor in grammatical restructuring, the correction or feedback must still be Relevant to the learner as a piece of metalinguistic commentary. Should it be the case that the learner has already represented the information, despite what she produces spontaneously, then she will dismiss the feedback as Irrelevant a second time. Finally, the content of the feedback must be put into correspondence with representations of the grammar which can connect to the conceptual representations. Lexical entries are themselves correspondences and so they can be prime targets for feedback and correction as agents of grammatical restructuring. 
Prosodic representations, in particular, aspects of constituent order, will also be susceptible to input from feedback and correction since these are the representations which are projected into awareness, and awareness, we have said, is more detailed processing at that level. This leaves huge areas of grammatical competence potentially impervious to feedback and correction — aspects of meaning, morphosyntax, internal morphological structure of words, sub-segmental aspects of phonology and all of phonetic knowledge. The empirical research which has been presented here is compatible with these predictions, but it should be obvious that the theory followed the empirical studies rather than preceding them. This book, in the end, offers a research framework for the investigation of feedback and correction, rather than the last word on the topic. I hope, nonetheless, that it will inspire more, and better, work than that I have presented here.
Appendix 1 Acceptability judgement task
01. She offered some milk to the cat.
02. Antonio went to sleep early; he had a good sleep.
03. His brother is resembled by Peter.
04. The teacher explained the class the problem.
05. All the piloters go through hard training.
06. My friend is taking driving lessons; she will be a good driver.
07. Usually, they ride on the TTC; it is a short riding from their home to work.
08. A book was given to me yesterday.
09. After we finished cooking all the food, we sat down and ate the delicious cook.
10. Our TV doesn’t work, so we had to call a repairer.
11. The policeman asked a question to Teresa.
12. The boy cut a piece of cake for his sister.
13. In our classroom, some are better understanders than others.
14. Both French and Spanish are spoken in Morocco.
15. The repairman charged Ling twenty dollars.
16. I love listening to her, she is such a talented singer.
17. He got Susan to admit her mistake.
18. My cousin always speeds on the highway; he doesn’t respect the legal speed.
19. Please start Consuelo the car.
20. We climbed all the way to the top of the mountain; it was a difficult climbing.
21. Elephants can be seen at the Toronto Zoo.
22. He made her to stay until the dishes were done.
23. The grandmother knits a sweater for her granddaughter.
24. In the evening, the insects began to bite; soon, the tourists were full of mosquito bites.
25. This house cost a fortune to Mike.
26. When I promise something, I always keep my promise.
27. My closet is a mess; I am going to get a closet organiser.
28. Mary envied her wardrobe to Ines.
29. The car I have now is more better than the car I had before.
30. Run, Kim, Run. Show them what a good runner you are.
31. Manuel wanted to learn how to skate, but he didn’t have skates.
32. This book reads well; it is an exciting reader.
33. The money was stolen by the robbers.
34. Ellen can learn a poem in ten minutes; she is the best canner I know.
35. He fixed his wife the sewing machine.
36. The door opened.
37. I made the letter written by Ho.
38. The boys got together to prepare the party; everybody was enthusiastic about the prepare.
39. John got his little brother to wash his car.
40. We get all our bread from my brother; he is a baker.
41. The boss made Pablo his foreman.
42. Margarita is not as taller as her sister.
43. Then there is a lot of work to be done, each member of the family becomes a helper.
44. Emilio born in Mexico.
45. The children learned how to paint with brushes, then they each started to work on a paint to take home.
46. She wished her daughter a good night.
47. My dog comes when I call him; he is a quick comer.
48. Jane speaks faster than Anna.
49. Smell the flowers. They have such a nice smeller.
50. The woman started to laugh; she had a soft melodious laugh.
Appendix 2 Experimental session
First Set of Feedback Items
01. John is going to fly to New York tomorrow. He used to be afraid of flying.
02. They took a long time to decide how to name the baby. Finally, they came up with a name that everybody liked.
03. Every year at Harbourfront, writers read their books to the Public. They have successful readings.
04. Tom is learning how to drive. An instructor supervises his driving.
05. Maria likes to brush her daughter’s hair. She uses a soft brush.
06. Please water all the vegetables in the garden. Give them a lot of water, it is quite hot today.
07. The judges choose the winner of the Miss Grey Cup contest. The choosing of the winner is always difficult.
08. You have to be careful when you cut with a sharp knife. It is very easy to get a cut.
09. My daughters phone their friends all the time. We had to install a phone in their room.
10. Alice decided to paint her house. The painting took a month.
11. How does this book end? If you want to find out, read the end of the book.
12. I always give my friends presents on their birthdays. The giving of presents is part of Canadian culture.

First Set of Guessing Items
01. Some people dream every night. Not everyone can remember their dream in the morning.
02. The dentist is going to fill the cavity in the patient’s tooth. The filling doesn’t hurt, but it scares people anyway.
03. Anna likes to cook for her family. They all enjoy her tasty cooking.
04. Which way do you want to walk? I would prefer a walk near the river.
05. A tailor must fit the suit for the customer. The fitting takes a lot of time.
06. Judy is going to leave her job on Friday. She is going on maternity leave.
07. We went to swim in the lake. We had a good swim.
08. The bank manager decided to offer Micaela a job in his bank. Micaela accepted his offer.
09. My parents want to move this Sunday. We are going to help them with the moving.
10. My cats purr when they are happy. Their purring can go on for a long time.
11. There are people who work from morning till late at night. Their work is their life.
12. Salespeople sell merchandise. They must be very good at selling merchandise to be promoted.

Second Set of Feedback Items
01. The teacher likes to test her students regularly. She gives them a test every week.
02. The boy asked the girl: “Can I kiss you?” “Ok” said the girl, “but only a little kiss.”
03. I water my plants every week. The watering takes place on weekends.
04. So many people divorce these days. My friend just got a divorce from her husband.
05. Some writers prefer to write in complete silence. They do most of their writing during the night.
06. Every election, political parties mail information to voters. These mailings are quite expensive.
07. Deer often cross the highways in Northern Ontario. You can easily hit one with your car during a herd’s crossing.
08. In the evening, it started to snow heavily. The snow was soft and deep.
09. The children ask grandmother to tell them stories. These favourite stories get longer at each telling.
10. It is all right to question everything. By asking a question, one can learn new things.
11. Children go to school to learn. Schools are places of learning.
12. What did the actors play last week at Stratford? It was a play by Shakespeare.

Second Set of Guessing Items
01. The Japanese continue to kill whales. This killing of endangered animals has to be stopped.
02. Learners try to understand the rules of English grammar. Their understanding of these rules is important.
03. Cats catch mice. Often, they bring their catch into the house.
04. Miguel doesn’t like to shave himself. He goes to the barber for a shave.
05. The teacher shows the students how to spell some new words. She corrects their spelling of these words when they are at the blackboard.
06. Juan likes to rest in the afternoon. He takes a rest at 3:00 o’clock every day.
07. Robert Campeau wants to open new stores in the U.S. The opening of each store gets lots of publicity.
08. Raul started to teach math last year. His teaching is improving all the time.
09. Peter decided to drive to Montreal. It is a six-hour drive.
10. Antonio hopes to get a bicycle for his birthday. His sister knows about his secret hope.
11. The visiting professor was invited to talk at the University. Many students attended his talk.
12. George Bell plays in left field. His wonderful playing may get him a raise.
References
Abraham, W., Kosmeijer, K., and Reuland, E. (eds.) 1990. Issues in Germanic syntax. New York: Mouton de Gruyter. Aksu-Koç, A. A., and Slobin, D. I. 1985. “Acquisition of Turkish.” In D. I. Slobin (1985b), 839–78. Alanen, R. 1992. Input enhancement and rule presentation in second language acquisition. Unpublished M. A. thesis, University of Hawai’i at Manoa, Honolulu, Hawai’i. Alanen, R. 1995. “Input enhancement and rule presentation in second language acquisition.” In R. Schmidt, 259–302. Allwood, J. 1993. “Feedback in second language acquisition.” In C. Perdue, Vol. 2: The results, 196–235. Altmann, G. T. M. (ed.) 1990. Cognitive models of speech processing: Psycholinguistic and computational perspectives. Cambridge, Ma: MIT Press. Altmann, G., and Shillcock, R. (eds.) 1993. Cognitive models of speech processing: The second Sperlonga meeting. Hove, U.K: Erlbaum. Andersen, R. W. (ed.) 1983. Pidginization and creolization as language acquisition. Rowley, Ma: Newbury House. Andersen, R. W. (ed.) 1984. Second languages: A cross-linguistic perspective. Rowley, Ma: Newbury House. Andersen, R. W. 1991. “Developmental sequences: the emergence of aspect marking in second language acquisition.” In T. Huebner and C. A. Ferguson, 305–24. Andersen, R. W. 1993. “Four operating principles and input distribution as explanations for underdeveloped and mature morphological systems.” In K. Hyltenstam and A. Viberg, 309–339. Anderson, J. R. 1983. The architecture of cognition. Cambridge, Ma: Harvard University Press. Anderson, J. R. 1985. Cognitive psychology and its implications. New York: Freeman. Anderson, J. R. (p.c., 1995). (Statement made in response to questions about his presentation) ACT: A simple theory of complex cognition. Paper to The Turing Hypothesis Revisited: New perspectives on architectures of cognition. Einstein Forum and Potsdam University, Oct. 12, 1995. Anderson, S. R. 1992. A-morphous morphology. Cambridge: Cambridge University Press.
Anderson, S. R., and Kiparsky, P. (eds.) 1973. A Festschrift for Morris Halle. New York: Holt, Rinehart and Winston. Archibald, J. 1991. Language learnability and phonology: The acquisition of L2 metrical parameters. Doctoral dissertation, University of Toronto. Archibald, J. 1993. Language learnability and L2 phonology: The acquisition of metrical parameters. Dordrecht: Kluwer. Archibald, J. 1994. “A formal model of learning L2 prosodic phonology.” Second Language Research 10(3): 215–40. Archibald, J. 1999. “Charting the learning path in second language phonology: The acquisition of OT grammars.” Eurosla 9, Lund University, June 10–12 1999, Lund Sweden. Arnaud, P., and Béjoint, H. (eds.) 1992. Vocabulary and applied linguistics. London: Macmillan. Aronoff, M. 1976. Word formation in generative grammar. Cambridge, Ma: MIT Press. Aronoff, M. 1994. Morphology by itself: Stems and inflectional classes. Cambridge, Ma: MIT Press. Asher, R. E. (ed.) 1994. The encyclopaedia of language and linguistics. Oxford: Pergamon Press. Aslin, R. N., Alberts, J. R. and Petersen, M. R. (eds.) 1981. Development of perception: Psychobiological perspectives. Vol. 1. New York: Academic Press. Bach, E., and Harms, R. T. (eds.) 1968. Universals in linguistic theory. New York: Holt, Rinehart and Winston. Bach, K., and Harnish, R. M. 1982. Linguistic communication and speech acts. Cambridge, Ma: MIT Press. Baillargéon, R. 1986. “Representing the existence and the location of hidden objects: object permanence in 6- and 8-month old infants.” Cognition 23: 21–42. Bailey, C.-J. N., and Shuy, R. W. (eds.) 1973. New ways of analysing variation in English. Washington, D. C.: Georgetown University Press. Baker, C. L. 1991. “The syntax of English not: The limits of Core grammar.” Linguistic Inquiry 22(3): 387–429. Baker, C. L., and McCarthy, J. J. (eds.) 1981. The logical problem of language acquisition. Cambridge, Ma: MIT Press. Baltin, M. R., and Kroch, A. S. (eds.) 1989. 
Alternative conceptions of phrase structure. Chicago: University of Chicago Press. Banfield, A. 1982. Unspeakable sentences: Narration and representation in the language of fiction. Boston: Routledge and Kegan Paul. Barbeau, P., Ducharme, C., and Valois, D. 1981. “L’usage du genre en canadien français: Une étude syntaxique et sociolinguistique de la feminisation des noms à initiale vocalique.” Recherches linguistiques à Montréal 17: 1–42. Bardovi-Harlig, K., and Bofman, T. 1989. “Attainment of syntactic and morphological accuracy by advanced language learners.” Studies in Second Language Acquisition 11(1): 17–34.
Barringer, C., and Gholson, B. 1979. “Effects of type and combination of feedback upon conceptual learning by children: Implications for research in academic learning.” Review of Educational Research 49(3): 459–78. Bates, E., Bretherton, I., and Snyder, L. 1988. From first words to grammar: Individual differences and dissociable mechanisms. Cambridge: Cambridge University Press. Bates, E., and MacWhinney, B. 1981. “Second language acquisition from a functionalist perspective: Pragmatic, semantic, and perceptual strategies.” In H. Winitz, 190–214. Bates, E., and MacWhinney, B. 1987. “Competition, variation and language learning.” In B. MacWhinney, 1987c, 157–193. Bates, E., and MacWhinney, B. 1989. “Functionalism and the Competition Model.” In B. MacWhinney and E. Bates, 3–73. Bates, E., McNew, S., MacWhinney, B., Devescovi, A., and Smith, S. 1982. “Functional constraints on sentence processing: A crosslinguistic study.” Cognition 11: 245–300. Beauvois, M. F., and Dérouesné, J. 1979. “Phonological alexia: Three dissociations.” Journal of Neurology, Neurosurgery, and Psychiatry 42: 1115–24. Berko, J. 1958. “The child’s learning of English morphology.” Word 14(1): 150–77. Bernstein Ratner, N. 1986. “Durational cues which mark clause boundaries in mother-child speech.” Journal of Phonetics 14: 303–10. Berry, D., and Broadbent, D. 1984. “On the relationship between task performance and associated verbalisable knowledge.” Quarterly Journal of Experimental Psychology 36A: 209–31. Bertolo, S. 1995. “Maturation and learnability in parametric systems.” Language Acquisition 4(4): 277–318. Berwick, R. 1985. The acquisition of syntactic knowledge. Cambridge, Ma: MIT Press. Berwick, R. C., and Weinberg, A. S. 1984. The grammatical basis of linguistic performance. Cambridge, Ma: MIT Press. Best, C. T., McRoberts, G. W., and Sithole, N. M. 1988. 
“Examination of the perceptual re-organisation for speech contrasts: Zulu click discrimination by English-speaking adults and infants.” Journal of Experimental Psychology: Human perception and performance 14(2): 345–360. Bialystok, E. 1979. “Explicit and implicit judgements of L2 grammaticality.” Language Learning 29(1): 81–103. Bialystok, E. 1982. “On the relationship between knowing and using linguistic forms.” Applied Linguistics 3(3): 181–206. Bialystok, E. 1986a. “Factors in the growth of linguistic awareness.” Child Development 57(2): 498–510. Bialystok, E. 1986b. “Children’s concept of word.” Journal of Psycholinguistic Research 15(1): 13–32. Bialystok, E. 1987. “Influences of bilingualism on metalinguistic development.” Second Language Research 3(2): 154–66. Bialystok, E. 1994. “Analysis and control in the development of second language proficiency.” Studies in Second Language Acquisition 16(2): 157–68.
Bialystok, E., and Bouchard Ryan, E. 1985. “A metacognitive framework for the development of first and second language skills.” In D-L. Forrest-Pressley, G. E. MacKinnon, and T. G. Waller, 207–52. Bialystok, E., and Sharwood Smith, M. 1985. “Interlanguage is not a state of mind.” Applied Linguistics 6(2): 101–17. Bickerton, D. 1990. Syntactic development: The brain just does it. Ms. University of Hawai’i at Manoa. Birdsong, D. 1989. Metalinguistic performance and interlinguistic competence. Berlin: Springer Verlag. Birdsong, D. 1991. “On the notion of ‘critical period’ in UG/L2 theory: A response to Flynn and Manuel.” In L. Eubank, 147–165. Birdsong, D. 1992. “Ultimate attainment in second language acquisition.” Language 68(4): 706–55. Bishop, D. V. M. 1994. “Grammatical errors in specific language impairment: Competence or performance limitations?” Applied Psycholinguistics 15(4): 507–50. Blaxton, T. A. 1992. “Dissociations among memory measures in memory-impaired subjects: Evidence for a processing account of memory.” Memory and Cognition 20(5): 549–62. Bley-Vroman, R. 1990. “The logical problem of second language learning.” Linguistic Analysis 20(1–2): 3–49. Versions also published in (1989) as: “The fundamental character of foreign language learning.” In W. Rutherford and M. Sharwood Smith, 19–30; and also in (1989) as: “What is the logical problem of foreign language acquisition?” In S. M. Gass and J. Schachter, 41–68. Bley-Vroman, R. 1994. “Updating the Fundamental Difference Hypothesis: Rationales and principles of UG.” Eurosla 4, Université d’Aix-en-Provence, November, 1994, Aix-en-Provence France. Bley-Vroman, R., and Chaudron, C. 1990. “Second language processing of subordinate clauses and anaphora — First language and universal influences: A review of Flynn’s research.” Language Learning 40(2): 245–85. Bloch, H., and Bertenthal, B. I. (eds.) 1990. Sensory motor organisations and development in infancy and early childhood. Dordrecht: Kluwer. Bloom, P. 1990. 
“Subjectless sentences in child language.” Linguistic Inquiry 21(4): 491–504. Blum-Kulka, S., and Levenston, E. A. 1983. “Universals of lexical simplification.” In C. Faerch and G. Kasper, 119–39. Bohannon, J. N., and Stanowicz, L. 1988. “The issue of negative evidence: Adult responses to children’s language errors.” Developmental Psychology 24(5): 684–89. Bond, Z. S. 1981. “From an acoustic stream to a phonological representation: The perception of fluent speech.” In N. I. Lass, 375–410. Bond, Z. S., and Robey, R. R. 1983. “The phonetic structure of errors in the perception of fluent speech.” In N. I. Lass, 249–83. Borer, H. 1984. Parametric syntax: Case studies in Semitic and Romance languages. Dordrecht: Foris.
Borer, H., and Wexler, K. 1987. “The maturation of syntax.” In T. Roeper and E. Williams, 123–72. Borer, H., and Wexler, K. 1992. “Bi-unique relations and the maturation of grammatical principles.” Natural Language and Linguistic Theory 10(2): 147–89. Bowerman, M. 1978. “Systematizing semantic knowledge: Changes over time in the child’s organization of meaning.” Child Development 49(4): 977–87. Bowerman, M. 1981a. “Beyond communicative adequacy: From piecemeal knowledge to an integrated system in the child’s acquisition of language.” Papers and Reports on Child Language Development. No. 20: 1–24, (Department of Linguistics, Stanford University). Bowerman, M. 1981b. “The child’s expression of meaning: Expanding relationships among lexicon, syntax, and morphology.” In H. Winitz, 172–89. Bowerman, M. 1982a. “Reorganizational processes in lexical and syntactic development.” In E. Wanner and L. R. Gleitman, 319–46. Bowerman, M. 1982b. “Starting to talk worse: Clues to language acquisition from children’s late speech errors.” In S. Strauss, 101–45. Bowerman, M. 1987. “Commentary: Mechanisms of language acquisition.” In B. MacWhinney (1987c), 443–66. Bowers, J. 1993. “The syntax of predication.” Linguistic Inquiry 24(4): 591–676. Braine, M. D. S. 1963. “On learning the grammatical order of words.” Psychological Review 70(4): 323–48. Braine, M. D. S. 1966. “Learning the positions of words relative to a marker element.” Journal of Experimental Psychology 72(4): 532–540. Braine, M. D. S. 1971. “On two types of models of the internalization of grammars.” In D. I. Slobin, 153–86. Braine, M. D. S. 1987. “What is learned in acquiring word classes — a step toward an acquisition theory.” In B. MacWhinney, 65–87. Brentari, D., Larson, G., and MacLeod, L. (eds.) 1992. The joy of grammar. Amsterdam: John Benjamins. Bresnan, J. 1978. “A realistic transformational grammar.” In M. Halle, J. Bresnan, and G. A. Miller, 1–59. Bresnan, J. (ed.). 1982. 
The mental representation of grammatical relations. Cambridge, Ma: MIT Press. Brière, E. 1966. “An investigation of phonological interference.” Language 42(4): 769–96. Brock, C., Crookes, G., Day, R., and Long, M. 1986. “The differential effects of corrective feedback in native speaker-non-native-speaker conversation.” In R. Day, 229–36. Broen, P. 1972. “The verbal environment of the language learning child.” ASHA Monograph 17. Broselow, E. 1983. “Non-obvious transfer: On predicting epenthesis errors.” In S. M. Gass and L. Selinker, 269–80. Reprinted in G. Ioup and S. Weinberger, 292–304.
Broselow, E. 1984. “An investigation of transfer in second language phonology.” IRAL 22: 253–69. Reprinted in G. Ioup and S. Weinberger, 261–78. Brown, R. 1973. A first language: The early stages. Cambridge, Ma: Harvard University Press. Brown, R., and Hanlon, C. 1970. “Derivational complexity and order of acquisition in child speech.” In J. R. Hayes, 155–207. Cadierno, T. 1992. Explicit instruction in grammar: A comparison of input based and output based instruction in second language acquisition. Doctoral dissertation, University of Illinois at Urbana-Champaign. Cadierno, T. 1995. “Formal instruction from a processing perspective: An investigation into the Spanish past tense.” Modern Language Journal 79(2): 179–93. Campos, H., and Kempchinsky, P. (eds.) 1996. Evolution and revolution in linguistic theory: Studies in honour of Carlos P. Otero. Washington, D.C: Georgetown University Press. Cancino, H., Rosansky, E., and Schumann, J. 1974. “Testing hypotheses about second language acquisition: The copula and the negative in three subjects.” Working Papers in Bilingualism 6: 80–96. Caplan, D. (ed.). 1980. Biological studies of mental processes. Cambridge, Ma: MIT Press. Caplan, D., and Hildebrandt, N. 1988. Disorders of syntactic comprehension. Cambridge, Ma: MIT Press. Caramazza, A., Yeni-Komshian, G., Zurif, E., and Carbone, E. 1973. “The acquisition of a new phonological contrast: The case of stop consonants in French-English bilinguals.” Journal of the Acoustical Society of America 54: 421–28. Carey, S. 1978. “The child as word learner.” In M. Halle et al., 264–93. Carey, S. 1985. Conceptual change in childhood. Cambridge, Ma: MIT Press. Carlson, G. N., and Tanenhaus, M. K. (eds.) 1989. Linguistic structure in language processing. Dordrecht: Kluwer Academic Publishers. Carr, T. H., and Curran, T. 1994. “Cognitive factors in learning about structured sequences: Applications to syntax.” Studies in Second Language Acquisition 16(2): 205–30. Carroll, S. 1986. 
“Reflexives and the dependency relation ‘R’.” Canadian Journal of Linguistics 31(1): 1–43. Carroll, S. 1989. “Second language acquisition and the computational paradigm.” Language Learning 39(4): 535–94. Carroll, S. 1992. “On cognates.” Second Language Research 8(2): 93–119. Carroll, S. 1995a. “The hidden danger in computer modelling: Remarks on Sokolik and Smith’s connectionist learning model of French gender.” Second Language Research 11(2): 193–205. Carroll, S. 1995b. “The irrelevance of verbal feedback to language learning.” In L. Eubank et al., 73–88. Carroll, S. 1997. Die phonologische-syntaktische Schnittstelle: Experimente über prosodische Wahrnehmung, und die Verarbeitung und das Lernen von syntaktischen Einheiten beim Zweitsprachenerwerb. Project proposal to the Deutsche Forschungsgemeinschaft, July, 1997, Universität Potsdam.
Carroll, S. 1999a. “Adults’ sensitivity to different sorts of input.” Language Learning 49(1): 37–92. Carroll, S. 1999b. “Putting ‘input’ in its proper place.” Second Language Research 15(4): 337–88. Carroll, S., Roberge, Y., and Swain, M. 1992. “The role of feedback in adult second language acquisition: error correction and morphological generalizations.” Applied Psycholinguistics 13(2): 173–198. Carroll, S., and Swain, M. 1991. “Negative evidence in second language learning.” Paper to the 11th Second Language Research Forum, University of Southern California, February 28-March 3, 1991. Carroll, S., and Swain, M. 1993. “Explicit and implicit negative feedback: An empirical study of the learning of linguistic generalisations.” Studies in Second Language Acquisition 15(2): 357–86. Cassidy, K. W., and Kelly, M. H. 1991. “Phonological information for grammatical category assignments.” Journal of Memory and Language 30(3): 348–69. Cazden, C., Cancino, H., Rosansky, E., and Schumann, J. 1975. Second language acquisition sequences in children, adolescents and adults. Final report to the U.S. Department of Health, Education and Welfare, National Institute of Education Office of Research Grants (grant no. NE-6–00–3–014). Chambers, J. 1995. Sociolinguistic theory. Oxford: Blackwell. Charvillat, A., and Kail, M. 1991. “The status of ‘canonical SVO sentences’ in French: A developmental study of the on-line processing of dislocated sentences.” Journal of Child Language 18(3): 591–608. Chaudron, C. 1977. “A descriptive model of discourse in the corrective treatment of learners’ errors.” Language Learning 27(1): 29–46. Chaudron, C. 1986. “Teachers’ priorities in correcting learners’ errors in French immersion classes.” In R. Day, 64–84. Chaudron, C. 1988. Second language classrooms: Research on teaching and learning. Cambridge: Cambridge University Press. Cheng, A. 1995. Grammar instruction and input processing: The acquisition of Spanish ser and estar. 
Doctoral dissertation, University of Illinois at Urbana-Champaign. Cheng, P., and Holyoak, K. J. 1985. “Pragmatic reasoning schemas.” Cognitive Psychology 17(3): 391–416. Cheng, P., Holyoak, K. J., Nisbett, R. E., and Oliver, L. 1986. “Pragmatic versus syntactic approaches to training deductive reasoning.” Cognitive Psychology 18(3): 293–328. Reprinted in R. E. Nisbett (1993b), 165–203. Cheng, P., and Nisbett, R. E. 1993. “Pragmatic constraints on causal deduction.” In R. E. Nisbett (1993b), 207–27. Chien, Y. C., and Wexler, K. 1990. “Children’s knowledge of locality conditions in binding as evidence for the modularity of syntax and pragmatics.” Language Acquisition 1(3): 225–95. Chomsky, C. 1969. The acquisition of syntax in children from 5 to 10. Cambridge, MA: MIT Press.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, Ma: MIT Press. Chomsky, N. 1970. “Remarks on nominalizations.” In R. Jacobs and P. Rosenbaum, 184–221. Chomsky, N. 1975. Reflections on language. New York: Pantheon. Chomsky, N. 1980. Rules and representations. Oxford: Blackwell. Chomsky, N. 1981a. Lectures on government and binding. Dordrecht: Foris. Chomsky, N. 1981b. “Principles and parameters in syntactic theory.” In N. Hornstein and D. Lightfoot, 32–75. Chomsky, N. 1982. Some concepts and consequences of the theory of government and binding. Cambridge, Ma: MIT Press. Chomsky, N. 1986. Knowledge of language: its nature, origin and use. New York: Praeger. Chomsky, N. 1988. Language and problems of knowledge: The Managua lectures. Cambridge, Ma: MIT Press. Chomsky, N. 1991a. “Linguistics and adjacent fields: A personal view.” In A. Kasher, 3–25. Chomsky, N. 1991b. “Linguistics and cognitive science: Problems and mysteries.” In A. Kasher, 26–53. Chomsky, N. 1991c. “Some notes on economy of derivation and representation.” In R. Freidin, 417–54. Reprinted in N. Chomsky, 1995. Chomsky, N. 1994. “Bare phrase structure.” MIT Occasional Papers in Linguistics 5. Cambridge, Ma: MIT Working Papers in Linguistics. Chomsky, N. 1995. The minimalist program. Cambridge, Ma: MIT Press. Chomsky, N., and Halle, M. 1968. The sound pattern of English. New York, N.Y: Harper and Row. Chomsky, N., and Lasnik, H. 1977. “Filters and control.” Linguistic Inquiry 8(3): 425–504. Chomsky, N., and Lasnik, H. 1995. “The theory of principles and parameters.” In N. Chomsky, 13–127. Chun, A., Day, R., Chenoweth, A., and Luppescu, S. 1982. “Errors, interaction, and correction: A study of native-nonnative conversations.” TESOL Quarterly 16(3): 537–47. Clahsen, H. 1982. Spracherwerb in der Kindheit. Tübingen: Narr. Clahsen, H. 1984. “The acquisition of German word order: A test case for cognitive approaches to L2 development.” In R. W. Andersen, 219–42. Clahsen, H. 1987. 
“Connecting theories of language processing and (second) language acquisition.” In C. Pfaff, 103–16. Clahsen, H. 1988. “Parameterised grammatical theory and language acquisition: A study of verb placement and inflection by children and adults.” In S. Flynn and W. O’Neil, 47–75. Clahsen, H. 1989. “The grammatical characterisation of developmental dysphasia.” Linguistics 27(5): 897–920. Clahsen, H. 1990/1991. “Constraints on parameter setting: A grammatical analysis of some acquisition stages in German child language.” Language Acquisition 1(4): 361–91.
Clahsen, H. 1991. Child language and developmental dysphasia: Linguistic studies in the acquisition of German. Amsterdam: John Benjamins. Clahsen, H. 1995. Paper to the 13th Hamburger Kognitionskolloquium: Sprachentwicklung und Sprachentwicklungsstörungen. Universität Hamburg, February 3–4, 1995. Clahsen, H., and Hong, U. 1995. “Agreement and null subjects in German L2 development: New evidence from reaction-time experiments.” Second Language Research 11(1): 57–87. Clahsen, H., Meisel, J. M., and Pienemann, M. 1983. Deutsch als Zweitsprache. Der Spracherwerb ausländischer Arbeiter. Tübingen: Narr. Clahsen, H., and Muysken, P. 1986. “The accessibility of universal grammar to adult and child learners: A study of the acquisition of German word order.” Second Language Research 2(2): 93–119. Clahsen, H., and Muysken, P. 1989. “The UG paradox in L2 acquisition.” Second Language Research 5(1): 1–29. Clahsen, H., and Penke, M. 1992. “The acquisition of agreement morphology and its syntactic consequences: New evidence on German child language from the Simone-corpus.” In J. Meisel, 181–224. Clahsen, H., Rothweiler, M., Woest, A., and Marcus, G. 1992. “Regular and irregular inflection in the acquisition of German noun plurals.” Cognition 45(2): 225–55. Clahsen, H., and Smolka, K.-D. 1986. “Psycholinguistic evidence and the description of V2 in German.” In H. Haider and M. Prinzhorn, 137–67. Clark, H. H. 1969. “Linguistic processes in deductive reasoning.” Psychological Review 76(2): 387–404. Clark, H. H. 1977. “Linguistic processes in deductive reasoning.” In P. N. Johnson-Laird and P. C. Wason, 1977b, 98–113. Clark, J., and Yallop, C. 1990. An introduction to phonetics and phonology. Oxford: Blackwell. Clark, R. 1989. “On the relationship between input data and parameter setting.” NELS 19: 48–62. GLSA, University of Massachusetts, Amherst. Clark, R. 1992. “The selection of syntactic knowledge.” Language acquisition: A journal of developmental linguistics 2(2): 83–149. 
Clements, N. 1985. “The geometry of phonological features.” Phonology 2: 225–52. Clements, N., and Hume, E. V. 1995. “The internal organisation of speech sounds.” In J. A. Goldsmith, 245–306. Cole, P., and Morgan, J. L. (eds.) 1975. Syntax and semantics, Vol. III, Speech acts. New York: Academic Press. Cole, R. A., and Jakimik, J. 1978. “Understanding speech: How words are heard.” In G. Underwood, 67–116. Comrie, B. 1981. Language universals and linguistic typology. Oxford: Blackwell. Cook, V. (ed.) 1986. Experimental approaches to second language learning. Oxford: Pergamon Press. Cooper, W., and Paccia-Cooper, J. 1980. Syntax and speech. Cambridge, Ma: Harvard University Press.
Cooper, W. E., and Walker, E. C. T. (eds.), 1979. Sentence processing: Psycholinguistic studies presented to Merrill Garrett. Hillsdale, N.J.: Erlbaum. Coppieters, R. 1987. “Competence differences between native and fluent non-native speakers.” Language 63(3): 544–73. Corder, S. P. 1967. “The significance of learners’ errors.” IRAL 5(4): 161–70. Corder, S. P. 1978. “Language-learner language.” In J. C. Richards, 71–93. Courchene, R. 1980. “The error analysis hypothesis, the contrastive analysis hypothesis, and the correction of error in the second language classroom.” TESL Talk 11(2): 3–13, and 11(3): 10–29. Craig, C. G. (ed.) 1986. Noun classes and categorisation: Proceedings of a symposium on categorisation and noun classification, Eugene, Oregon, October 1983. Amsterdam: John Benjamins. Craik, F. I. M., and Tulving, E. 1975. “Depth of processing and the retention of words in episodic memory.” Journal of Experimental Psychology: General 104: 268–294. Crain, S. 1991. “Language acquisition in the absence of experience.” Behavioral and Brain Sciences 14(4): 597–650. Crain, S., and Steedman, M. 1985. “On not being led up the garden path: the use of context by the psychological syntax processor.” In D. R. Dowty et al., 320–58. Crain, S., Thornton, R., Boster, C., Conway, L., Lillo-Martin, D., and Woodams, E. 1996. “Quantification without qualification.” Language Acquisition 5(2): 83–153. Crookes, G., and Rulon, K. 1985. “Incorporation of corrective feedback in native speaker/non-native speaker conversation.” Technical report No. 3. Honolulu: Centre for Second Language Classroom Research, Social Science Research Institute, University of Hawaii. Crystal, D. 1967. “English, in word classes.” Lingua 17(1): 24–56. Crystal, D. 1969. Prosodic systems and intonation in English. Cambridge: Cambridge University Press. Culicover, P., and Rochemont, M. 1983. “Stress and focus in English.” Language 59(1): 123–65. Culicover, P., and Wilkins, W. 1984. Locality in linguistic theory. 
Orlando, Fla: Academic Press. Cummins, R. 1983. The nature of psychological explanation. Cambridge, Ma: MIT Press. Curtiss, S. 1977. Genie: A psycholinguistic study of a modern day “wild child”. New York: Academic Press. Curtiss, S. 1982. “Developmental dissociations of language and cognition.” In L. Obler and L. Menn, 285–312. Curtiss, S. 1988. “Abnormal language acquisition and the modularity of language.” In F. J. Newmeyer, Vol. II, 96–116. Cutler, A. (ed.) 1982a. Speech errors: A classified bibliography. Bloomington, Ind: Indiana University Linguistics Club. Cutler, A. (ed.) 1982b. Slips of the tongue and language production. Amsterdam: Mouton. Cutler, A., Dahan, D., and van Donselaar, D. 1997. “Prosody in the comprehension of spoken language. A literature review.” Language and Speech 40(1): 141–201.
Cutler, A., and Fay, D. A. 1982. “One mental lexicon, phonologically arranged: Comments on Hurford’s comments.” Linguistic Inquiry 13(1): 107–13. Davidson, D. 1967. “The logical form of action sentences.” In N. Rescher, 81–95. Reprinted in Davidson, 1980. Davidson, D. 1980. Essays on actions and events. Oxford: Clarendon Press. Davies, A., Criper, C., and Howatt, A. P. R. (eds.), 1984. Interlanguage. Edinburgh: Edinburgh University Press. Davis, S. (ed.) 1992. Connectionism: Theory and practice. Oxford: Oxford University Press. Day, R. (ed.). 1986. Talking to learn. Conversation in Second Language Acquisition. Rowley, Ma: Newbury House. Day, R., Chenoweth, A., Chun, A., and Luppescu, S. 1984. “Corrective feedback in native-nonnative discourse.” Language Learning 34(1): 19–45. de Bot, K., Cox, A., Ralston, S., Schaufeli, A., and Weltens, B. 1995. “Lexical processing in bilinguals.” Second Language Research 11(1): 1–19. de Boysson-Bardies, B., Vihman, M. M., Roug-Hellichius, L., Durand, C., Landberg, I., and Arao, F. 1992. “Material evidence of infant selection from the target language: A cross-linguistic phonetic study.” In C. A. Ferguson et al., 369–91. Dechert, H. W., and Raupach, M. (eds.) 1989. Transfer in language production. Norwood, N.J.: Ablex. de Graaf, R. 1997a. “The eXperanto experiment: Effects of explicit instruction on second language acquisition.” Studies in Second Language Acquisition 19(2): 249–76. de Graaf, R. 1997b. Differential effects of explicit instruction on second language acquisition. Doctoral dissertation, Vrije Universiteit, Amsterdam. Distributed through the Holland Institute of Generative Linguistics dissertation series, 35. de Groot, A. B. M., and Nas, G. 1991. “Lexical representation of cognates and noncognates in compound bilinguals.” Journal of Memory and Language 30(1): 90–123. de Groot, A. B. M., and Kroll, J. F. (eds.) 1997. Tutorials in bilingualism: Psycholinguistic perspectives. Hillsdale, N.J.: Erlbaum. DeKeyser, R. M. 1995. 
“Learning second language grammar rules: An experiment with a miniature linguistic system.” Studies in Second Language Acquisition 17(3): 379–410. DeKeyser, R. M. 1997. “Beyond explicit learning: Automatizing second language morphosyntax.” Studies in Second Language Acquisition 19(2): 195–221. Demetras, M. J., Post, K. N., and Snow, C. E. 1986. “Feedback to first language learners: The role of repetitions and clarification questions.” Journal of Child Language 13(2): 275–92. DeVries, R. 1969. “Constancy of genetic identity in the years three to six.” Society for Research in Child Development Monographs 34 (No. 127). Dinnsen, D. A. 1992. “Variation in developing and fully developed phonetic inventories.” In C. A. Ferguson et al., 191–210. Doughty, C. 1991. “Second language instruction does make a difference: Evidence from an empirical study of SL relativization.” Studies in Second Language Acquisition 13(4): 431–69.
Dowty, D. R., Karttunen, L., and Zwicky, A. M. (eds.) 1985. Natural language parsing: Psychological, computational, and theoretical perspectives. Cambridge: Cambridge University Press. Drapeau, L. 1980. “À propos d’une théorie de la morphologie basée sur le mot.” Annual meeting of the Canadian Linguistic Association/Association canadienne de linguistique, Montréal, Canada, June 3, 1980. Dresher, B. E. 1999. “Charting the learning path: Cues to parameter setting.” Linguistic Inquiry 30(1): 27–67. Dresher, B. E., and Kaye, J. D. 1990. “A computational learning model for metrical phonology.” Cognition 34: 137–95. Dressler, W. U., and Pfeiffer, O. E. (eds.). 1976. Phonologica 1976 Akten der dritten Internationalen Phonologie-Tagung. Vienna, 1–4 September 1976. Innsbruck. Drozd, K. 1996. “L’acquisition de la négation discursive.” L’acquisition de la négation et de la portée en L1, L2, et en pathologie du langage. Université de la Sorbonne Nouvelle-Paris III, Paris, 12 April 1996. Dulay, H., Burt, M., and Krashen, S. 1982. Language two. New York: Oxford University Press. DuPlessis, J., Solin, D., Travis, L., and White, L. 1987. “UG or not UG, that is the question: A reply to Clahsen and Muysken.” Second Language Research 3(1): 56–75. Echols, C. H. 1988. “The role of stress, position and intonation in the representation and identification of early words.” Papers and reports on child language development 27: 39–46. Eckman, F., Bell, L. H., and Nelson, D. (eds.) 1984. Universals of second language acquisition. Rowley, Ma: Newbury House. Eckman, F., Moravcsik, E. A., and Wirth, J. R. (eds.) 1986. Markedness. New York: Plenum Press. Eckman, F., Bell, L. H., and Nelson, D. 1988. “On the generalization of relative clause instruction in the acquisition of English as a second language.” Applied Linguistics 9(1): 1–20. Eckman, F., Highland, D., Lee, P. W., Mileham, J., and Weber, R. R. (eds.) 1995. Second language acquisition and pedagogy. Hillsdale, N.J.: Erlbaum. Eimas, P. D. 
1974. “Auditory and linguistic units of processing of cues for place of articulation by infants.” Perception and Psychophysics 16(4): 513–21. Eimas, P. D. 1975. “Auditory and phonetic coding of the cues for speech: Discrimination of the [r-l] distinction by young infants.” Perception and Psychophysics 18(3): 341–47. Eimas, P. D. 1985. “Constraints on a model of infant speech perception.” In J. Mehler and R. Fox, 185–98. Eimas, P. D., Siqueland, E. R., Jusczyk, P., and Vigorito, J. 1971. “Speech perception in infants.” Science 171: 303–6. Eisenstein, M. (ed.) 1989. Variation and second language acquisition: Empirical approaches. New York: Plenum.
Elbers, L., and Wijnen, F. 1992. “Effort, production skill, and language learning.” In C. A. Ferguson et al., 337–68. Ellis, R. 1984. Classroom second language development: A study of classroom interaction and language acquisition. Oxford: Pergamon Press. Ellis, R. 1985. “A variable competence model of second language acquisition.” IRAL 23: 47–59. Ellis, R. 1989. “Are classroom and naturalistic acquisition the same? A study of the classroom acquisition of German word order rules.” Studies in Second Language Acquisition 11(3): 305–28. Ellis, R. 1990. Instructed second language acquisition. Oxford: Blackwell. Ellis, W. D. (ed.) 1938. A source book of Gestalt psychology. London: Routledge and Kegan Paul. Emonds, J. 1976. A transformational approach to English syntax. New York: Academic Press. Emonds, J. 1978. “The verbal complex V′–V in French.” Linguistic Inquiry 9(1): 151–75. Emonds, J. 1985. A unified theory of syntactic categories. Dordrecht: Foris. Emmorey, K. D., and Fromkin, V. A. 1988. “The mental lexicon.” In F. J. Newmeyer, Vol. III, 124–49. Epstein, S. D., Flynn, S., and Martohardjono, G. 1996. “Second language acquisition: Theoretical and experimental issues in contemporary research.” Behavioral and Brain Sciences 19(4): 677–758. Eubank, L. 1987. “The acquisition of German negation by formal language learners.” In VanPatten et al., 33–51. Eubank, L. (ed.) 1991. Point counterpoint: Universal grammar in the second language. Amsterdam: John Benjamins. Eubank, L. 1992. “Verb movement, agreement, and tense in L2 acquisition.” In J. M. Meisel, 225–44. Eubank, L. 1993/1994. “On the transfer of parametric values in L2 development.” Language Acquisition 3(3): 183–208. Eubank, L., and Gregg, K. R. 1995. “ ‘Et in amygdala ego’? UG, (S)LA, and neurobiology.” Studies in Second Language Acquisition 17(1): 35–57. Eubank, L., and Schwartz, B. D. (eds.) 1992. Second Language Research 12(1), special issue on The L2 initial state. 
Eubank, L., Selinker, L., and Sharwood Smith, M. (eds.) 1995. The current state of interlanguage: Studies in honor of William E. Rutherford. Amsterdam: John Benjamins. Eysenck, M. W., and Keane, M. T. 1995. Cognitive psychology: A student’s handbook. 3rd edition. Hove, U.K: Erlbaum. Faerch, C., Haastrup, K., and Phillipson, R. 1984. Learner language and language learning. Clevedon, England: Multilingual Matters. Faerch, C., and Kasper, G., (eds.) 1983. Strategies in interlanguage communication. London: Longman. Fanselow, J. F. 1977. “The treatment of error in oral work.” Foreign Language Annals 10: 583–93.
Felix, S. 1981. “On the (in)applicability of Piagetian thought to language learning.” Studies in Second Language Acquisition 3(2): 179–92. Felix, S. 1984. “Maturational aspects of Universal Grammar.” In A. Davies et al., 133–61. Felix, S. 1985. “More evidence on competing cognitive systems.” Second Language Research 1(1): 47–72. Felix, S. 1986. Cognition and language growth. Dordrecht: Foris. Ferguson, C. A., Menn, L., and Stoel-Gammon, C. (eds.) 1992. Phonological development: Models, research, implications. Timonium, Md: York Press. Ferguson, C. A., and Moravcsik, E. (eds.) 1978. Universals of human language. 4 vols. Stanford, Ca: Stanford University Press. Ferguson, C. A., and Slobin, D. I. (eds.) 1973. Studies of child language development. New York: Holt, Rinehart and Winston. Fernald, A. 1985. “Four-month-old infants prefer to listen to motherese.” Infant Behavior and Development 8(2): 181–95. Fillmore, C. 1968. “The case for case.” In E. Bach and R. T. Harms, 1–90. Finer, D. 1991. “Binding parameters in second language acquisition.” In L. Eubank, 351–74. Fink, R. 1985. “French adjective morphophonemic patterns: Their generalisation and representation.” Linguistics 23(3): 567–96. Flege, J. E. 1981. “The phonological basis of foreign accent.” TESOL Quarterly 15: 443–55. Flege, J. E. 1984. “The detection of French accent by American listeners.” Journal of the Acoustical Society of America 76(3): 692–707. Flege, J. E. 1987. “The production of ‘new’ and ‘similar’ phones in a foreign language: Evidence for the effect of equivalence classification.” Journal of Phonetics 15(1): 47–65. Flege, J. E. 1991. “Perception and production: The relevance of phonetic input to L2 phonological learning.” In T. Huebner and C. A. Ferguson, 249–89. Flege, J. E. 1992. “Speech learning in a second language.” In C. A. Ferguson et al., 565–604. Flege, J. E. 1993. 
“Production and perception of a novel, second-language phonetic contrast.” Journal of the Acoustical Society of America 93(3): 1589–1608. Flege, J. E., and Eefting, W. 1986. “Linguistic and developmental effects on the production and perception of stop consonants.” Phonetica 43(1): 155–71. Flege, J. E., and Eefting, W. 1987. “The production and perception of English stops by Spanish speakers of English.” Journal of Phonetics 15(1): 67–83. Flege, J. E., and Eefting, W. 1988. “Imitation of a VOT continuum by native speakers of English and Spanish: Evidence for phonetic category formation.” Journal of the Acoustical Society of America 83(2): 729–40. Flege, J. E., and Munro, M. 1994. “The word unit in L2 speech production and perception.” Studies in Second Language Acquisition 16(4): 381–411.
Flege, J. E., Frieda, E. M., Walley, A. C., and Randazza, L. A. 1998. “Lexical factors and segmental accuracy in second language production.” Studies in Second Language Acquisition 20(2): 155–87. Fletcher, P., and MacWhinney, B. (eds.) 1995. The handbook of child language. Oxford, U.K: Basil Blackwell. Flynn, S. 1983. Study of the effects of principal branching direction in second language acquisition: The generalisation of a parameter of universal grammar from first to second language acquisition. Doctoral dissertation, Cornell University. Flynn, S. 1987. A parameter-setting model of L2 acquisition: Experimental studies in anaphora. Dordrecht: Reidel. Flynn, S. 1988. “Second language acquisition and grammatical theory.” In F. J. Newmeyer, Vol. II, 53–73. Flynn, S. 1989a. “Spanish, Japanese and Chinese speakers’ acquisition of English relative clauses: new evidence for the head-direction parameter.” In K. Hyltenstam and L. K. Obler, 116–31. Flynn, S. 1989b. “The role of the head-initial/head-final parameter in the acquisition of English relative clauses by adult Spanish and Japanese speakers.” In S. M. Gass and J. Schachter, 89–108. Flynn, S., and Espinal, I. 1985. “Head-initial/head-final parameter in adult Chinese L2 acquisition of English.” Second Language Research 1(2): 93–117. Flynn, S., and O’Neil, W. (eds.) 1988. Linguistic theory in second language acquisition. Dordrecht: Kluwer. Fodor, J. A. 1980. “Fixation of belief and concept acquisition.” In M. Piattelli-Palmarini, 143–9. Fodor, J. A. 1983. The modularity of mind. Cambridge, Ma: MIT Press. Fodor, J. D. 1990. “Thematic roles and modularity: Comments on the chapters by Frazier and Tanenhaus et al.” In G. T. M. Altmann, 434–56. Fodor, J. D. 1998a. “Learning to parse?” Journal of Psycholinguistic Research 27(2): 285–319. Fodor, J. D. 1998b. “Parsing to learn.” Journal of Psycholinguistic Research 27(3): 339–74. Fodor, J. D. 1998c. “Unambiguous triggers.” Linguistic Inquiry 29(1): 1–36. Forrest-Pressley, D. 
L., MacKinnon, G. E., and Waller, T. G. (eds.) 1985. Metacognition, cognition, and human performance. Volume 1: Theoretical perspectives. Orlando: Academic Press. Forster, K. I. 1979. “Levels of processing and the structure of the language processor.” In W. E. Cooper and E. C. T. Walker, 27–85. Foss, B. (ed.) 1966. New horizons in psychology. Harmondsworth, England: Penguin. Frauenfelder, U. H., and Tyler, L. K. (eds.) 1987. Spoken word recognition. Cambridge, Ma: Bradford Books/MIT Press. Reprinted from Cognition 25 (1987). Frazier, L. 1978. On comprehending sentences: Syntactic parsing strategies. Doctoral dissertation, University of Connecticut. Distributed by the Indiana University Linguistics Club.
Frazier, L. 1990. “Exploring the architecture of the language-processing system.” In G. T. M. Altmann, 409–33. Frazier, L., and de Villiers, J. (eds.) 1990. Language processing and language acquisition. Dordrecht: Kluwer. Freidin, R. (ed.). 1991. Principles and parameters in comparative grammar. Cambridge, Ma: MIT Press. Fromkin, V. A. 1971. “The non-anomalous nature of anomalous utterances.” Language 47(1): 27–52. Fromkin, V. A. (ed.) 1980. Errors in linguistic performance: Slips of the tongue, ear, pen and hand. New York: Academic Press. Fujimura, O. (ed.) 1973. Three dimensions of linguistic theory. Tokyo: TEC Corporation. Galloway, E., and Richards, B. J. (eds.) 1994. Input and interaction in language acquisition. Cambridge: Cambridge University Press. Galloway, L. M. 1981. “The convolutions of second language: A theoretical article with a critical review and some new hypotheses towards a neuropsychological model of bilingualism and second language performance.” Language Learning 31(2): 439–464. Gardner, R. C. 1985. Social psychology and second language learning: The role of attitudes and motivation. London: Edward Arnold. Garfield, J. L. (ed.). 1987. Modularity in knowledge representation and natural-language understanding. Cambridge, Ma: Bradford Books/MIT Press. Garnes, S., and Bond, Z. S. 1975. “Slips of the ear: Errors in perception of casual speech.” In R. E. Grossman et al., 214–25. Garnes, S., and Bond, Z. S. 1976. “The relationship between semantic expectation and acoustic information.” In W. U. Dressler and O. E. Pfeiffer, 285–93. Gass, S. M. 1979. “Sentence processing by L2 learners.” Studies in Second Language Acquisition 2(1): 85–98. Gass, S. M. 1987. “The resolution of conflicts among competing systems: A bidirectional perspective.” Applied Linguistics 8(3): 329–50. Gass, S. M. 1988. “Integrating research areas: A framework for second language studies.” Applied Linguistics 9(2): 198–217. Gass, S. M., Mackey, A., and Pica, T. 1998. 
“The role of input and interaction in second language acquisition. Introduction to the special issue.” The Modern Language Journal 82(3): 299–307. Gass, S. M., and Madden, C. (eds.) 1985. Input in second language acquisition. Rowley, Ma: Newbury House. Gass, S. M., and Schachter, J. (eds.) 1989. Linguistic perspectives on second language acquisition. Cambridge: Cambridge University Press. Gass, S. M., and Selinker, L. (eds.) 1983. Language transfer in language learning. Rowley, Ma: Newbury House. Gass, S. M., and Selinker, L. (eds.) 1992. Language transfer in language learning. Amsterdam: John Benjamins. Gass, S. M., and Selinker, L. 1994. Second language acquisition: An introductory course. Hove, U.K: Erlbaum.
Gass, S. M., and Varonis, E. M. 1985a. “Negotiation of meaning in NNS/NNS interactions.” In S. M. Gass and C. Madden, 149–61. Gass, S. M., and Varonis, E. M. 1985b. “Variation in native speaker speech modification to non-native speakers.” Studies in Second Language Acquisition 7(1): 37–57. Gass, S. M., and Varonis, E. M. 1989. “Incorporated repairs in non-native discourse.” In M. Eisenstein, 71–86. Gass, S. M., and Varonis, E. M. 1994. “Input, interaction, and second language production.” Studies in Second Language Acquisition 16(3): 283–302. Gasser, M. 1990. “Connectionism and universals of second language acquisition.” Studies in Second Language Acquisition 12(2): 179–99. Gazdar, G., Klein, E., Pullum, G., and Sag, I. 1985. Generalised Phrase Structure Grammar. Cambridge, Ma: Harvard University Press. Gelman, R. 1990a. “First principles organise attention to and learning about relevant data: Number and animate-inanimate distinction as examples.” Cognitive Science 14(1): 79–106. Gelman, R. 1990b. “Structural constraints on cognitive development.” Cognitive Science 14(1): 39. Gelman, R., and Gallistel, C. 1979. The young child’s understanding of numbers: A window on early cognitive development. Cambridge, Ma: Harvard University Press. Gentner, D. 1982. “Why nouns are learned before verbs: Linguistic relativity vs. natural partitioning.” In S. Kuczaj, 301–34. Gerken, L.-A. 1996. “Prosody’s role in language acquisition and adult parsing.” Journal of Psycholinguistic Research 25(2): 345–56. Gerken, L.-A., Landau, B., and Remez, R. E. 1990. “Function morphemes in young children’s speech perception and production.” Developmental Psychology 26(2): 204–16. Geschwind, N., and Galaburda, A. M. 1987. Cerebral lateralization: Biological mechanisms, associations, and pathology. Cambridge, Ma: MIT Press. Gibson, E. 1992. “On the adequacy of the competition model.” Language 68(4): 812–30. Gibson, E., and Wexler, K. 1994. “Triggers.” Linguistic Inquiry 25(3): 407–54. Gingras, R. 
(ed.) 1978. Second language acquisition and foreign language teaching. Arlington, Va: Center for Applied Linguistics. Gleitman, L. R. 1990. “The structural sources of verb meanings.” Language Acquisition 1(1): 3–55. Gleitman, L. R., Gleitman, H., Landau, B., and Wanner, E. 1988. “Where learning begins: initial representations for language learning.” In F. J. Newmeyer, Vol. III, 150–93. Gleitman, L., Newport, E. L., and Gleitman, H. 1984. “The current status of the motherese hypothesis.” Journal of Child Language 11(1): 43–79. Gleitman, L. R., and Wanner, E. 1982. “Language acquisition: The state of the state of the art.” In E. Wanner and L. R. Gleitman, 3–48. Gold, E. M. 1967. “Language identification in the limit.” Information and Control 16: 447–474.
Goldin-Meadow, S. 1991. “Is ‘innate’ another name for ‘developmentally resilient’? Commentary, pp. 619–20, to Stephen Crain’s Language acquisition in the absence of experience.” Behavioral and Brain Sciences 14(4): 597–650. Goldin-Meadow, S., and Mylander, C. 1984. “Gestural communication in deaf children: The effects and non-effects of parental input on early language development.” Monographs of the Society for Research in Child Development 49: 1–121. Goldin-Meadow, S., and Mylander, C. 1990a. “Beyond the input given: The child’s role in the acquisition of language.” Language 66: 323–55. Goldin-Meadow, S., and Mylander, C. 1990b. “The role of parental input in the development of a morphological system.” Journal of Child Language 17: 527–63. Goldsmith, J. A. 1990. Autosegmental and metrical phonology. Oxford: Blackwell. Goldsmith, J. 1992. “Local modelling in phonology.” In S. Davis, 229–46. Goldsmith, J. A. 1995. The handbook of phonological theory. Oxford, U.K: Basil Blackwell. Golinkoff, R. M. 1975. “Semantic development in infants: The concept of agent and recipient.” Merrill-Palmer Quarterly 21(2): 181–93. Golinkoff, R. M. 1999. “Breaking the language barrier.” Plenary to GALA ’99, Universität Potsdam, 10–12 September 1999, Potsdam, Germany. Golinkoff, R. M., Harding, C. G., Carlson, V., and Sexton, M. E. 1984. “The infant’s perception of causal events: The distinction between animate and inanimate objects.” In L. P. Lipsitt and C. Rovee-Collier, 145–51. Gombert, J.-E. 1992. Metalinguistic development. Chicago: University of Chicago Press. First published as Le développement métalinguistique (1990), Paris: Presses Universitaires de France. Gopnik, M., and Crago, M. 1991. “Familial aggregation of a developmental language disorder.” Cognition 39: 1–50. Gorrell, P. 1995. Syntax and parsing. Cambridge: Cambridge University Press. Green, D. W. 1986. 
“Control, activation, and resource: A framework and a model for the control of speech in bilinguals.” Brain and Language 27(1): 210–23. Greenberg, J. (ed.) 1966. Universals of language. Cambridge, Ma: MIT Press. Gregg, K. R. 1990. “The Variable Competence Model of second language acquisition, and why it’s not.” Applied Linguistics 11(4): 364–83. Gregg, K. R. 1993. “Taking explanation seriously; or, Let a couple of flowers bloom.” Applied Linguistics 14(3): 276–94. Gregg, K. R. 1994a. “Second language acquisition: History and theory.” In R. E. Asher, 3720–6. Gregg, K. R. 1994b. “The logical and developmental problems of second language acquisition.” In W. C. Ritchie and T. Bhatia, 49–81. Gregg, K. R. 1995. Personal communication. Gregg, K. R. 1997. “UG and SLA theory: The story so far.” Revista Canaria de Estudios ingleses 34: 69–99. (A revised edition is in press as “Learnability and second language acquisition theory.” In P. Robinson (ed.), Cognition and second language instruction. Cambridge: Cambridge University Press.)
Grévisse, M. 1980. Le bon usage. Paris: Duculot. 11th edition. Grice, H. P. 1975. “Logic and conversation.” In P. Cole and J. L. Morgan, 41–58. Grimshaw, J. 1981. “Form, function, and the language acquisition device.” In C. L. Baker and J. J. McCarthy, 165–82. Grimshaw, J. 1985. “Discussant’s remarks to M. Sharwood Smith’s Modularity in muddy waters: Linguistic theory and the acquisition of non-native syntax.” Paper to Linguistic theory and second language acquisition conference, MIT, Oct. 25, 1985. Grimshaw, J. 1990. Argument structures. Cambridge, Ma: MIT Press. Grimshaw, J. 1997. “Projection, heads, and optimality.” Linguistic Inquiry 28(3): 373–422. Grimshaw, J., and Rosen, S. 1990. “Knowledge and obedience: The developmental status of the binding theory.” Linguistic Inquiry 24(1): 69–101. Grodzinsky, Y., and Kave, G. 1993/1994. “Do children really know Condition A?” Language Acquisition 3(1): 41–54. Gropen, J., Pinker, S., Hollander, M., Goldberg, R., and Wilson, R. 1989. “The learnability and acquisition of the dative alternation in English.” Language 65(2): 203–57. Grosjean, F. 1985. “The bilingual as a competent but specific speaker-hearer.” Journal of Multilingual and Multicultural Development 6(6): 467–78. Grosjean, F. 1989. “Neurolinguists beware! The bilingual is not two monolinguals in one person.” Brain and Language 36(1): 3–15. Grosjean, F. 1997. “Processing mixed language: Issues, findings, and models.” In A. B. M. de Groot and J. F. Kroll, 225–54. Grossman, R. E., San, L. J., and Vance, T. J. (eds.) 1975. Papers from the eleventh regional meeting of the Chicago Linguistic Society. Chicago, Ill: Chicago Linguistic Society. Gruber, J. 1965. Studies in lexical relations. Doctoral dissertation, MIT. Distributed through the Indiana University Linguistics Club. Reprinted as part of Lexical structures in syntax and semantics (1976). Amsterdam: North Holland. Gruber, H. E., and Voneche, J. J. 1977. The essential Piaget: An interpretive reference and guide. 
London: Routledge and Kegan Paul. Haberzettel, S. 1999. “Ein Prüfstein für die Transfer-Frage: Der Erwerb der Verbstellungsregeln des Deutschen durch Lerner mit verschiedenen L1.” Eurosla 9, Lund University, June 10–12, 1999, Lund, Sweden. Haberzettel, S. in preparation. Holzwege und Königswege. Eine Untersuchung zum Erwerb der Verbstellung des Deutschen als Zweitsprache durch türkische und russische Kinder. D.Phil. dissertation, Universität Potsdam, Potsdam, Germany. Haider, H., and Prinzhorn, M. (eds.) 1986. Verb second phenomena in Germanic languages. Dordrecht: Foris. Halle, M., Bresnan, J., and Miller, G. A. (eds.) 1978. Linguistic theory and psychological reality. Cambridge, Ma: MIT Press. Halliday, M. A. K. 1967. “Notes on transitivity and theme in English.” Journal of Linguistics 3(2): 199–244.
Hancin-Bhatt, B., and Bhatt, R. 1997. “Optimal L2 syllables.” Studies in Second Language Acquisition 19(3): 331–78. Harley, B. 1979. “French gender ‘rules’ in the speech of English-dominant, French-dominant and monolingual French-speaking children.” Working Papers in Bilingualism 19: 129–56. Harley, B. 1989a. “Functional grammar in French immersion: A classroom experiment.” Applied Linguistics 10: 331–59. Harley, B. 1989b. “Transfer in the written compositions of French immersion students.” In H. W. Dechert and M. Raupach, 3–19. Harley, B., and King, M. L. 1989. “Verb lexis in the written compositions of young L2 learners.” Studies in Second Language Acquisition 11(4): 415–439. Harley, B., and Swain, M. 1984. “The interlanguage of immersion students and its implications for second language teaching.” In A. Davies et al., 291–311. Harley, B., Allen, P., Cummins, J., and Swain, M. (eds.) 1987. The development of bilingual proficiency: Final report. Vol. II: Classroom treatment. Modern Language Centre: The Ontario Institute for Studies in Education. Harrington, M. 1987. “Processing transfer: Language-specific strategies as a source of interlanguage variation.” Applied Psycholinguistics 8(3): 351–78. Harris, R. (ed.) 1992. Cognitive processing in bilinguals. Amsterdam: Elsevier. Hatch, E. 1978a. “Discourse analysis and second language acquisition.” In E. Hatch, 401–35. Hatch, E. (ed.) 1978b. Second language acquisition. Rowley, Ma: Newbury House. Haugen, E. 1953. The Norwegian language in America: A study in bilingual behaviour. Philadelphia: University of Pennsylvania Press. Hauser, M. D., and Marler, P. 1992. “How do and should studies of animal communication affect interpretations of child phonological development?” In C. A. Ferguson et al., 663–80. Hawkins, J. A. 1983. Word order universals. New York: Academic Press. Hawkins, J. A. 1988a. “Explaining language universals.” In J. A. Hawkins, 3–28. Hawkins, J. A. (ed.) 1988b. Explaining language universals. 
Oxford: Blackwell. Hawkins, J. A. 1994. A performance theory of order and constituency. Cambridge: Cambridge University Press. Hawkins, J. A. 1996. “Movement out of clauses in English and German.” Colloquium to the Institute for English and American Studies, Universität Potsdam. Feb. 6, 1996. Hawkins, R. 1987. “Markedness and the acquisition of the English dative alternation by L2 speakers.” Second Language Research 3(1): 20–55. Hawkins, R., Towell, R., and Bazergui, N. 1993. “Universal grammar and the acquisition of verb movement by native speakers of English.” Second Language Research 9(3): 189–233. Hayes, B. 1986. “Assimilation as spreading in Toba Batak.” Linguistic Inquiry 17(3): 467–99. Hayes, J. R. (ed.) 1970. Cognition and development of language. New York: Wiley.
Hendrick, R. A., Masek, C. A., and Miller, M. F. (eds.) 1981. Papers from the Seventh Regional Meeting, Chicago Linguistic Society. Chicago: Chicago Linguistic Society. Herron, C., and Tomasello, M. 1988. “Learning grammatical structures in a foreign language: Modelling versus feedback.” The French Review 61(6): 910–22. Higginbotham, J. 1983. “The logic of perceptual reports.” Journal of Philosophy 80(2): 100–27. Higginbotham, J. 1985. “On semantics.” Linguistic Inquiry 16(4): 547–93. Hilles, S. 1986. “Interlanguage and the pro-drop parameter.” Second Language Research 2(1): 32–52. Hilles, S. 1991. “Access to Universal Grammar in second language acquisition.” In L. Eubank, 305–38. Hirsh-Pasek, K., Kemler Nelson, D. G., Jusczyk, P. W., Wright Kennedy, K., Druss, B., and Kennedy, L. 1987. “Clauses are perceptual units for young infants.” Cognition 26: 269–286. Hirsh-Pasek, K., Treiman, R., and Schneiderman, M. 1984. “Brown and Hanlon revisited: Mothers’ sensitivity to ungrammatical forms.” Journal of Child Language 11(1): 81–88. Hoekstra, T. 1984. Transitivity. Dordrecht: Foris. Hoekstra, T. 1995. “The function of functional categories.” GLOT International 1(2): 3–6. Hoekstra, T., and Schwartz, B. D. (eds.) 1994. Language acquisition studies in generative grammar. Amsterdam: John Benjamins. Holland, J. H., Holyoak, K. J., Nisbett, R. E., and Thagard, P. R. 1986. Induction: Processes of inference, learning and discovery. Cambridge, Ma: MIT Press (all citations are from the paperback edition, 1989). Holland, J. H., and Reitman, J. 1978. “Cognitive systems based on adaptive algorithms.” In D. Waterman and F. Hayes-Roth, 313–29. Hornstein, N., and Lightfoot, D. (eds.) 1981. Explanation in linguistics: The logical problem of language acquisition. New York: Longman. Huebner, T. 1983. A longitudinal analysis of the acquisition of English. Ann Arbor: Karoma. Huebner, T., and Ferguson, C. A. (eds.) 1991. Crosscurrents in second language acquisition and linguistic theories. 
Amsterdam: John Benjamins. Hulstijn, J. H. 1989. “Experiments with semi-artificial input in second language acquisition research.” Scandinavian Working Papers on Bilingualism: Language learning and learner language 8: 28–40. Stockholm: University of Stockholm. Hulstijn, J. H. 1990. “A comparison between the information-processing and the analysis/control approaches to language learning.” Applied Linguistics 11(1): 30–45. Hulstijn, J. H. 1992. “Retention of inferred and given word meanings: Experiments in incidental vocabulary learning.” In P. Arnaud and H. Bejoint, 113–25. Hulstijn, J. H. 1997. “Second language acquisition research in the laboratory: Possibilities and limitations.” Studies in Second Language Acquisition 19(2): 131–4. Hulstijn, J. H., and DeKeyser, R. (eds.) 1997. Special issue: Testing SLA theory in the research laboratory. Studies in Second Language Acquisition 19(2).
Hunter, I. M. L. 1966. “Kopfrechnen und Kopfrechner.” Bild der Wissenschaft (April): 296–303. Hunter, I. M. L. 1977. “Mental calculation.” In P. N. Johnson-Laird and P. C. Wason (1977b), 35–45. Huttenlocher, J. 1968. “Constructing spatial images: A strategy in reasoning.” Psychological Review 75(3): 550–60. Huttenlocher, J. 1977. “Constructing spatial images: A strategy in reasoning.” In P. N. Johnson-Laird and P. C. Wason (1977b), 89–97. Hyams, N. 1986. Language acquisition and the theory of parameters. Dordrecht: Reidel. Hyams, N. 1991. “Seven not-so-trivial trivia of language acquisition: Comments on Wolfgang Klein.” In L. Eubank, 71–87. Hyltenstam, K. 1977. “Implicational patterns in interlanguage syntax variation.” Language Learning 27(2): 383–411. Hyltenstam, K., and Obler, L. K. (eds.) 1989. Bilingualism across the lifespan. Cambridge: Cambridge University Press. Hyltenstam, K., and Viberg, A. (eds.) 1993. Progression and regression in language: Sociocultural, neuropsychological and linguistic perspectives. Cambridge: Cambridge University Press. Iatridou, S. 1990. “About Agr(P).” Linguistic Inquiry 21(4): 551–577. Ingram, D. 1989. First language acquisition: Method, description and explanation. Cambridge: Cambridge University Press. Ioup, G., Boustagui, E., El Tigi, M., and Moselle, M. 1994. “Reexamining the critical period hypothesis: A case study of successful adult SLA in a naturalistic environment.” Studies in Second Language Acquisition 16(1): 73–98. Ioup, G., and Weinberger, S. (eds.) 1987. Interlanguage phonology: The acquisition of a second language sound system. Cambridge, Ma: Newbury House. Izumi, S., and Lakshmanan, U. 1998. “Learnability, negative evidence and the L2 acquisition of the English passive.” Second Language Research 14(1): 62–101. Jackendoff, R. S. 1972. Semantic interpretation in generative grammar. Cambridge, Ma: MIT Press. Jackendoff, R. S. 1977. X̄ syntax: A study of phrase structure. Cambridge, Ma: MIT Press. Jackendoff, R. S. 
1983. Semantics and cognition. Cambridge, Ma: MIT Press. Jackendoff, R. S. 1985. “Multiple subcategorizations and the θ-criterion: The case of climb.” Natural Language and Linguistic Theory 3(2): 271–95. Jackendoff, R. S. 1987. Consciousness and the computational mind. Cambridge, Ma: MIT Press. Jackendoff, R. S. 1990. Semantic structures. Cambridge, Ma: MIT Press. Jackendoff, R. S. 1992. Languages of the mind. Cambridge, Ma: MIT Press. Jackendoff, R. S. 1995. Lexical insertion in a post-minimalist grammar. Ms. Jackendoff, R. S. 1996a. “Conceptual semantics and cognitive semantics.” Cognitive Linguistics 7(1): 93–129. Jackendoff, R. 1996b. “The conceptual structure of intending and volitional action.” In H. Campos and P. Kempchinsky, 198–227.
Jackendoff, R. 1996c. “The proper treatment of measuring out, telicity, and perhaps even quantification in English.” Natural Language and Linguistic Theory 14(3): 305–54. Jackendoff, R. S. 1997. The architecture of the language faculty. Cambridge, Ma: MIT Press. Jacobs, B. 1988. “Neurobiological differentiation of primary and secondary language acquisition.” Studies in Second Language Acquisition 10(3): 303–37. Jacobs, R., and Rosenbaum, P. (eds.) 1970. Readings in transformational grammar. Waltham, Ma: Blaisdell. Jacoby, L. L., Toth, J. P., and Yonelinas, A. P. 1993. “Separating conscious and unconscious influences of memory: Measuring recollection.” Journal of Experimental Psychology: General 122(1): 139–54. Jensen, J. T. 1990. Morphology: Word structures in generative grammar. Amsterdam: John Benjamins. Johnson, J. S., and Newport, E. L. 1989. “Critical period effects in second language learning: the influence of maturational state on the acquisition of English as a second language.” Cognitive Psychology 21(1): 60–99. Johnson, J. S., and Newport, E. L. 1991. “Critical period effects on universal properties of language: The status of subjacency in the acquisition of a second language.” Cognition 39: 215–58. Johnson, K. 1996. Language teaching and skill learning. Oxford: Blackwell. Johnson, M. 1996. “Resource sensitivity in Lexical Functional Grammar.” Universität Potsdam Kolloquium, June 4th, 1996. Johnson, M. H. 1990a. “Cortical maturation and the development of visual attention in early infancy.” Journal of Cognitive Neuroscience 2(1): 81–95. Johnson, M. H. 1990b. “Cortical maturation and perceptual development.” In H. Bloch and B. I. Bertenthal, 145–62. Johnson, M. H., and Morton, J. 1991. Biology and cognitive development: The case of face recognition. Oxford: Blackwell. Johnson-Laird, P. N. 1983. Mental Models. Cambridge, Ma: Harvard University Press. Johnson-Laird, P. N., Legrenzi, P., and Sonino-Legrenzi, M. 1972. “Reasoning and a sense of reality.” 
British Journal of Psychology 63: 395–400. Johnson-Laird, P. N., and Wason, P. C. 1977a. “A theoretical analysis of insight into a reasoning task”. In P. N. Johnson-Laird and P. C. Wason, 1977b, 143–57. Johnson-Laird, P. N., and Wason, P. C. (eds.) 1977b. Thinking: Readings in cognitive science. Cambridge: Cambridge University Press. Jones, E. E., Kanouse, D. E., Kelley, H. H., Nisbett, R. E., Valins, S., and Weiner, B. (eds.) 1972. Attribution: Perceiving the causes of behaviour. Morristown, N.J.: General Learning Press. Juffs, A. 1996. Learnability and the lexicon: Theories and second language acquisition research. Amsterdam: John Benjamins. Juffs, A. 1998a. “Main verb vs. reduced relative clause ambiguity resolution in second language sentence processing.” Language Learning 48(1): 107–47.
Juffs, A. 1998b. “Some effects of first language argument structure and morphosyntax on second language sentence processing.” Second Language Research 14(4): 406–24. Jusczyk, P. W. 1981. “The processing of speech and nonspeech sounds by infants: Some implications.” In R. N. Aslin et al, 191–217. Jusczyk, P. W. 1985. “On characterizing the development of speech perception.” In J. Mehler and R. Fox, 199–229. Jusczyk, P. W. 1992. “Developing phonological categories from the speech signal.” In C. A. Ferguson et al, 17–64. Jusczyk, P. 1993. “How word recognition may evolve from infant speech perception capacities.” In G. Altmann, and R. Shillcock, 27–55. Jusczyk, P. W., Pisoni, D. B., Reed, M., Fernald, A., and Myers, M. 1983. “Infants’ discrimination of the duration of rapid spectrum change in non-speech signals.” Science 222: 175–77. Jusczyk, P. W., Pisoni, D. B., Walley, A. C., and Murray, J. 1980. “Discrimination of the relative onset of two-component tones by infants.” Journal of the Acoustical Society of America 67(1): 262–70. Jusczyk, P. W., Rosner, B. S., Reed, M. A., and Kennedy, L. J. 1989. “Could temporal order differences underlie 2-month-olds’ discrimination of English voicing contrasts?” Journal of the Acoustical Society of America 85(4): 1741–49. Kail, M. 1989. “Cue validity, cue cost, and processing types in sentence comprehension in French and Spanish.” In B. MacWhinney and E. Bates, 77–117. Kaplan, R., and Bresnan, J. 1982. “Lexical-Functional Grammar: A formal system for grammatical representation.” In J. Bresnan, 173–281. Karmiloff-Smith, A. 1979. A functional approach to child language: A study of determiners and reference. Cambridge: Cambridge University Press. Karmiloff-Smith, A. 1986. “From meta-processes to conscious access: Evidence from children’s metalinguistic and repair data.” Cognition 23: 95–147. Karmiloff-Smith, A. 1992. Beyond modularity. Cambridge, Ma: MIT Press. Kasher, A. (ed.) 1991. The Chomskyan Turn. Oxford: Basil Blackwell. 
Kasper, G. 1985. “Repair in foreign language teaching.” Studies in Second Language Acquisition 7(2): 200–215. Katz, J. 1995. “The unfinished Chomskyan revolution.” Colloquium to the Universität Hamburg, June, 1995. Katz, J. 1996. “The unfinished Chomskyan revolution.” Mind and Language 11(3): 270–94. Katz, N., Baker, E., and Macnamara, J. 1974. “What’s in a name? A study of how children learn common and proper names.” Child Development 45(2): 469–73. Kayne, R. 1994. The antisymmetry of syntax. Cambridge, Ma: MIT Press. Kean, M-L. 1977. “The linguistic interpretation of aphasic syndromes.” In E. C. T. Walker, 9–46. Kean, M-L. 1980. “Grammatical representations and the description of language processing.” In D. Caplan, 239–68.
Keil, F. C. 1979. Semantic and conceptual development: An ontological perspective. Cambridge, Ma: Harvard University Press. Keil, F. C. 1989. Concepts, kinds and cognitive development. Cambridge, Ma: MIT Press. Kellerman, E. 1978. “Giving learners a break: native language intuitions as a source of predictions about transferability.” Working Papers on Bilingualism 15: 59–92. Kellerman, E. 1986. “An eye for an eye: Crosslinguistic constraints on the development of the L2 lexicon.” In E. Kellerman and M. Sharwood Smith, 35–48. Kellerman, E. 1987. Aspects of transferability in second language acquisition. Doctoral dissertation, Katholieke Universiteit te Nijmegen. Kellerman, E., and Sharwood Smith, M. (eds.) 1986. Crosslinguistic influence in second language acquisition. Oxford: Pergamon Press. Kelley, H. H. 1972. “Causal schemata and the attribution process”. In E. E. Jones et al., 151–74. Kelley, H. H. 1973. “The process of causal attribution”. American Psychologist 28(2): 107–28. Kelly, M. H. 1988a. “Phonological biases in grammatical category shifts.” Journal of Memory and Language 27(3): 343–58. Kelly, M. H. 1988b. “Rhythmic alternations and lexical stress differences in English.” Cognition 30: 107–37. Kelly, M. H. 1989. “Rhythm and language change in English.” Journal of Memory and Language 28(5): 690–710. Kelly, M. H. 1992. “Using sound to solve syntactic problems: The role of phonology in grammatical category assignments.” Psychological Review 99(2): 349–64. Kelly, M. H., and Martin, S. 1994. “Domain-general abilities applied to domain-specific tasks: Sensitivity to probabilities in perception, cognition, and language.” Lingua 92(1–4): 105–40. Kenstowicz, M. 1994. Phonology in generative grammar. Oxford: Blackwell. Kent, R. D. 1992. “The biology of phonological development.” In C. A. Ferguson et al., 65–90. Keyser, J. 1985. “Introductory remarks.” Linguistic theory and second language acquisition conference. MIT, Oct. 25, 1985. Kilborn, K. 1987. 
Sentence processing in a second language: Seeking a performance definition of fluency. Doctoral dissertation, University of California, San Diego. Kilborn, K., and Cooreman, A. 1987. “Sentence interpretation strategies in adult Dutch-English bilinguals.” Applied Psycholinguistics 8(4): 415–31. Kilborn, K., and Ito, T. 1989. “Sentence processing strategies in adult bilinguals.” In B. MacWhinney and E. Bates, 257–91. Kiparsky, P. 1973. “Elsewhere in phonology.” In S. R. Anderson and P. Kiparsky, 93–106. Kiparsky, P. 1982. “Lexical morphology and phonology.” In I. S. Yang, 3–91. Klein, W. 1986. Second language acquisition. Cambridge: Cambridge University Press. Klein, W. 1991a. “Seven trivia of language acquisition.” In L. Eubank, 49–69.
Klein, W. 1991b. “SLA theory: Prolegomena to a theory of language acquisition and implications for theoretical linguistics.” In T. Huebner and C. A. Ferguson, 169–94. Klein, W., and Dittmar, N. 1979. Developing grammars: The acquisition of German syntax by foreign workers. Berlin: Springer-Verlag. Klein, W., and Perdue, C. 1992. Utterance structure: Developing grammars again. Amsterdam: John Benjamins. Köhler, W. 1929. Gestalt psychology. New York: Liveright. Köpcke, K.-M. 1987. “Der Erwerb morphologischer Ausdrucksmittel durch L2-Lerner am Beispiel der Personalflexion.” Zeitschrift für Sprachwissenschaft 6(2): 186–205. Köpcke, K.-M., and Zubin, D. A. 1983. “Die kognitive Organisation der Genuszuweisung zu den einsilbigen Nomen der deutschen Gegenwartssprache.” Zeitschrift für germanistische Linguistik 11: 166–82. Köpcke, K.-M., and Zubin, D. A. 1984. “Sechs Prinzipien für die Genuszuweisung im Deutschen: Ein Beitrag zur natürlichen Klassifikation.” Linguistische Berichte 93: 26–50. Koffka, K. 1935. Principles of Gestalt psychology. New York: Harcourt, Brace and World. Koopman, H. 1984. The syntax of verbs: From verb-movement rules in Kru languages to Universal Grammar. Dordrecht: Foris. Krashen, S. 1981. Second language acquisition and second language learning. Oxford: Pergamon. Krashen, S. 1982. Principles and practices of second language acquisition. Oxford: Pergamon. Krashen, S. 1985. The input hypothesis: issues and implications. London: Longman. Kuczaj, S. A. II (ed.) 1982. Language development: Language, thought, and culture. Vol. II. Hillsdale, N.J.: Erlbaum. Kuhl, P. K., and Iverson, P. 1995. “Linguistic experience and the perceptual magnet effect”. In W. Strange, 121–54. Kulhavy, R. W. 1977. “Feedback in written instruction.” Review of Educational Research 47(1): 211–32. Labov, W. 1963. “The social motivation of sound change.” Word 19: 273–307. Labov, W. 1966. The social stratification of English in New York City. 
Washington, D.C: Center for Applied Linguistics. Labov, W. 1972a. Language in the inner city. Philadelphia: University of Pennsylvania Press. Labov, W. 1972b. Sociolinguistic patterns. Philadelphia: University of Pennsylvania Press. Labov, W. 1994. Principles of linguistic change. Vol. 1: Internal factors. Oxford: Blackwell. Labov, W. 1998. “Co-existent systems in African-American Vernacular English.” In S. S. Mufwene et al., 110–53. Lackner, J. R., and Tuller, B. 1976. “The influence of syntactic segmentation on perceived stress.” Cognition 4: 303–7.
Lakshmanan, U. 1989. Accessibility to universal grammar in child second language acquisition. Doctoral dissertation, University of Michigan, Ann Arbor. Lakshmanan, U. 1991. “Morphological uniformity and null subjects in child second language acquisition.” In L. Eubank, 389–410. Lakshmanan, U. 1993/1994. “‘The boy for the cookie’ — Some evidence for the nonviolation of the Case Filter in child second language acquisition.” Language Acquisition 3(1): 55–91. Landau, B., and Gleitman, L. R. 1985. Language and experience: Evidence from the blind child. Cambridge, Ma: Harvard University Press. Langendoen, T., and Postal, P. 1984. The vastness of natural languages. Oxford: Blackwell. Lardiere, D. 1995. “L2 acquisition of English synthetic compounding is not constrained by level-ordering (and neither, probably, is L1).” Second Language Research 11(1): 20–56. Larsen-Freeman, D. 1983. “The importance of input in second language acquisition.” In R. Andersen, 87–93. Larsen-Freeman, D., and Long, M. H. 1991. An introduction to second language acquisition research. New York: Longman. Lasky, R. E., and Gogol, W. C. 1978. “The perception of relative motion by young infants.” Perception 7: 617–23. Lass, N. I. (ed.) 1981. Speech and language: Advances in basic research and practice. Vol. 6. New York: Academic Press. Lass, N. I. (ed.) 1983. Speech and language: Advances in basic research and practice. Vol. 9. New York. Leonard, L. B. 1979. “Language impairment in children.” Merrill-Palmer Quarterly 25(3): 205–32. Leonard, L. B., Bortolini, U., Caselli, M. C., McGregor, K. K., and Sabbadini, L. 1992. “Morphological deficits in children with specific language impairment: The status of features in the underlying grammar.” Language Acquisition 2(2): 151–79. Leonard, L. B., Sabbadini, L., Volterra, V., and Leonard, J. S. 1988. “Some influences on the grammar of English- and Italian-speaking children with specific language impairment.” Applied Psycholinguistics 9(1): 39–57. Levelt, W. J. M.
1989. Speaking: From intention to articulation. Cambridge, Ma: MIT Press. Levelt, W. J. M. 1996. Producing and monitoring words. Colloquium to the Institut für Linguistik, Universität Potsdam, 12 January 1996. Levy, Y. 1983. “It’s frogs all the way down.” Cognition 15(1): 75–93. Liceras, J. M. 1988. “L2 learnability: Delimiting the domain of core grammars as distinct from the marked periphery.” In S. Flynn and W. O’Neil, 199–224. Liceras, J. M. 1999. “Triggers in L2 acquisition: The case of Spanish N–N compounds.” Eurosla 9, Lund University, 10–12 June 1999, Lund Sweden. Lightbown, P. M. 1983. “Exploring relationships between developmental and instructional sequences in L2 acquisition.” In H. Seliger and M. Long, 217–43.
Lightbown, P. M. 1987. “Classroom language as input to second language acquisition.” In C. Pfaff, 169–87. Lightbown, P. M. 1991. “What have we here? Some observations on the influence of instruction on L2 learning.” In R. Phillipson et al., 197–212. Lightbown, P. M., and Spada, N. 1990. “Focus-on-form and corrective feedback in communicative language teaching.” Studies in Second Language Acquisition 12(4): 429–48. Lightfoot, D. 1989. “The child’s trigger experience: Degree-0 learnability.” Behavioral and Brain Sciences 12(2): 321–34. Lightfoot, D. 1991. How to set parameters: Arguments from language change. Cambridge, Ma: MIT Press. Linebarger, M. 1989. “Neuropsychological evidence for linguistic modularity.” In G. N. Carlson and M. K. Tanenhaus, 197–238. Lipsett, L. P., and Rovee-Collier, C. (eds.) 1984. Advances in infancy research. Vol. 3. Norwood, N.J.: Ablex. Locke, J. 1964. An essay concerning human understanding. Cleveland: Meridian Books. Originally published in 1690. Locke, J. L., and Pearson, D. M. 1992. “Vocal learning and the emergence of phonological capacity: A neurobiological approach.” In C. A. Ferguson et al., 91–129. Loewenstein, W. 1960. “Biological transducers.” Scientific American. (August). Long, M. H. 1980. Input, interaction, and second language acquisition. Doctoral dissertation, University of California, Los Angeles. Long, M. H. 1981. “Input, interaction, and second language acquisition.” In H. Winitz, 259–78. Long, M. H. 1983. “Linguistic and conversational adjustment to non-native speakers.” Studies in Second Language Acquisition 5(2): 177–93. Long, M. H. 1985. “Input and second language acquisition theory.” In S. M. Gass and C. Madden, 377–93. Long, M. H. 1990. “Maturational constraints on language development.” Studies in Second Language Acquisition 12(3): 251–85. Long, M. H. 1996. “The role of the linguistic environment in second language acquisition.” In W. C. Ritchie and T. K. Bhatia, 413–68. Loschky, L. 1994. 
“Comprehensible input and second language acquisition.” Studies in Second Language Acquisition 16(3): 303–23. Lowenstein, W. R. 1960. Biological transducers. Scientific American (August). Lucas, E. 1975. Teachers’ reacting moves following errors made by pupils in post-primary English as second language classes in Israel. M.A. thesis, School of Education, Tel Aviv University. Lumsden, J. 1992. “Underspecification in grammatical and natural gender.” Linguistic Inquiry 23(3): 469–86. Lysakowski, R. S., and Walberg, H. J. 1982. “Instructional effects of cues, participation and corrective feedback: A quantitative synthesis.” American Educational Research Journal 19(4): 559–78.
Macnamara, J. (ed.) 1977. Language learning and thought. New York: Academic Press. Macnamara, J. 1982. Names for things: A study of human learning. Cambridge, Ma: MIT Press. MacWhinney, B. 1978. “The acquisition of morphophonology.” Monographs of the society for research in child development 43(1). MacWhinney, B. 1987a. “The competition model and bilingualism.” Applied Psycholinguistics 8(4): 315–431. MacWhinney, B. 1987b. “The competition model.” In B. MacWhinney (1987c), 249–308. MacWhinney, B. (ed.) 1987c. Mechanisms of language acquisition. Hillsdale, N.J.: Erlbaum. MacWhinney, B. 1989. “Competition and teachability.” In M. L. Rice and R. L. Schiefelbusch, 63–105. MacWhinney, B. 1992. “Transfer and competition in second language learning.” In R. Harris, 371–90. MacWhinney, B. 1997. “Second language acquisition and the Competition Model.” In A. B. M. de Groot and J. F. Kroll, 113–42. MacWhinney, B., and Bates, E. (eds.) 1989. The crosslinguistic study of sentence processing. Cambridge: Cambridge University Press. MacWhinney, B., Pléh, C., and Bates, E. 1985. “The development of syntactic comprehension in Hungarian.” Cognitive Psychology 17(1): 178–209. Major, R. 1988. “Balancing form and function.” IRAL 26: 81–100. Manzini, R., and Wexler, K. 1987. “Parameters, binding theory and learnability.” Linguistic Inquiry 18(3): 413–44. Markman, E. 1989. Categorization and naming in children. Cambridge, Ma: MIT Press. Markman, E. and Hutchinson, 1984. “Children’s sensitivity to constraints on word meaning: Taxonomic vs. thematic relations.” Cognitive Psychology 16: 1–27. Marler, P. 1987. “Sensitive periods and the roles of specific and general sensory stimulation in birdsong learning.” In J. P. Rauschecker and P. Marler, 99–135. Marr, D. 1982. Vision. San Francisco: Freeman. Marshall, H. 1965. “The effect of punishment on children: A review of the literature and a suggested hypothesis.” Journal of Genetic Psychology 106(1): 23–33. Marshall, J. 1980. 
“On the biology of language acquisition.” In D. Caplan, 106–48. Marslen-Wilson, W. D. 1989. “Access and integration: Projecting sound onto meaning.” In W. Marslen-Wilson, 3–24. Marslen-Wilson, W. D. (ed.) 1989. Lexical representation and process. Cambridge, Ma: MIT Press. Marslen-Wilson, W. D., and Tyler, L. K. 1980. “The temporal structure of spoken language understanding.” Cognition 8: 1–71. Marslen-Wilson, W. D., and Tyler, L. K. 1981. “Central processes in speech understanding.” Philosophical transactions of the Royal Society of London B295: 317–32. Marslen-Wilson, W. D., and Tyler, L. K. 1987. “Against modularity.” In J. L. Garfield, 63–82.
Massey, C., and Gelman, R. 1988. “Preschoolers’ ability to decide whether pictured unfamiliar objects can move themselves.” Developmental Psychology 24: 307–17. Mathiot, M. 1979. “Sex roles as revealed through referential gender in American English.” In M. Mathiot, 1–47. Mathiot, M. (ed.) 1979. Ethnolinguistics: Boas, Sapir and Whorf revisited. The Hague: Mouton. Mathews, R. C., Buss, R. R., Stanley, W. B., Blanchard-Fields, F., Cho, J. R., and Druhan, B. 1989. “Role of implicit and explicit processes in learning from examples: A synergistic effect.” Journal of Experimental Psychology: Learning, Memory, and Cognition 15(5): 1083–1100. Matthews, R. J. 1991. “Psychological reality of grammars.” In A. Kasher, 182–99. Matthews, R. J., and Demopoulos, W. (eds.) 1984. Learnability and linguistic theory. Dordrecht: Kluwer. May, R. 1977. The grammar of quantification. Doctoral dissertation, MIT. Distributed by the Indiana University Linguistics Club. May, R. 1985. Logical form: Its structure and derivation. Cambridge, Ma: MIT Press. Mazurkewich, I. 1984a. “The acquisition of the dative alternation by second language learners and linguistic theory.” Language Learning 34(1): 91–109. Mazurkewich, I. 1984b. “Dative questions and markedness.” In F. Eckman et al., 119–31. Mazurkewich, I. 1985. “Syntactic markedness and language acquisition.” Studies in Second Language Acquisition 7(1): 15–36. Mazurkewich, I., and White, L. 1984. “The acquisition of the dative alternation: Unlearning overgeneralisations.” Cognition 16: 261–83. McCarthy, J., and Prince, A. 1995. “Faithfulness and reduplicative identity.” University of Massachusetts Occasional Papers 18: Papers in Optimality Theory: 249–384. GLSA, University of Massachusetts, Amherst, Ma. Also available through the Rutgers Optimality Archive ROA-60, http://ruccs.rutgers.edu/roa.html McDaniel, D., Cairns, H. S., and Hsu, J. 1990. “Binding principles in the grammars of young children.” Language Acquisition 1(1): 121–38.
McDaniel, D., and Maxfield, T. 1992. “Principle B and contrastive stress.” Language Acquisition 2(4): 337–58. McDonald, J. L. 1984. The mapping of semantic and syntactic processing cues by first and second language learners of English, Dutch and German. Doctoral dissertation, Carnegie-Mellon University, Pittsburgh. McDonald, J. L. 1986. “The development of sentence comprehension strategies in English and Dutch.” Journal of Experimental Child Psychology 41(3): 317–35. McDonald, J. L. 1989. “The acquisition of cue-category mappings.” In B. MacWhinney and E. Bates, 375–96. McLaughlin, B. 1987. Theories of second-language learning. London: Edward Arnold. McLaughlin, B., Rossman, T., and McLeod, B. 1983. “Second language learning: An information-processing perspective.” Language Learning 33: 135–58. McLeod, B. and McLaughlin, B. 1986. “Restructuring or automaticity? Reading in a second language.” Language Learning 36: 109–23.
McNeill, D. 1966. “Developmental psycholinguistics.” In F. Smith and G. A. Miller, 155–207. Mehler, J., and Fox, R. (eds.). 1985. Neonate cognition: Beyond the blooming buzzing confusion. Hillsdale, N.J.: Erlbaum. Mehler, J., Walker, E. C. T., and Garrett, M. (eds.) 1982. Perspectives on mental representations. Hillsdale, N.J.: Erlbaum. Meisel, J. M. 1983. “Strategies of second language acquisition. More than one kind of simplification.” In R. W. Andersen, 120–57. Meisel, J. M. 1986. “Word order and case marking in early child language. Evidence from simultaneous acquisition of two first languages: French and German.” Linguistics 24(1): 123–83. Meisel, J. M. 1987. “Reference to past events and actions in the development of natural second language acquisition.” In C. Pfaff, 206–224. Meisel, J. M. 1990a. “INFL-ection: Subjects and subject–verb agreement.” In J. M. Meisel, 237–98. Meisel, J. M. (ed.). 1990b. Two first languages: Early grammatical development in bilingual children. Dordrecht: Foris. Meisel, J. M. 1991. “Principles of universal grammar and strategies of language use: On some similarities and differences between first and second language acquisition.” In L. Eubank, 231–76. Meisel, J. M. (ed.). 1992. The acquisition of verb placement: Functional categories and V2 phenomena in language acquisition. Dordrecht: Kluwer. Meisel, J. M. (ed.). 1994a. Bilingual first language acquisition: French and German grammatical development. Amsterdam: John Benjamins. Meisel, J. M. 1994b. “Getting FAT: Finiteness, Agreement and Tense in early grammars”. In J. M. Meisel (1994a), 89–129. Meisel, J. M. (ed.). 1994c. La adquisición del vasco y del castellano en niños bilingües. Frankfurt: Iberoamericana. Meisel, J. M. 1995. “Parameters in acquisition.” In P. Fletcher and B. MacWhinney, 10–35. Meisel, J. M. 1997. “The acquisition of the syntax of negation in French and German: Contrasting first and second language development.” Second Language Research 13(3): 227–63. Meisel, J. 
M. 1998. “Revisiting Universal Grammar.” Paper to the special colloquium UG access in L2 acquisition: Reassessing the question. Second Language Research Forum 18, University of Hawai’i at Manoa, October 1998. Meisel, J. M. 1999. “On the possibility of becoming a monolingual but competent speaker.” Plenary address to GALA ’99, Universität Potsdam, 10–12 September 1999, Potsdam Germany. Meisel, J. M. in preparation. First and second language acquisition: Parallels and differences. Cambridge, U.K: Cambridge University Press.
Meisel, J. M., Clahsen, H., and Pienemann, M. 1981. “On determining developmental stages in natural second language acquisition.” Studies in Second Language Acquisition 3(2): 109–35. Meisel, J. M., and Müller, N. 1992. “Finiteness and verb placement in early child grammars: Evidence from simultaneous acquisition of French and German in bilinguals.” In J. M. Meisel, 109–38. Mervis, C. B. 1980. “Category structure and the development of categorization.” In R. Spiro et al., 279–307. Mervis, C., Catlin, J., and Rosch, E. 1976. “Relationships among goodness-of-example, category norms, and word frequency.” Bulletin of the Psychonomic Society 7: 283–4. Miller, G. A. 1956. “The magical number seven, plus or minus two: Some limits on our capacity for processing information.” Psychological Review 63(1): 81–97. Milroy, J. 1992. Linguistic variation and change. Oxford: Blackwell. Milroy, L. 1980. Language and social networks. Oxford: Blackwell. Mintz, T. H., Newport, E. L., and Bever, T. G. 1995. “Distributional regularities of form class in speech to young children.” NELS 25(2): 43–54. Mohanan, K. P. 1986. The theory of lexical phonology. Dordrecht: Reidel. Moerk, E. L. 1983a. The mother of Eve — as a first language teacher. Norwood: Ablex. Moerk, E. L. 1983b. “A behavioural analysis of controversial topics in first language acquisition: Reinforcements, corrections, modeling, input frequencies, and the three-term contingency pattern.” Journal of Psycholinguistic Research 12(1): 129–55. Moerk, E. L. 1991. “Positive evidence for negative evidence.” First Language 11(2): 219–51. Moore, T. E. (ed.) 1973. Cognitive development and the acquisition of language. New York: Academic Press. Morgan, J. L. 1986. From simple input to complex grammar. Cambridge, Ma: MIT Press. Morgan, J. L. 1989. “Learnability considerations and the nature of trigger experiences in language acquisition.” Behavioral and Brain Sciences 12(2): 352–3. Morgan, J. L., and Demuth, K. (eds.) 1996.
Signal to syntax: Bootstrapping from speech to grammar in early acquisition. Hillsdale, N.J.: Erlbaum. Morgan, J. L., Meier, R. P., and Newport, E. L. 1987. “Structural packaging in the input to language learning: Contributions of prosodic and morphological marking of phrases to the acquisition of language.” Cognitive Psychology 19(3): 498–550. Morin, Y.-C. 1982. “Analogie, quatrième proportionnelle et terminaison thématique.” Revue de l’Association québécoise de linguistique 2(2): 127–43. Morris, C. D., Bransford, J. D., and Franks, J. J. 1977. “Levels of processing versus transfer of appropriate processing.” Journal of Verbal Learning and Verbal Behaviour 16(4): 519–33. Müller, N. 1990. “Developing two gender assignment systems simultaneously.” In J. M. Meisel, 193–234. Müller, N. 1993. Komplexe Sätze. Der Erwerb von COMP und von Wortstellungsmustern bei bilingualen Kindern (Französisch/Deutsch). Tübingen: Narr.
Müller, N. 1994. “Parameters cannot be reset: Evidence from the development of COMP.” In J. M. Meisel (1994a), 235–69. Müller, N. 1998. “Transfer in bilingual first language acquisition.” Bilingualism: Language and Cognition 1(3): 151–71. Müller, R.-A. 1996. “Innateness, autonomy, universality? Neurobiological approaches to language.” Behavioral and Brain Sciences 19(4): 611–75. Mufwene, S. S. 1992. “Why grammars are not monolithic.” In D. Brentari et al., 225–50. Mufwene, S. S., Rickford, J. R., Bailey, G., and Baugh, J. (eds.) 1998. African-American English. Structure, history and use. London: Routledge. Murphy, G. L., and Medin, D. 1985. “The role of theories in conceptual coherence.” Psychological Review 92(2): 289–316. Myles, F. 1995. “Interaction between linguistic theory and language processing in SLA.” Second Language Research 11(3): 235–66. Neeleman, A., and Weerman, F. 1997. “L1 and L2 word order acquisition.” Language Acquisition 6(2): 125–70. Nespor, M., and Vogel, I. 1986. Prosodic phonology. Dordrecht: Foris. Neufeld, G. G. 1978. “On the acquisition of prosodic and articulatory features in adult language learning.” The Canadian Modern Language Review 34(2): 163–74. Neufeld, G. G. 1979. “Towards a theory of language learning ability.” Language Learning 29(2): 227–41. Newell, A. and Simon, H. A. 1972. Human problem-solving. Englewood Cliffs, N.J.: Prentice-Hall. Newmeyer, F. J. (ed.), 1988. Linguistics: The Cambridge survey. Vol. I, II and III. Cambridge, U.K: Cambridge University Press. Newport, E. L. 1990. “Maturational constraints on language learning.” Cognitive Science 14(1): 11–28. Nisbett, R. E. 1993a. “Reasoning, abstraction, and the prejudices of 20th-century psychology.” In R. E. Nisbett (1993b), 1–12. Nisbett, R. E. (ed.) 1993b. Rules for reasoning. Hove, U.K: Erlbaum. Nisbett, R. E., Krantz, D. H., Jepson, D., and Kunda, Z. 1983. “The use of statistical heuristics in everyday reasoning.” Psychological Review 90(2): 339–63. 
Reprinted in Nisbett (1993b), 15–54. Nishigauchi, T., and Roeper, T. 1987. “Deductive parameters and the growth of empty categories.” In T. Roeper and E. Williams, 91–121. Noyau, C. 1982. “French negation in the language of Spanish-speaking immigrant workers: Social acquisition/variability/transfer/individual systems.” Paper to The European-North American Workshop on cross-linguistic second language acquisition research. Lake Arrowhead. Obler, L. K. 1993. “Neurolinguistic aspects of second language development and attrition.” In K. Hyltenstam and A. Viberg, 178–95. Obler, L. K., and Menn, L. (eds.) 1982. Exceptional language and linguistic theory. New York: Academic Press.
O’Grady, W. 1987. Principles of grammar and learning. Chicago: University of Chicago Press. O’Grady, W. 1991. “Language acquisition and the ‘Pro-drop’ phenomenon: A response to Hilles.” In L. Eubank, 339–49. O’Grady, W. 1996. “Language acquisition without Universal Grammar: A general nativist proposal for L2 learning.” Second Language Learning 12(4): 374–97. Olguin, R. and Tomasello, M. 1993. “Two-year-old children do not have a grammatical category of verb.” Cognitive Development 8(2): 245–72. Osherson, D., Stob, M., and Weinstein, S. 1984. “Learning theory and natural language.” Cognition 17: 1–28. Reprinted in R. Matthews and W. Demopoulos. Ouhalla, J. 1991. Functional categories and parametric variation. London: Routledge. Oyama, S. 1976. “A sensitive period for the acquisition of a non-native phonological system.” Journal of Psycholinguistic Research 5(2): 261–285. Oyama, S. 1978. “The sensitive period and comprehension of speech.” Working Papers in Bilingualism 16: 1–18. Pal, A. 1999. “Cross-linguistic lexical interference: the role of cognates in word recognition and retrieval.” VIIIth International Congress for the Study of Child Language, 16th July, San Sebastien, Spain. Pal, A. 2000. The role of cross-linguistic formal similarity in Hungarian-German bilingual learners of English as a foreign language. D.Phil. dissertation, University of Potsdam. Parsons, T. 1990. Events in the semantics of English. Cambridge, Ma: MIT Press. Patkowski, M. 1980. “The sensitive period for the acquisition of syntax in a second language.” Language Learning 30(3): 449–72. Penner, S. 1987. “Parental responses to grammatical and ungrammatical child utterances.” Child Development 58(2): 376–84. Perdue, C. 1993. Adult language acquisition: Crosslinguistic perspectives written by members of the European Science Foundation Project on adult language acquisition. Vol. I Field methods, Vol. II The results. Perdue, C. 1996. “Pre-basic varieties. 
The first stages of second language acquisition.” Toegepaste Taalwetenschap in Artikelen 55: 135–50. Perkins, K., and Larsen-Freeman, D. 1975. “The effects of formal language instruction on the order of morpheme acquisition.” Language Learning 25(2): 237–43. Pesetsky, D. 1982. Paths and categories. Doctoral dissertation, MIT. Peters, A. 1983. The units of language acquisition. Cambridge, U.K: Cambridge University Press. Pfaff, C. W. (ed.) 1987. First and second language acquisition processes. Rowley, Ma: Newbury House. Philip, W. 2000. “Adult and child understanding of simple reciprocal sentences.” Language 76(1): 1–27. Phillipson, R., Kellerman, E., Selinker, L., Sharwood Smith, M. and Swain, M. (eds.) 1991. Foreign language pedagogy research: A commemorative volume for Claus Faerch. Clevedon, U.K: Multilingual Matters.
Phinney, M. 1987. “The pro-drop parameter in second language acquisition.” In T. Roeper and E. Williams, 221–38. Piattelli-Palmarini, M. (ed.) 1980. Language and learning: The debate between Jean Piaget and Noam Chomsky. Cambridge, Ma: Harvard University Press. Pica, T. 1983. “Adult acquisition of English as a second language under different conditions of exposure.” Language Learning 33(3): 465–97. Pica, T. 1987. “Second language acquisition, social interaction, and the classroom.” Applied Linguistics 7(1): 1–25. Pica, T. 1988. “Interlanguage adjustments as an outcome of NS–NNS negotiated interaction.” Language Learning 38(1): 45–73. Pica, T. 1994. “Review article. Research on negotiation: What does it reveal about second-language learning conditions, processes and outcomes.” Language Learning 44(3): 493–527. Pica, T., and Doughty, C. 1985. “Input and interaction in the communicative language classroom: A comparison of teacher-fronted and group activities.” In S. M. Gass and C. Madden, 115–32. Pica, T., Holliday, L., Lewis, N., and Morgenthaler, L. 1989. “Comprehensible output as an outcome of linguistic demands on the learner.” Studies in Second Language Acquisition 11(1): 63–90. Pica, T., Young, R., and Doughty, C. 1986. “Making input comprehensible: Do interactional modifications help?” ITL Review of Applied Linguistics 72(1): 1–25. Pica, T., Young, R., and Doughty, C. 1987. “The impact of interaction on comprehension.” TESOL Quarterly 21(4): 737–58. Pienemann, M. 1981. Der Zweitspracherwerb ausländischer Arbeitskinder. Bonn: Bouvier. Pienemann, M. 1984. “Psychological constraints on the teachability of languages.” Studies in Second Language Acquisition 6(2): 186–214. Pienemann, M. 1987. “Determining the influence of instruction on L2 speech processing.” Australian Review of Applied Linguistics 10(2): 83–113. Pienemann, M. 1998a.
“Developmental dynamics in L1 and L2 acquisition: Processability Theory and generative entrenchment.” Bilingualism: Language and Cognition 1(1): 1–20. Pienemann, M. 1998b. Language processing and second language acquisition: Processability theory. Amsterdam: John Benjamins. Pierrehumbert, J. 1980. The phonology and phonetics of English intonation. Doctoral dissertation, MIT. Pinker, S. 1979. “Formal models of language learning.” Cognition 7: 217–283. Pinker, S. 1984. Language learnability and language development. Cambridge, Ma: Harvard University Press. Pinker, S. 1987. “The bootstrapping problem in language acquisition.” In B. MacWhinney (1987c), 399–441. Pinker, S. 1989. Learnability and cognition: The acquisition of argument structure. Cambridge, Ma: MIT Press.
Pishwa, H. 1994. On underlying principles in second language acquisition: German word order rules reconsidered. Ms., Berlin. Pisoni, D. B., and Luce, P. A. 1987. “Acoustic-phonetic representation in word recognition.” In U. H. Frauenfelder and L. K. Tyler, 21–52. Platt, C. B., and MacWhinney, B. 1983. “Error assimilation as a mechanism in language learning.” Journal of Child Language 10(2): 401–14. Platzack, C. 1992. “Functional categories and early Swedish.” In J. M. Meisel, 63–82. Pollock, J.-Y. 1989. “Verb movement, universal grammar, and the structure of IP.” Linguistic Inquiry 20(3): 365–424. Posner, M. I. (ed.) 1989. Foundations of cognitive science. Cambridge, Ma: MIT Press. Posner, M. I., and Snyder, C. R. R. 1975. “Facilitation and inhibition in the processing of signals.” In P. M. A. Rabbitt and S. Dornic, 669–82. Premack, D. 1990. “The infant’s theory of self-propelled objects.” Cognition 36: 1–16. Prince, A., and Smolensky, P. 1993. Optimality Theory: Constraint interaction in generative grammar. Technical report RuCCs TR-2, Rutgers Center for Cognitive Science. Prince, E. F. 1988. “Discourse analysis: a part of the study of linguistic competence.” In F. J. Newmeyer. Vol. II, 164–82. Pritchett, B. 1987. Garden path phenomena and the grammatical basis of language processing. Doctoral dissertation, Harvard University. Pritchett, B. 1992. Grammatical competence and parsing performance. Chicago, Il: University of Chicago Press. Putnam, H. 1975a. “The meaning of meaning.” In H. Putnam, 215–71. Putnam, H. (ed.) 1975b. Mind, language and reality. Vol. 2. Cambridge, U.K.: Cambridge University Press. Putnam, H. 1989. Representation and reality. Cambridge, Ma: MIT Press. Pustejovsky, J. 1991. “The syntax of event structure.” Cognition 41(1): 47–81. Pye, C. 1989. “Synthesis/commentary. The nature of language.” In M. L. Rice and R. L. Schiefelbusch, 127–30. Pylyshyn, Z. 1991. “Rules and representations: Chomsky and representational realism.” In A. Kasher, 231–51. 
Quirk, R., Svartvik, J., Greenbaum, S., and Leech, G. 1991. A comprehensive grammar of the English language. London: Longman. Rabbitt, P. M. A., and Dornic, S. (eds.) 1975. Attention and performance V. New York: Academic Press. Randall, J. 1990. “Catapults and pendulums: The mechanisms of language acquisition.” Linguistics 28(6): 1381–1406. Randall, J. 1992. “The catapult hypothesis: An approach to unlearning.” In J. Weissenborn et al., 93–138. Rauschecker, J. P., and Marler, P. (eds.) 1987. Imprinting and cortical plasticity. New York: J. P. Wiley and Sons. Reber, A. 1976. “Implicit learning of synthetic languages: The role of instructional set.” Journal of Experimental Psychology: Learning, memory, and cognition 15(5): 1083–1100.
Reber, A. 1989. “Implicit learning and tacit knowledge.” Journal of Experimental Psychology: General 118(3): 219–235. Reber, A., Kassin, S., Lewis, S., and Cantor, G. 1980. “On the relationship between implicit and explicit modes in the learning of a complex rule structure.” Journal of Experimental Psychology, Human Learning and Memory 6(5): 492–502. Reinhart, T. 1976. The syntactic domain of anaphora. Doctoral dissertation, MIT. Reinhart, T. 1983. Anaphora and semantic interpretation. Chicago: University of Chicago Press. Rescher, N. (ed.) 1967. The logic of decision and action. Pittsburgh: University of Pittsburgh Press. Rice, M. L., and Schiefelbusch, R. L. (eds.) 1989. The teachability of language. Baltimore, Md: Paul H. Brookes. Richards, J. C. (ed.) 1978. Understanding second and foreign language learning. Rowley, Ma: Newbury House. Ritchie, W. C., and Bhatia, T. K. (eds.) 1996. Handbook of language acquisition. New York: Academic Press. Ritter, F. 1995. “Empirical constraints and cognitive modelling.” Paper to The Turing Hypothesis Revisited: New perspectives on architectures of cognition. Einstein Forum/ Potsdam University, 13 October, 1995. Robinson, P. J. 1996. “Learning simple and complex second language rules under implicit, incidental, rule-search, and instructed conditions.” Studies in Second Language Acquisition 18(1): 27–67. Robinson, P. J. 1997a. “Generalisability and automaticity of second language learning under implicit, incidental, enhanced, and instructed conditions.” Studies in Second Language Acquisition 19(2): 223–47. Robinson, P. J. 1997b. “Individual differences and the fundamental similarity of implicit and explicit adult second language learning.” Language Learning 47(1): 45–99. Robinson, P. J., and Ha, M. A. 1993. “Instance theory and second language rule learning under explicit conditions.” Studies in Second Language Acquisition 15(4): 413–35. Roca, I. M. 1989. 
“The organisation of grammatical gender.” Transactions of the Philological Society 87(1): 1–32. Rochemont, M. 1986. Focus in generative grammar. Amsterdam: John Benjamins. Roediger, H. L. III. 1990. “Implicit memory: Retention without remembering.” American Psychologist 45(10): 1043–56. Roeper, T. 1999. “On universal bilingualism.” Bilingualism: Language and cognition. 2(3): 169–86. Roeper, T., and Williams, E., (eds.) 1987. Parameter Setting. Dordrecht: Reidel. Romaine, S. 1989. Bilingualism. Oxford: Blackwell. Rosch, E. 1973. “On the internal structure of perceptual and semantic categories.” In T. E. Moore, 111–44. Rosch, E. 1975. “Cognitive representations of semantic categories.” Journal of Experimental Psychology: General 104: 192–233.
Rosch, E. and Mervis, C. B. 1975. “Family resemblances: Studies in the internal structure of categories.” Cognitive Psychology 7: 573–605. Rosch, E., Mervis, C. B., Gray, W. D., Johnson, J., and Boyes-Braem, P. 1976. “Basic objects in natural categories.” Cognitive Psychology 8: 382–439. Ross, J. R. 1972. “Endstation Hauptwort: The category squish.” Chicago Linguistic Society 8: 316–28. Ross, J. R. 1973a. “A fake NP squish.” In C.-J. N. Bailey and R. W. Shuy, 96–140. Ross, J. R. 1973b. “Nouniness.” In O. Fujimura, 137–258. Rothstein, S. 1983. The syntactic forms of predication. Doctoral dissertation, MIT. Reproduced (1985) by the Indiana University Linguistics Club. Rumelhart, D. E., and McClelland, J. L. 1986. “On learning the past tenses of English verbs.” In D. E. Rumelhart et al., 216–71. Rumelhart, D. E., and McClelland, J. L. 1987. “Learning the past tenses of English verbs: Implicit rules or parallel distributed processing.” In B. MacWhinney, 195–248. Rumelhart, D. E., McClelland, J. L., and the PDP Research Group (eds.) 1986. Parallel distributed processing. Explorations in the microstructures of cognition. Vol. 2. Psychological and biological models. Cambridge, Ma: Bradford Books/MIT Press. Rutherford, W. and Sharwood Smith, M. (eds.) 1989. Grammar and second language teaching: A book of readings. New York: Newbury House. Sagey, E. 1986. The representation of features and relations in nonlinear phonology. Doctoral dissertation, MIT. Appeared 1991 in the Distinguished Dissertations Series, New York: Garland Press. Saleemi, A. P. 1992. Universal grammar and language learnability. Cambridge: Cambridge University Press. Salica, C. 1981. Testing a model of corrective discourse. M.A. in TESL thesis, University of California, Los Angeles. Sasaki, Y. 1992. Paths of processing strategy transfer in learning Japanese and English as foreign languages: A competition model approach. Doctoral dissertation, University of Illinois at Urbana-Champaign. Sasaki, Y. 1994. 
“Paths of processing strategy transfers in learning Japanese and English as foreign languages.” Studies in Second Language Acquisition 16(1): 43–72. Sato, C. J. 1986. “Conversation and interlanguage development: Rethinking the connection.” In R. Day, 23–45. Sato, C. J. 1990. The syntax of conversation in interlanguage development. Tübingen: Gunter Narr. Schachter, J. 1986. “Three approaches to the study of input.” Language Learning 36(2): 211–25. Schachter, J. 1988. “Second language acquisition and its relationship to universal grammar.” Applied Linguistics 9(3): 220–35. Schachter, J. 1990. “On the issue of completeness in second language acquisition.” Second Language Research 6(2): 93–124. Schachter, J. 1991. “Issues in the accessibility debate: A reply to Felix.” In L. Eubank, 105–16.
Schachter, J. 1992. “A new account of language transfer.” In S. M. Gass and L. Selinker, pp. 32–46. Schmidt, R. 1990. “The role of consciousness in second language learning.” Applied Linguistics 11(2): 127–58. Schmidt, R. 1993. “Awareness and second language acquisition.” Annual Review of Applied Linguistics 13: 206–26. Schmidt, R. 1994. “Deconstructing consciousness in search of useful definitions for applied linguistics.” Consciousness in second language learning, AILA Review 11: 11–26. Schmidt, R. (ed.) 1995. Attention and consciousness in foreign language learning. Technical Report 9. Honolulu, Hi: University of Hawai’i, Second Language Teaching and Curriculum Centre. Schütze, C. 1995. The empirical base of linguistics: Grammaticality judgements and linguistic methodology. Chicago: University of Chicago Press. Schumann, J. 1978. “The acculturation model for second language acquisition.” In R. Gingras, 27–50. Schwartz, B. D. 1986. “The epistemological status of second language acquisition.” Second Language Research 2(2): 120–59. Schwartz, B. D. 1987. The modular basis of second language acquisition. Ph.D. dissertation, University of Southern California. Schwartz, B. D. 1991. “Conceptual and empirical evidence: A response to Meisel.” In L. Eubank, 277–304. Schwartz, B. D. 1992. “Testing between UG-based and problem-solving models of L2A: Developmental sequence data.” Language Acquisition 2(1): 1–20. Schwartz, B. D. 1993. “On explicit and negative data effecting and affecting competence and linguistic behavior.” Studies in Second Language Acquisition 15(2): 147–63. Schwartz, B. D. 1998. “Back to basics in generative second language acquisition research.” Paper to the Colloquium “UG access in L2 acquisition: Reassessing the question”, Second Language Research Forum, University of Hawai’i at Manoa, 15–18 October, 1998. Schwartz, B. D., and Gubala-Ryzak, M. 1992. 
“Learnability and grammar reorganisation in L2A: Against negative evidence causing the unlearning of verb movement.” Second Language Research 8(1): 1–38. Schwartz, B. D., and Sprouse, R. 1994. “Word order and nominative case in non-native language acquisition: A longitudinal study of (L1 Turkish) German interlanguage.” In T. Hoekstra and B. D. Schwartz, 317–68. Schwartz, B. D., and Sprouse, R. 1996. “L2 cognitive states and the Full Transfer/Full Access model.” Second Language Research 12(1): 40–72. Schwartz, B. D., and Tomaselli, A. 1990. “Some implications from an analysis of German word order.” In W. Abraham et al., 251–74. Schwartz, B. D., and Vikner, S. 1989. “All verb second clauses are CPs.” Working Papers in Scandinavian Syntax 43: 27–49.
Schwartz, M. F., Marin, O. S. M., and Saffran, E. M. 1979. “Dissociations of language function in dementia: A case study.” Brain and Language 7(3): 277–306. Searle, J. 1969. Speech acts. Cambridge: Cambridge University Press. Searle, J. 1979. Expression and meaning: Studies in the theory of speech acts. Cambridge: Cambridge University Press. Searle, J. 1983. Intentionality: An essay in the philosophy of mind. Cambridge: Cambridge University Press. Seliger, H., and Long, M. (eds.) 1983. Classroom-oriented research in second language acquisition. Rowley, Ma: Newbury House. Selinker, L. 1972. “Interlanguage.” IRAL 10(3): 209–30. Selkirk, E. O. 1984. Phonology and syntax: The relation between sound and structure. Cambridge, Ma: MIT Press. Sharwood Smith, M. 1979. “Strategies, learner transfer and the simulation of the language learner’s mental operations.” Language Learning 29(2): 345–61. Sharwood Smith, M. 1986. “The Competence/Control model, crosslinguistic influence and the creation of new grammars.” In E. Kellerman and M. Sharwood Smith, 10–20. Sharwood Smith, M. 1991. “Speaking to many minds: On the relevance of different types of language information for the L2 learner.” Second Language Research 7(2): 118–32. Sharwood Smith, M. 1993. “Input enhancement in instructed SLA: Theoretical bases.” Studies in Second Language Acquisition 15(2): 165–79. Shiffrin, R. M., and Schneider, W. 1977. “Controlled and automatic human information processing: II. Perceptual learning, automatic attending, and a general theory.” Psychological Review 84(1): 127–90. Shillcock, R., and Bard, E. G. 1993. “Modularity and the processing of closed-class words.” In G. Altmann and R. Shillcock, 163–85. Siegel, D. 1974. Topics in English morphology. Doctoral dissertation, MIT. Simon, H. A., and Kaplan, C. A. 1989. “Foundations of cognitive science.” In M. I. Posner, 1–47. Singh, R., D’Anglejan, A., and Carroll, S. 1982. “Elicitation of inter-English.” Language Learning 32(2): 271–88. 
Slobin, D. I. (ed.) 1971. The ontogenesis of grammar: A theoretical symposium. New York: Academic Press. Slobin, D. I. 1973. “Cognitive prerequisites for the development of grammar.” In C. A. Ferguson and D. I. Slobin, 175–208. Slobin, D. I. 1977. “Language change in childhood and in history.” In J. Macnamara, 185–214. Slobin, D. I. 1979. Psycholinguistics. 2nd edition. Glenview, Ill: Scott, Foresman. Slobin, D. I. 1985a. “Crosslinguistic evidence for the language-making capacity.” In D. I. Slobin, 1157–1256. Slobin, D. I. (ed.) 1985b. The crosslinguistic study of language acquisition. Vol. I and II. Hillsdale, N.J.: Erlbaum.
Slobin, D. I. 1991. “Can Crain constrain the constraints? Commentary, pp. 633–4, to Stephen Crain’s Language acquisition in the absence of experience.” Behavioral and Brain Sciences 14(4): 597–650. Slobin, D. I., and Bever, T. G. 1982. “Children use canonical sentence schemas: A crosslinguistic study of word order and inflections.” Cognition 12: 229–65. Smith, E. E., and Medin, D. L. 1981. Categories and concepts. Cambridge, Ma: Harvard University Press. Smith, F., and Miller, G. A. (eds.) 1966. The genesis of language. Cambridge, Ma: MIT Press. Smith, K. H. 1966. “Grammatical intrusions in the recall of structured letter pairs: Mediated transfer or positional learning?” Journal of Experimental Psychology 72(3): 580–8. Smith, K. H. 1969. “Learning cooccurrence restrictions: Rule learning or rote learning?” Journal of Verbal Learning and Verbal Behaviour 8(2): 319–21. Smith Cairns, H. 1991. “Not in the absence of experience. Commentary, pp. 614–5, to Stephen Crain’s Language acquisition in the absence of experience.” Behavioral and Brain Sciences 14(4): 597–650. Snow, C. 1972. “Mothers’ speech to children learning language.” Child development 43: 549–65. Snow, C. 1977. “Mothers’ speech research: From input to interaction.” In C. Snow and C. A. Ferguson, 31–49. Snow, C., and Ferguson, C. A. (eds.). 1977. Talking to children: Language input and acquisition. Cambridge: Cambridge University Press. Snow, C. E., and Goldfield, B. A. 1983. “Turn the page please: Situation-specific language acquisition.” Journal of Child Language 10(3): 551–69. Sokolik, M. E., and Smith, M. E. 1992. “Acquisition of gender to French nouns in primary and secondary language: A connectionist model.” Second Language Research 8(1): 39–58. Sorace, A. 1993. “Incomplete vs. divergent representations of unaccusativity in non-native grammars of Italian.” Second Language Research 9(1): 22–47. Sorace, A. in press. 
“Near-nativeness, optionality and L1 attrition.” Proceedings of the 12th international symposium of theoretical and applied linguistics. Thessaloniki: Aristotle University of Thessaloniki. Spada, N. 1986. “The interaction between types of content and type of instruction: Some effects on the L2 proficiency of adult learners.” Studies in Second Language Acquisition 8(2): 181–99. Spada, N., and Lightbown, P. M. 1993. “Instruction and the development of questions in L2 classrooms.” Studies in Second Language Acquisition 15(2): 205–24. Spelke, E. S. 1982. “Perceptual knowledge of objects in infancy.” In J. Mehler et al., 409–32. Spelke, E. S. 1985. “Perception of unity, persistence, and identity: Thoughts on infants’ conceptions of objects.” In J. Mehler and R. Fox, 89–113.
Spelke, E. S. 1988. “Where perceiving ends and thinking begins: The apprehension of objects in infancy.” In A. Yonas, 197–234. Spelke, E. S. 1990. “Principles of object perception.” Cognitive Science 14(1): 29–56. Spencer, A. 1991. Morphological theory. Oxford: Blackwell. Sperber, D., and Wilson, D. 1986. Relevance: Communication and cognition. Cambridge, MA: Harvard University Press. Spiro, R. J., Bruce, B. C., and Brewer, W. F. (eds.) 1980. Theoretical issues in reading comprehension. Perspectives from Cognitive Psychology, Linguistics, Artificial Intelligence, and Education. Hillsdale, N.J.: Erlbaum. Starkey, P., and Cooper, R. G. 1980. “Perception of numbers by human infants.” Science 210(Nov. 28th): 1033–35. Starkey, P., Spelke, E. S. and Gelman, R. 1983. “Detection of intermodal numerical correspondences by human infants.” Science 222 (Oct. 14): 179–81. Starkey, P., Spelke, E. S., and Gelman, R. 1990. “Numerical abstraction by human infants.” Cognition 36: 97–127. Stowell, T. 1981. Origins of phrase structure. Doctoral dissertation, MIT. Strauss, S. (ed.) 1982. U-shaped behavioral growth. New York: Academic Press. Streeter, L. A. 1976. “Language perception of two-month-old infants shows effects of both innate mechanisms and experience.” Nature 259 (Jan. 1 and 2): 39–41. Swain, M., and Carroll, S. 1987. “The immersion observatory study.” In B. Harley et al., 190–263. Swain, M., and Lapkin, S. 1982. Evaluating bilingual education: A Canadian case study. Clevedon, Avon: Multilingual Matters. Surridge, M. 1985. “Le genre grammatical des composés en français.” The Canadian Journal of Linguistics 30(3): 247–72. Surridge, M. 1986. “Genre grammatical et dérivation lexicale en français.” The Canadian Journal of Linguistics 31(3): 267–72. Tanaka, J. 1999. Implicit/explicit learning of focus marking in Japanese as a foreign language: A case of learning through output and negative feedback. Doctoral dissertation, University of Toronto. Tanenhaus, M. K., Carlson, G. 
N., and Seidenberg, M. S. 1985. “Do listeners compute linguistic representations?” In D. R. Dowty et al., 359–408. Tarone, E. 1988. Variation in interlanguage. London: Edward Arnold. Taylor, J. R. 1989. Linguistic categorisation: Prototypes in linguistic theory. 2nd edition. Oxford: Clarendon Press. Taylor, J. R. 1994. “Fuzzy categories in syntax: The case of possessives and compounds in English.” Rivista di Linguistica 6(2): 327–45. Taylor-Browne, K. 1984. The acquisition of grammatical gender by children in French immersion programmes. Unpublished M.A. thesis, University of Calgary, Calgary Canada. Tees, R. C., and Werker, J. F. 1984. “Perceptual flexibility: Maintenance of the recovery of the ability to discriminate non-native speech sounds.” Canadian Journal of Psychology 38b: 579–90.
Tenny, C. 1987. The grammaticalisation of aspect and affectedness. Doctoral dissertation, MIT. Thagard, P. R., and Nisbett, R. E. 1982. “Variability and confirmation.” Philosophical Studies 50: 250–67. Reprinted in R. E. Nisbett (1993b), 55–69. Thomas, M. 1989. “The interpretation of reflexive pronouns by non-native speakers.” Studies in Second Language Acquisition 11(3): 281–303. Thomas, M. 1991. “Do second language learners have ‘rogue’ grammars of anaphora?” In L. Eubank, 375–88. Thomas, M. 1993. Knowledge of reflexives in a second language. Amsterdam: John Benjamins. Thomas, M. 1995. “Acquisition of the Japanese reflexive zibun and movement of anaphors in Logical Form.” Second Language Research 11(3): 206–34. Thompson, H., and Altmann, G. T. M. 1990. “Modularity compromised: Selecting partial hypotheses.” In G. Altmann, 324–44. Tomaselli, A., and Schwartz, B. D. 1990. “Analysing the acquisition stages of negation in L2 German: Support for UG in adult SLA.” Second Language Research 6(1): 1–38. Tomasello, M. 1992. First verbs: A case study of early grammatical development. Cambridge, U.K: Cambridge University Press. Tomasello, M. 2000. “Do young children have syntactic competence?” Cognition 74: 209–53. Tomasello, M., and Herron, C. 1988. “Down the garden path: Inducing and correcting overgeneralisation errors in the foreign language classroom.” Applied Psycholinguistics 9(3): 237–46. Tomasello, M., and Herron, C. 1989. “Feedback for language transfer errors: The garden path technique.” Studies in Second Language Acquisition 11(4): 385–95. Tomlin, R. S., and Villa, V. 1994. “Attention in cognitive science and second language acquisition.” Studies in Second Language Acquisition 15(2): 183–203. Towell, R., and Hawkins, R. 1994. Approaches to second language acquisition. Clevedon, U.K: Multilingual Matters. Trager, G., and Smith, H. L. 1951. An outline of English structure. Norman, Okla: Battenburg Press. Trahey, M., and White, L. 1993. 
“Positive evidence and preemption in the second language classroom.” Studies in Second Language Acquisition 15(2): 181–204. Travis, L. 1984. Parameters and effects of word order variation. Doctoral dissertation, MIT. Travis, L. 1989. “Parameters of phrase structure.” In M. R. Baltin and A. S. Kroch, 263–79. Trehub, S. E. 1976. “The discrimination of foreign speech contrasts by infants and adults.” Child Development 47: 466–72. Trévise, A., and Noyau, C. 1984. “Adult Spanish speakers and the acquisition of French negation forms: Individual variation and linguistic awareness.” In R. W. Andersen, 165–89.
Trudgill, P. 1972. “Sex, covert prestige and linguistic change in the urban British English of Norwich.” Language in Society 2(2): 215–46. Trudgill, P. 1974. The social differentiation of English in Norwich. Cambridge: Cambridge University Press. Trudgill, P. 1986. Dialects in contact. Oxford: Blackwell. Truscott, J. 1998a. “Instance theory and Universal Grammar in second language research.” Second Language Research 14(3): 257–91. Truscott, J. 1998b. “Noticing in second language acquisition: A critical review.” Second Language Research 14(2): 103–35. Tsimpli, I-M. and Rousseau, A. 1991. “Parameter resetting in L2?” UCL Working Papers in Linguistics 3: 149–169. London: University College London. Tucker, G. R., Lambert, W. F., and Rigault, A. G. 1977. The French speaker’s skill with grammatical gender. The Hague: Mouton. Tucker, G. R., Lambert, W. F., Rigault, A. G., and Segalowitz, N. 1968. “A psycholinguistic investigation of French speaker’s skill with grammatical gender.” Journal of Verbal Learning and Verbal Behavior 7(2): 312–6. Tulving, E., Schachter, D. L., and Stark, H. A. 1982. “Priming effects in word-fragment completion are independent of recognition memory.” Journal of Experimental Psychology: Learning, Memory, and Cognition 8(4): 336–42. Tyler, L. K. 1989. “The role of lexical representations in language comprehension.” In W. Marslen-Wilson, 439–62. Underwood, G. (ed.) 1978. Strategies of information processing. New York: Academic Press. Vaid, J. (ed.) 1986. Language processing in bilinguals: Psycholinguistic and neuropsychological perspectives. Hillsdale, N.J.: Erlbaum. Valian, V. V. 1990a. “Logical and psychological constraints on the acquisition of syntax.” In L. Frazier and J. de Villiers, 119–45. Valian, V. V. 1990b. “Null subjects: A problem for parameter-setting models of language acquisition.” Cognition 35: 105–22. Valian, V. V. 1993. “Parser failure and grammar change.” Cognition 46: 195–202. Valian, V. V. 1999. 
“Input, innateness and learning.” Plenary address to GALA ’99, Universität Potsdam, 10–12 September 1999, Potsdam Germany. Valian, V. V., and Coulson, S. 1988. “Anchor points in language learning: The role of marker frequency.” Journal of Memory and Language 27(1): 71–86. Valian, V. V., and Levitt, A. 1996. “Prosody and adults’ learning of syntactic structure.” Journal of Memory and Language 35(4): 497–516. Van der Lely, H. K. J. 1995. “Grammatical specific language impairment in children: Theoretical implications for the modularity of mind hypothesis and language acquisition.” Paper to the 13th Hamburger Kognitionskolloquium: Sprachentwicklung und Sprachentwicklungsstörungen. Universität Hamburg, February 3–4, 1995. Van der Lely, H. K. J. 1996. “Narrative discourse in grammatical specific language impaired children: A modular language deficit?” Journal of Child Language 24(1): 221–56.
VanPatten, B. 1984. “Learners’ comprehension of clitic pronouns: More evidence for a word order strategy.” Hispanic Linguistics 1(1): 57–67. VanPatten, B. 1993. “Grammar teaching for the acquisition-rich classroom.” Foreign Language Annals 26(4): 435–50. VanPatten, B. 1996. Input processing and grammar instruction. Chestnut Hill, N.J.: Ablex. VanPatten, B. and Cadierno, T. 1993a. “Explicit instruction and input processing.” Studies in Second Language Acquisition 15(2): 225–243. VanPatten, B., and Cadierno, T. 1993b. “SLA as input processing: A role for instruction.” Modern Language Journal 77(1): 45–77. VanPatten, B., and Oikkonen, S. 1997. “Explanation vs. structured input in processing instruction.” Studies in Second Language Acquisition 18(4): 495–510. VanPatten, B., and Sanz, C. 1995. “From input to output: Processing instruction and communicative tasks.” In F. Eckman et al., 169–85. Varonis, E. M., and Gass, S. 1985a. “Miscommunication in native speaker/non-native speaker conversation.” Language in Society 14(2): 327–43. Varonis, E. M., and Gass, S. 1985b. “Non-native/non-native conversations: A model for negotiation of meaning.” Applied Linguistics 6(1): 71–90. Velmans, M. 1991. “Is human information processing conscious?” Behavioral and Brain Sciences 14(4): 651–69. Vigil, F., and Oller, J. 1976. “Rule fossilisation: A tentative model.” Language Learning 26(2): 281–95. Vihman, M., Ferguson, C. A., and Elbert, M. 1986. “Phonological development from babbling to speech: Common tendencies and individual differences.” Applied Psycholinguistics 7(1): 3–40. Walker, E. C. T. (ed.) 1977. Explorations in the biology of language. Montgomery, VA: Bradford Books. Wanner, E., and Gleitman, L. R. (eds.) 1982. Language acquisition: The state of the art. Cambridge, U.K: Cambridge University Press. Warren, R. M., Bashford, J. A., and Gardner, D. A. 1990. “Tweaking the lexicon: Organisation of vowel sequences into words.” Perception and Psychophysics 47(5): 423–432. Wason, P. 
C. 1966. “Reasoning”. In B. Foss, 135–52. Wason, P. C. 1968. “Reasoning about a rule”. Quarterly Journal of Experimental Psychology 20A: 273–81. Wason, P. C. 1977. “Self-contradictions.” In P. N. Johnson-Laird and P. C. Wason (1977b), 114–28. Watson, J. S. 1984. “Bases of causal inference in infancy: Time, space, and sensory relations.” In L. P. Lipsitt and C. Rovee-Collier, 152–65. Waterman, D., and Hayes-Roth, (eds.) 1978. Pattern-directed inference systems. New York: Academic Press. Weinreich, U. 1953. Languages in contact. The Hague: Mouton. Weissenborn, J., Goodluck, H. and Roeper, T. (eds.) 1992. Theoretical issues in language acquisition: Continuity and change in development. Hillsdale, N.J.: Erlbaum.
Werker, J. F., and Logan, J. S. 1985. “Cross-linguistic evidence for three factors in speech perception.” Perception and Psychophysics 37(1): 35–44. Werker, J. F., and Lalonde, C. E. 1988. “Cross-language speech perception: Initial capabilities and developmental change.” Developmental Psychology 24(4): 672–83. Werker, J. F., and Pegg, J. E. 1992. “Infant speech perception and phonological acquisition.” In C. A. Ferguson et al., 285–311. Werker, J. F., and Tees, R. C. 1984a. “Cross-language speech perception: Evidence for perceptual reorganization during the first year of life.” Infant Behavior and Development 7(1): 49–63. Werker, J. F., and Tees, R. C. 1984b. “Phonemic and phonetic factors in adult crosslanguage speech perception.” Journal of the Acoustical Society of America 75(6): 1866–78. Wertheimer, M. 1923. “Laws of organization in perceptual forms.” In W. D. Ellis, 71–88. Wesche, M. B. 1994. “Input and interaction in second language acquisition.” In E. Galloway and B. J. Richards, 219–49. Wexler, K. 1990. “On unparsable input in language acquisition.” In L. Frazier and J. de Villiers, 105–17. Wexler, K., and Culicover, P. 1980. Formal principles of language acquisition. Cambridge, Ma: MIT Press. Wexler, K., and Manzini, R. 1987. “Parameters and learnability in binding theory.” In T. Roeper and E. Williams, 1–54. White, L. 1986. “Implications of parametric variation for adult second language acquisition: An investigation of the pro-drop parameter.” In V. Cook, 55–72. White, L. 1985a. “The ‘pro-drop’ parameter in adult second language acquisition.” Language Learning 35(1): 47–63. White, L. 1985b. “The acquisition of parameterized grammars: subjacency in second language acquisition.” Second Language Research 1(1): 1–17. White, L. 1986. “Markedness and parameter setting: Some implications for a theory of adult second language acquisition.” In F. Eckman et al., 309–27. White, L. 1987a. “A note on parameters and second language acquisition.” In T. Roeper and E. 
Williams, 239–46. White, L. 1987b. “Against comprehensible input: The input hypothesis and the development of second language competence.” Applied Linguistics 8(2): 95–110. White, L. 1988. “Island effects in second language acquisition.” In S. Flynn and W. O’Neil, 144–72. White, L. 1989a. “The principle of adjacency in second language acquisition: Do L2 learners observe the subset principle?” In S. M. Gass and J. Schachter, 134–58. White, L. 1989b. Universal grammar and second language acquisition. Amsterdam: John Benjamins. White, L. 1990/1991. “The verb-movement parameter in second language acquisition.” Language Acquisition 1(4): 337–60.
White, L. 1991a. “Adverb placement in second language acquisition: Some effects of positive and negative evidence in the classroom.” Second Language Research 7(2): 133–61. White, L. 1991b. “Second language competence versus second language performance: UG or processing strategies?” In L. Eubank, 167–89. White, L. 1995. “Chasing after linguistic theory: How minimal should we be?” In L. Eubank et al., 63–71. White, L. 1998. “Universal Grammar in second language acquisition: The nature of interlanguage representation.” Paper to the Colloquium “UG access in L2 acquisition: Reassessing the question”, Second Language Research Forum, University of Hawai’i at Manoa, 15–18 October, 1998. White, L., and Genesee, F. 1996. “How native is near-native? The issue of ultimate attainment in adult second language acquisition.” Second Language Research 12(3): 233–65. Wiesel, T. N., and Hubel, D. H. 1963. “Effects of visual deprivation on morphology and physiology of cells in the cat’s lateral geniculate body.” Journal of Neurophysiology 26: 978–93. Wiesel, T. N., and Hubel, D. H. 1965. “Comparison of the effects of unilateral and bilateral eye closure on cortical unit responses in very young kittens.” Journal of Neurophysiology 28: 1029–40. Wilkins, W. 1993/4. “Lexical learning by error detection.” Language Acquisition 3(2): 121–57. Williams, E. 1980. “Predication.” Linguistic Inquiry 11(1): 203–238. Williams, E. 1981. “Argument structure and morphology.” The Linguistic Review 1(1): 81–114. Williams, E. 1987. “Introduction.” In T. Roeper and E. Williams, vii–xi. Winkler, S. 1994. Secondary predication in English: A syntactic and focus-theoretical approach. Doctoral dissertation, Universität Tübingen. Distributed as Report 64–1994, Arbeitspapiere des Sonderforschungsbereichs 340: Sprachtheoretische Grundlagen für die Computerlinguistik. Universität Stuttgart, Universität Tübingen, IBM Deutschland GmbH. Winitz, H. (ed.) 1981. 
Proceedings of the New York Academy of Sciences Conference on Native Language and Foreign Language Acquisition. New York: New York Academy of Sciences. Wittgenstein, L. 1953. Philosophical investigations. G. E. M. Anscombe (ed.). Oxford: Blackwell. Wode, H. 1981. Learning a second language: An integrated view of language acquisition. Tübingen: Narr. Wode, H. 1992. “Categorical perception and segmental coding in the ontogeny of sound systems: A universal approach.” In Ferguson et al., 605–31. Wulfeck, B., Juarez, L., Bates, E., and Kilborn, K. 1986. “Sentence interpretation strategies in healthy and aphasic bilingual adults.” In J. Vaid, 199–220.
Yang, I. S. (ed.) 1982. Linguistics in the morning calm. Seoul: Linguistic Society of Korea, Hanshin. Yang, L. R., and Givón, T. 1997. “Benefits and drawbacks of controlled laboratory studies of second language acquisition: The Keck second language learning project.” Studies in Second Language Acquisition 19(2): 173–93. Yonas, A. (ed.) 1988. Perceptual development in infancy. Hillsdale, N.J.: Erlbaum. Zubin, D. A., and Köpcke, K.-M. 1981. “Gender: A less than arbitrary grammatical category.” In R. A. Hendrick et al., 439–49. Zubin, D. A., and Köpcke, K.-M. 1984. “Affect classification in the German gender system.” Lingua 63(1): 41–96. Zubin, D. A., and Köpcke, K.-M. 1986. “Gender and folk taxonomy: The indexical relation between grammatical and lexical categorisation.” In C. G. Craig, 139–80.
Subject index
θ-marking 104 θ-theory 72 A ability 2, 24, 26–28, 30, 33, 35, 37, 40, 60, 64, 69, 78, 83, 104, 111, 133, 134, 135, 136, 139, 149, 183, 205, 208, 209, 216, 217, 219–221, 226, 232, 245, 253, 267, 281, 296, 304, 311, 340, 344, 350, 352, 358, 360, 362, 370, 372, 373, 375, 376, 388–390 ablaut 325, 326 accent 35, 53, 70, 111, 169, 178, 196, 204, 234, 353, 368 acceptability judgement task 26, 27, 60, 163, 186, 205, 297–299, 310, 321, 322 acceptability judgements 60, 64, 88, 89, 158, 313 access/accessing 30, 38, 52–54, 58, 68, 76, 79, 81, 93, 101, 107, 108, 113–115, 123, 139, 155, 157, 163, 196, 197, 199, 210, 241, 242, 245, 248, 250, 264, 270, 279, 283, 386, 393 ACCOMPLISHMENT 326, 344, 345 account of memory 64, 404 acoustic 8, 10, 12, 16, 28, 35, 49, 63, 73, 77, 82, 121, 123, 129, 171, 180, 181, 182, 187, 200–202, 206, 209, 223, 225, 256, 279, 280, 283, 287, 292, 349, 359, 362, 386, 389
ACTION 81, 139, 171, 228, 324–326, 344, 345 ACTIVITY 326, 344 ACTOR 81 Adj 136, 223 adjacency 60, 62, 297, 299, 342 Adjacency Condition on Case assignment 297 adjective 73, 77, 102, 135, 152, 180, 200, 230, 313, 323, 345, 362, 369, 370 adjective phrase 73, 102 AdjP 73 adjunct 85, 86, 102, 104, 115, 155, 299, 317, 342 adverb 102, 135, 158, 159, 177, 276, 277, 286, 296–299, 306, 342 AGENT 62, 166, 236, 237, 342 agentivity 62 Agr 116, 159, 276, 278 agreement 44, 47, 62, 73, 87, 104, 152, 153, 158–163, 254, 269, 271, 313, 341, 360–362, 370 analogy 61, 91, 135 analysed knowledge 26, 367 analytic implication 176 anaphora 104, 106, 116, 155 analysis and control 164, 254, 357 argument 73, 85, 87, 114, 200, 299, 342 argument structure 17, 72, 85, 129, 236, 284, 344 articulatory 73, 127, 356, 364 articulatory feature 341
artificial language 182, 183, 289, 310–312, 320, 340 aspect 162, 177, 284, 294, 359 association 12, 33, 60, 68, 75, 80, 135, 153, 162, 197, 198, 201, 361, 362, 370 association lines 62 attention 23, 25, 27, 35, 56, 106, 127, 136, 146, 149, 180, 181, 193–195, 204, 206, 219–221, 226, 237, 244, 250, 291, 292, 301, 306, 307, 310, 316, 317, 319, 340, 347–350, 356, 357, 361, 363, 364, 366, 368, 375, 385–388 ATTRIBUTE 139 attrition 47, 353 automaticity 254, 303 autonomous processing 149 autonomous representations 27, 141, 187, 268, 273, 368 Autonomy Hypotheses 187, 189, 190, 194, 198, 203, 271 autonomy 43, 189, 195, 197, 198, 271, 274, 285 autosegmental 62 auxiliary 9, 75, 98, 157, 158, 276, 277, 282, 297, 301, 309, 359 awareness 27, 57, 101, 127, 129, 136, 175, 195, 196, 250, 254, 256, 262, 342, 349, 356, 357, 361, 364, 367, 368, 372, 386, 387, 392, 394 awareness constraint on negative evidence 128 B Basic Level of Categorisation Constraint 219, 221 bilingual lexicon 4 binding 72, 97, 128, 254, 262, 263, 269, 271, 285, 341 binding domain 97 blame assignment problem xi, xii, 354–356, 366, 386
bootstrapping 75–79, 81, 84, 136, 179, 181, 208, 228, 247, 272 borrowing 47, 131 bottom-up processors 123 branching direction 102, 104, 116 Bucket Brigade Algorithm 173, 203 C c-command 12, 40, 49, 55, 57, 60, 62, 102–104, 107, 116, 188, 190, 258, 269, 344 c-selection 129 canalisation 108–111, 119, 130, 210, 272 canonical word order 56, 58 case assignment 115, 130, 297 case relations 236 case theory 72, 285 Case Adjacency Parameter 297 case-marking 62, 304, 311 catapult problem 46 categorial features 73, 128, 184 categorical rules 311 categorisation 108, 131, 136, 139, 144, 147, 164, 166, 168, 172, 175, 179, 184–186, 217, 219, 221, 341, 357, 366 Categorisation Constraint 219, 221 categorisation judgement 139, 140–142, 164, 366 category formation 79, 80, 82, 141, 175, 393 category learning 79, 80 causality 218, 354 CAUSATION 76, 359 central processor 124, 197, 250, 255–258, 260, 263–265, 279, 287 central tendency 82, 149, 352 checks 291 clarification requests 291, 389 classifiers 167, 174 clefts 233, 281, 282 co-variation 80, 198, 201
code switching 47 Coding Constraint 197–200, 202, 203 cognate 82, 177, 179, 200 Cognitive Linguistics 205 Cognitive Principle of Relevance 374 Cohort Model 267 Common Movement Principle 216 Communicative Principle of Relevance 374, 375, 380 Comp 18, 47, 73, 156 competence 35, 69, 100, 177, 199, 209, 240, 245, 256, 259, 260, 274, 280, 284, 286, 343, 352, 378, 394 competence and performance 24 competition 52, 136, 144, 147, 151, 152, 172, 174 Competition Model 33, 38, 40, 45, 48–50, 59, 60, 62, 63, 68, 80, 99, 151, 169, 173, 177, 186, 187, 205, 304 componentiality 251 compound 370 compounding 22 comprehended speech 9, 33 comprehended input 13 comprehended intake 14 Comprehensible Input Hypothesis 9, 12–15 comprehension 14, 15, 27, 116, 121, 150, 163, 193, 242, 289, 291–293, 297, 306, 308, 311, 342–344, 350, 391 comprehension checks 291 concept learning 3, 9, 174 conceptual categories 5, 17, 76, 79, 152, 199, 229, 244 conceptual features 144 Conceptual Semantics 6, 138, 139, 168, 392 conceptual structures 17, 21, 35, 73, 76, 113, 120, 121, 123, 124, 126–128, 139, 140, 142, 143, 147, 150, 164, 165, 172, 174, 179, 202, 203, 249,
263, 265, 272, 278, 279, 283, 342, 343, 366, 367, 372, 388, 386, 392 condition-action rules 141–145, 149, 166, 172–174, 387, 388 Connected Surface Principle 216 connectionism 38, 60, 91 conscious awareness 9, 27, 57, 195, 254, 256, 349, 356, 387 consciousness 112, 127, 196, 244, 250 consonant 9, 10, 22, 24, 73, 80, 143, 147, 196, 303 constraints on induction 192, 193, 195, 198, 199, 204 constraints on null subjects 341 constraints on rule formation 144, 176 constraints problem 51, 52, 120, 188, 249 conversion 321, 322, 324–329, 333, 339, 340, 344, 346 Cooperative Principle 34, 377, 378, 380, 390 Correction Interpretation 373, 376, 382 corrective intention 231, 232, 238, 348, 372, 373, 376–378, 380, 383, 385, 386, 389, 390 correspondence processors 124, 125 correspondence rules 76, 121–124, 126, 143, 154, 184, 190, 203, 286, 351 critical period 63, 95 cross-linguistic speech perception 33 cue 18, 29, 33, 39, 45, 46, 62, 63, 66, 75–80, 82–84, 96, 98–112, 114, 136, 143, 144, 146, 147, 152, 15, 155, 157, 158, 160, 161, 163, 164, 176, 177, 180–183, 185–187, 191–193, 205, 216, 218, 234, 236, 243, 264, 285, 296, 305, 306, 308, 309, 312, 322, 323, 326, 347, 351, 358, 359, 360, 364, 369, 370 cue-based learning 79, 80, 84, 146 D d-structure 72, 114, 277
declarative knowledge 26, 35 deduction 150, 171, 176, 245 deductive reasoning 149, 150 deductive rule 149, 150 default hierarchies 172, 203, 364 deism 51, 56 Det 136, 154, 223, 358, 360 detectable cues 79 detectable error 351–353 Detectable Error Hypothesis 351–353 determiner 152, 355, 356, 358, 360, 361, 370, 373, 378, 385, 386 DetP 104, 154, 358 developmental orders 46, 160, 161, 294 developmental paths 46, 105, 342 developmental problem 69, 75, 89, 104, 106, 111, 190, 236, 243, 244, 371, 391 distributional analysis 78, 179, 183, 184, 187, 191, 310 do-support 276, 277, 297 double object construction 52, 179, 303, 319, 321, 339, 363–365 E ECP 258, 286, 287 Elsewhere Condition 145, 151, 174 empirical problem of language acquisition 207, 211, 213, 239 empirical rules 142, 145, 166 empty category 87, 258, 277, 287 Empty Category Principle 258 encapsulation 123, 255, 262, 265–267, 269, 274, 280 epenthesis 81 equivalence classes 108 equivalence classification 82, 143, 144, 147, 360, 362 error 23, 27, 32, 69, 89, 128, 169, 174, 175, 188, 192, 197, 203, 206, 231–233, 254, 291, 296, 298–300, 302, 303, 305, 314–319, 329, 342,
347–358, 361, 364, 366, 370, 371, 375, 378, 382, 385, 386, 391, 392 EVENT 137, 139, 171, 229, 272, 325, 326, 344, 345 event structures 54, 284 EXPERIENCER 81, 236 expletive subjects 47, 88 explicit grammatical correction 230 Extended Projection Principle 87 F faculty 24, 30, 31, 58, 65, 71, 84, 89, 106, 109, 115, 116, 120–123, 126, 134, 184, 193, 207, 249–251, 253, 254, 259–261, 268, 281, 283, 290, 348, 356, 367 Featural Deficit Hypothesis 253 features 28, 35, 41, 42, 63, 72–74, 76, 77, 81, 83–87, 99, 109–111, 114, 117, 128, 129, 139–142, 144, 146, 147, 149, 152–157, 161, 164, 166, 171, 175–177, 184, 185, 189, 196, 199, 201, 202, 204, 205, 209, 220, 226, 230, 237, 238, 246, 247, 252, 253, 254, 255, 265, 271, 280–282, 316, 323, 324, 341, 350, 356, 358, 360, 361, 364, 367, 369 feedback and correction 1, 2, 5, 6, 13, 17, 31, 32, 34, 39, 40, 50, 61, 123, 127–130, 138, 164, 193, 204, 222, 230, 257, 289, 290, 295, 300, 303, 305, 313, 316, 318, 321, 340, 341, 347, 349, 350, 355, 362, 363, 366–368, 371–377, 380, 381, 385, 386, 389, 390–392, 394 Feedback Constraint 202 feet 55, 73, 79, 181, 218, 365, 372 Felicity Condition for the provision of feedback and correction 381 FIGURE 236 finiteness 87, 97, 98, 104, 105, 160–162, 176, 177, 341
First Principle of Correction Interpretation 373 focused attention 27, 348, 357, 390 focused sampling 206 foot 77, 318, 350, 365 form extraction 179, 187 form-focused instruction 294, 301, 309 form-meaning 20, 23, 33, 293, 316, 343, 347, 373, 386 formation rules 122 fossilisation 169, 246 Full Access Hypothesis 79, 81, 113 Full Transfer 79, 81, 113 functional architecture 11, 33, 48, 58, 64, 65, 89, 112, 126, 151, 192, 197, 204, 215, 222, 249, 252, 283, 343 functional category 47, 48, 73, 77, 104, 113, 114, 160, 163, 176, 180, 183, 187, 229, 276, 301 Fundamental Difference Hypothesis 51, 54 fundamental frequency 78, 180–182, 204, 205, 235, 237 G Garden Path condition 314, 316 gender 66, 84, 111, 152–155, 179, 199, 311, 355–362, 367, 370 gender feature 83, 152, 154, 358, 360, 361, 369 general nativism 64, 214 general theory of learning 39, 43, 50, 215, 393 generalisation 91, 119, 145, 148, 149, 155, 164–167, 174, 290, 314, 316, 317, 319, 320, 333, 337, 345 Generalised Phrase Structure Grammar 71 Germanic foot 77 gestalt principles 216, 217 GOAL 159 government 115, 254, 271
Government and Binding 254 grammatical instruction 257, 289, 294, 299, 306, 313, 340, 342 grammatical restructuring 5, 6, 14 Greediness Constraint 113 GROUND 236 H head 47, 63, 73, 85–86, 96, 102, 111, 115, 134, 160, 177, 205, 206, 246, 297, 301, 359, 370 Head Constraint 84, 85 head direction 86, 95, 96, 102, 104, 114–116, 155, 157–159 Head Driven Phrase Structure Grammar 71 Hypothesis of Cognitive Immaturity 221, 222 Hypothesis of Cognitive Innocence 213, 221 Hypothesis of Levels 121, 348 Hypothesis of Linguistic Innocence 213, 214 hypothesis-formation 71, 120, 191 hypothesis-testing 71, 120 I I-language 24, 25, 28, 32, 65, 92 i-learning 130, 135, 136, 141, 142, 144, 152, 154, 164, 168, 169, 170–174, 179, 184, 185, 191–195, 197–199, 201, 204, 207, 208–211, 213, 214, 218, 220, 226, 233–235, 239, 245, 249, 250, 254, 257, 259, 261, 262, 278, 285, 341, 342, 347, 351, 357, 371 implication 145, 176, 274 implicature 281, 282 indirect feedback 232, 292, 328, 376, 390 indirect negative evidence 116, 290, 291, 350 indirect prompting 337, 389
INDIVIDUAL 228–230, 237, 326 induction 16, 31, 37, 39, 40, 48–51, 59, 61, 67, 75, 82, 94, 101, 106, 109, 114, 116, 119, 120, 126, 127, 130, 131, 132–136, 138, 141, 142, 144, 145, 148, 150, 151, 155, 164, 166–171, 173, 174, 177–179, 185, 187–193, 195–200, 202–205, 207, 208, 212, 214, 215, 225–227, 230, 235, 241, 244, 257, 261, 262, 284, 285, 287, 289, 304, 305, 308, 316, 324, 341–343, 347, 366, 371, 372, 391–393 inductive reasoning 120, 131, 136, 174, 179, 208 inference 20, 22, 23, 32, 113, 119, 133, 136, 138, 139, 140, 142, 149, 150, 165, 170, 175, 176, 185, 216, 217, 219, 221, 222, 225, 227, 232, 238, 239, 241, 260, 263, 265, 279, 281, 287, 308, 350, 365, 366, 372, 375, 378, 379, 382, 384, 388, 390 inferencing 6, 12, 14, 15, 17, 20, 21, 23, 32, 116, 121, 130, 134, 137, 138, 147, 149, 150, 171, 176, 197, 214, 221, 222, 229, 232, 241, 244, 245, 250, 251, 254, 260, 262, 265, 271, 279, 281, 290, 337, 346, 357, 366, 372, 376, 380, 382, 390, 391 inferential rules 139, 142, 144 infinitive 98, 159, 234, 385 INFL 19, 158, 176, 297, 301, 431 Infl Phrase 19 inflection 116, 161, 162, 325 inflectional features 161 -ing xi, 345 initial state 2, 45, 53, 54, 71, 92, 93, 115, 131, 132, 179, 198, 199, 210, 213, 245 innateness 67, 109, 111, 213, 215, 229, 273 intake 8–14, 18, 31, 34, 65, 192, 224, 292, 304, 343, 371
integration procedures 351 interaction 15, 32, 51, 71, 93, 100, 108, 109, 121, 181, 186, 187, 199, 203, 204, 206, 221, 222, 237, 238, 250, 252, 266–268, 278, 289–292, 294, 301, 307, 312, 325, 326, 331, 347, 349, 354, 367, 371–373, 377, 390–392 Interaction Hypothesis 15, 291, 292 interface 96, 121, 134, 266, 365, 371 Interface Hypothesis 371 intonation 129, 180, 264, 265, 302, 349, 388, 410, 412, 435 intonational phrases 57, 205 intransitive 75, 85, 114, 233, 317 introduction rules 150 island effects 55 J juncture 129, 223 K k-acquisition 255–262, 286 k-learning 255–258, 260–262 L LAD 71, 92, 113, 208, 257–259, 261, 278, 286, 371 language attrition 47 Language Acquisition Device 71, 81, 208, 256 language faculty 30, 58, 65, 71, 84, 89, 115, 116, 120, 121, 123, 126, 134, 184, 207, 249, 253, 254, 260, 261, 268, 281, 283, 348, 356, 367 Law of Large Numbers Heuristic 145, 147 Learned Linguistic Knowledge 256, 262, 343 learner style 244 learning strategy 77, 78
learning theory 1, 38, 41, 64, 113, 145, 184, 187, 206, 226, 316, 323, 340, 354, 385 lengthening 78, 182, 205 level of processing 8, 28 level of representation 10 Level-Ordering Hypothesis 34 lexical category 73, 76, 78, 84, 178, 180, 182, 187, 323, 360, 363 lexical constraints on syntactic operations 344 Lexical Functional Grammar 71 lexical interference 47 lexical representation 388 lexicon 4, 48, 54, 72, 76, 82, 85, 86, 104, 123, 124, 126, 154, 177, 185, 193, 194, 201, 236, 244, 251, 255, 257, 267, 268, 270, 271, 279, 283, 284, 293, 344, 345, 360, 361, 363, 369, 372, 386 LF 72, 268, 278, 280 licensing conditions 72, 85 linear order 55, 64, 72, 102, 128, 304, 312, 347, 386 linguistic competence x, 1, 24, 38, 42, 43, 45, 60, 64, 88, 100, 209, 245, 256, 259, 260, 274, 284, 343 locality 258, 355 logical form 72, 250, 262, 266, 280 Logical Implication Constraint 145 logical problem of language acquisition 51, 207, 210, 213, 214, 225, 227, 233, 236, 238, 240, 241, 243, 245 logical subject 19 longterm memory 12, 68, 91, 124, 125, 139, 141, 166, 174, 180, 193, 197, 244, 249, 251, 255, 283, 293, 314, 360, 390 M MANNER OF MOTION 77, 343 MANNER OF ACTION 343 markedness 61, 300, 369
maturation 33, 67, 93, 94, 119, 242, 246 meanings 2–4, 6, 21, 50, 124, 137, 138, 178, 219, 228, 229, 235, 241, 247, 283, 293, 355, 383, 392 memory 4, 12, 25, 26, 29, 30, 35, 38, 48, 60, 64, 68, 91, 106, 124, 125, 127, 138, 139, 141, 142, 150, 166–168, 170, 174, 180, 193, 194, 195, 197, 198, 222, 244, 249–251, 255, 265, 283, 286, 293, 314, 348, 358, 360, 363, 368, 386, 390 mental models 17, 136–138, 143, 170, 172, 174, 175, 192, 194, 203, 204, 208, 217, 219, 221, 244, 278, 280, 356, 372, 381 metalinguistic feedback 202, 222, 232, 329, 331–336, 338, 339, 364, 375, 384 metalinguistic information 3, 13, 24 metaprocesses 174, 179, 184, 191, 393, 394 microvariation 46 mind/body problem 136, 196 Minimalism 59, 61, 71, 72 Minimalist 34, 44, 114, 275 minimodules 268 Misattributed Markers Hypothesis 285 MMs 136–139, 141, 146, 147, 149, 150, 164, 169, 192, 194, 204 modals 75, 98, 111, 147, 157, 275–277, 297, 385, 387 modularity 12, 15, 27, 35, 50, 120, 121, 123, 127, 130, 175, 214, 215, 222, 223, 249–252, 258, 260–263, 265, 266, 268–273, 284–286, 341, 347 Modularity of Mind 250, 251, 262, 269 Modularity of Mind Hypothesis 13, 15, 250, 251, 262 module 393 modules 72, 121, 123, 127, 217, 250–252, 254, 255, 257, 261–263,
265, 266, 268, 269, 271–274, 283–286, 386, 393 monitoring 64, 202, 265, 307 morae 73 morpheme 49, 91, 116, 253, 325 Morphological Richness 46 Morphological Uniformity 46 morphology 3, 53, 105, 162, 163, 177, 242, 247, 268, 270, 306, 309, 311, 312, 322, 323, 325, 344, 356, 357 morphosyntactic categories 47, 75, 79, 84, 111, 182, 209, 229, 255, 287, 360 morphosyntactic feature 9, 17, 73, 76, 81, 142, 144, 153, 254, 271, 280, 323, 341, 369 MOTION 76, 77 motor-articulatory schemas 27, 35 motor-articulatory processes 350 N N 19, 43, 73, 76, 81, 136, 143, 154, 166, 175, 201, 255, 322–324, 328, 330, 331, 334, 360, 370, 383, 384 naive reflexification 248 Native Language Magnet 82 negation 60, 97–99, 102, 138, 156, 157, 159–161, 166, 228, 275, 276, 278, 284, 384 negative evidence 17–21, 23, 32, 49, 50, 61, 69, 116, 127, 128, 158, 212, 241, 256, 257, 289–291, 295, 299, 300, 303, 310, 316, 320, 340, 348, 350, 357, 359, 360, 366, 367, 371, 373, 388, 390 negative feedback 17, 22, 23, 31, 32, 317, 328, 329, 373 negotiation 291–294, 342 neural nets 12, 49 No Crossing Lines 190 No Crossing Lines Constraint 49, 62 No Negative Evidence 50, 256 No Negative Evidence Hypothesis 256
non-native pronunciation 22 nonce form 90, 91 not-movement 274 noun 63, 66, 72, 73, 76, 77, 80, 81, 83, 85, 91, 113, 135, 140, 152–154, 172, 180, 184, 206, 209, 219, 221, 223, 224, 228–231, 233, 237, 246, 253, 254, 285, 308, 311, 313, 316–318, 322–328, 339, 344, 345, 355, 356, 358, 361–363, 367, 369, 370, 376, 379 NP 9, 18, 19, 63, 73, 74, 76, 102, 154, 165, 167, 172, 224, 282, 297, 300, 307, 309, 315, 318, 320, 342, 369 nuclei 80, 350 null subject 46, 70, 87, 88, 95, 100, 161, 163, 164, 296, 341 number 17, 30, 73, 216, 220, 311 O observational 93, 94, 119 on-line language processing 13 opacity 158 operating principles 56, 58, 142, 176, 181, 224, 247 operator 140, 164, 165, 167 operators 131, 133, 135, 173, 262 Optimal Relevance 374, 375 Optimality Theory 44, 59, 61, 71, 72, 80, 151, 374 Optionality 61, 296 output autonomy 265, 266 outputs 202, 250, 255, 273, 278, 286 overgeneralisation 131, 145, 310 P P&P 40, 41, 43, 48, 51, 59, 62, 63, 69–72, 74, 75, 80, 85–88, 92, 93, 95, 96, 99–101, 105, 107, 111–113, 115, 116, 120, 155, 156, 159, 177, 185, 186, 190, 199, 257, 259, 268, 269, 273, 275, 276, 278, 285 paradigm learning 47
paradigms 43, 316, 355 paratactic organisation 60, 66 parse fail 68 parsers 10, 14, 16–20, 26, 31, 39, 61, 63, 68, 76, 82, 92, 136, 141, 144, 152, 169, 177, 190–194, 199, 203, 208, 255, 266, 268, 273, 285, 290, 305, 351, 360, 363, 371, 386 parsing 4, 9, 16, 18–21, 24–27, 32–35, 39, 41, 45, 54, 55, 63, 65, 67, 68, 69, 76, 79, 82, 83, 91, 99, 106, 120, 129, 135, 136, 143, 152, 153, 169, 170, 177, 183, 190, 192–196, 201, 203, 205, 206, 208–210, 241, 243, 267–269, 271–273, 278, 280, 282, 283, 285, 292, 293, 304–306, 308, 312, 343, 349–353, 358, 360, 361, 371, 372, 389, 391 parsing procedure 20, 25, 32–34, 39, 54, 68, 79, 99, 135, 136, 190, 192, 193, 205, 208, 241, 243, 272, 350, 351, 371, 389 particle 73, 75, 77, 97–99, 234 passive construction 3 PATIENT 62, 237, 342 pendulum effect 101 performance constraints 106 peripheral 10, 61, 80, 113, 121, 123, 125, 185 peripheral auditory system 10 periphery 40, 275 PF 72, 280 phoneme 49, 114, 176 phoneme restoration effects 13 phonemic mode 28 phone 39, 111, 114, 174, 200, 201 phonetic category 82, 108, 143, 199, 341, 352, 353, 368 phonetic feature 27, 28, 35, 73, 74, 111, 117, 144, 147, 199, 350 phonetic form 72, 166, 387 phonological form 165, 167, 270, 280, 358, 359, 370
phonological words 74, 200, 270 phrase structure 63, 71, 85, 86, 101, 158, 263 phrases 55, 57, 60, 63, 73, 84–86, 113, 144, 152, 161, 205, 229, 231, 233, 275, 293, 324, 326, 342, 343, 363, 373, 388 pitch 204, 279, 364 plural 22, 34, 90–92, 159, 253, 254, 285, 311, 325, 342 positive evidence 17–21, 23, 31, 67, 145, 295, 303, 310, 340, 342, 367 positive feedback 22, 23, 32 postlexical 269, 325 Poverty of the Stimulus Hypothesis 207, 212, 222, 250 pre-compiled 26, 171 precedence 12, 258 predicate 19, 114, 139, 152, 171, 200, 209, 235, 237, 246, 276, 284, 316, 345 predicate adverbs 286 predication 39, 60, 269, 270, 284, 315, 392 preference 147, 160, 163, 172, 178, 184, 185, 205, 219, 234, 297–299, 341, 343, 364 primary linguistic data 17, 392 principle-based parsing 4, 69 principles and parameters 37, 40, 52, 63, 71, 92–94, 107, 163, 242, 248, 263, 277, 278, 284, 393 Principles and Parameters Theory 4, 37, 40, 71, 163, 263, 393 privative feature 153 Pro-Drop Parameter 46, 70 problem-solving 3, 12, 55, 58, 59, 64, 68, 116, 120, 131–135, 137, 138, 164, 174, 195, 196, 209, 245, 251, 254, 263, 270, 271, 286 procedural knowledge 25, 35, 64 PROCESS 230, 324 Processability Theory 4, 64
processing procedures 76, 147 processors 10–13, 32, 39, 73, 123–125, 131, 142, 147, 190, 197, 202, 203, 250–252, 261, 263, 270, 272, 273, 283, 285, 348 production 25, 27, 53, 54, 57, 60, 63–65, 67, 69, 73, 76, 81–83, 91, 106, 111, 113, 114, 120, 126, 127, 136, 152, 157, 160, 162, 163, 169, 172, 176, 178, 180, 187, 191, 193, 194, 196, 209, 242, 243, 258, 272, 290–293, 295, 297, 305–311, 313, 316, 349, 351–353, 356, 358, 360–362, 366, 368, 370, 385–389, 394 production systems 25, 27, 32, 209 prominence patterns 60, 77 pronoun 47, 64, 84, 87, 102–104, 205, 235, 237, 304–308, 373 pronunciation 21–23, 35, 91, 223, 355, 357, 385 property 59, 61, 64–67, 110, 120, 127, 132, 144–149, 166, 170, 177, 190, 197, 202, 205, 206, 219, 220, 224, 225, 231, 255, 279, 290, 363, 364 proposition 19, 20, 150, 164, 165, 176, 279, 286, 379, 383, 384 propositions 139, 141, 150, 164, 165, 171, 209, 257, 345 prosodic 39, 57, 62, 73, 81, 84, 123, 126, 128, 129, 143, 171, 179–184, 187, 196, 205, 237, 279, 280, 284, 287, 343, 349–351, 357, 372, 394 prosodic bootstrapping 77–79, 179 Prosodic Bootstrapping Hypothesis 75, 78, 181 prosodic category 78, 79, 84, 171, 351 Prosodic Utterance 60 prototype 80, 82, 137, 144, 201, 311, 323 psychogrammar 24, 32, 35, 41, 47, 58, 65, 70, 89, 90, 92, 106, 108, 112, 113, 190, 201, 210, 211, 213–215,
222, 225, 226, 236–239, 243, 244, 274, 278, 284, 292, 340, 347, 360, 366, 367, 388, 389 psychological reality 49, 89, 90 Q q-morphism 168 quantifier 102, 139, 150, 155, 159, 228, 262 question 71, 138, 158, 159, 177, 178, 188, 189, 224–225, 231, 237, 238, 246, 291, 298, 301, 302, 306, 316, 377, 378, 381, 385 R raising 41, 70, 97–99, 104, 158, 159, 161, 163, 177, 201, 277, 295, 297, 341 re-representation 39, 350 recast 23, 24, 110, 138, 165, 231, 291, 389 recategorisation 110 recognition 82, 107, 111, 139, 164, 179, 220, 242, 266, 270, 362, 372 recognition templates 171 redundancy 91, 92, 113 referential categories 73, 84, 87, 113, 209, 229 reflexives 3, 262, 341 Regulation Schema 176 relative-motion cues 216 Relevance Principle 232, 374, 392 Relevance Theory 6, 372, 378, 383 Representational Autonomy 120, 169, 184, 371 representational features 84 Representational Modularity 27, 50, 120, 121, 123, 261, 347 representational problem of language acquisition 65, 69, 238, 391 representational realism 89–93, 101, 115, 245, 252 representational redescription 26, 27, 68
restructuring 5, 6, 14, 20, 31, 38, 39, 48, 50, 64, 68–70, 83, 99, 112, 148, 156–158, 160, 168, 177, 191, 193, 202, 229, 260, 273, 289, 292–294, 299, 300, 304, 306, 307, 328, 340, 347, 350, 352, 353, 356, 366, 372, 373, 383, 385, 389, 390, 394 rhymes 349 rhythm 62, 279 rightward movement rules 56 robustness problem 226 rogue grammars 55, 120, 133, 174, 187, 189, 195 root 56, 97, 247, 322, 325, 326, 339 rule learning 71, 75, 91, 164, 278 S s-selection 129 s-structure 72, 278, 357 s-structures 123, 124, 126, 129, 349, 357, 372, 388 saliency 146, 151, 174, 370 sampling problem 225, 233, 235, 241 scope 72, 73, 97, 99, 102, 262, 272, 276, 286, 343 short-term 127, 368 signal 2, 4, 9, 10, 12, 16, 18, 20, 39, 46, 56, 59, 66, 82, 96, 120, 160, 193, 224, 267, 279, 283, 292, 316, 360, 384, 393 Similarity Principle 216 Simplified Input Hypothesis 233–237 Single Value Constraint 113 sisterhood 12, 190, 269 skill 1, 24, 26–28, 30, 32, 178, 196, 251, 253, 286 sociolinguistic 41, 184, 240, 342, 375 speaker-oriented adverbs 286 Spec-Head agreement 73 Specialisation Constraint 144, 145 Specific Language Impairment 253, 254, 271, 285
specifier 19, 73, 85, 86, 102, 104, 115, 177, 313, 358, 361, 362, 369, 370 speech perception 2 speech processing 1 speech production 1, 2, 28, 32, 60, 65, 73, 76, 83, 91, 114, 120, 176, 187, 191, 258, 272, 292, 313, 351–353, 368, 389 speech stream 77, 79, 112, 179, 186–188, 205, 207–209, 223, 228, 230, 237, 247, 254, 255, 264, 343, 363 STATE 139, 140, 230 stem 276, 317, 318, 322–327, 344, 345 stems 97, 303, 322 stress 3, 62, 68, 77, 81, 120, 129, 143, 181, 182, 187, 204, 205, 264, 270, 279, 280, 318, 325, 326, 349, 350, 364, 365 structural constraints 3, 188, 191, 220, 270 subcategorisation 129, 236, 316, 318, 339, 370 subjacency 55, 60, 61, 92, 229, 235, 238 subject 9, 19, 20, 26, 46, 58, 87, 88, 98, 100, 156, 158, 159–163, 166, 177, 224, 237, 258, 282, 286, 296–298, 300, 304, 307, 308, 320, 345 subject-oriented adverbs 286 SUBJECT NOUN PHRASE 152 subject-verb agreement 87 Subset Condition 61, 295, 297, 298 Subset Principle 45, 61, 113, 247, 295, 297 Success Measure 239, 243 successful acquisition 52, 53 successful learning 207, 239 surface structure 27, 56, 325, 342, 343 syllable 2, 3, 9, 10, 24, 39, 54, 55, 62, 68, 73, 74, 77, 78, 81, 82, 114, 128, 143, 171, 172, 174, 180–182, 188, 196, 200, 205, 231, 235, 264,
270, 303, 318, 326, 349, 350, 364, 365, 372, 385 syntactic bootstrapping 75, 77 syntactic frame 76, 77, 167, 248 syntactic operations 224, 344 synthetic implication 176 system operating principles 142 T taxonomic categories 219 Taxonomic Constraint 175, 219, 221 taxonomic relations 219 teacher talk 23 tempo 279 tense 19, 47, 72, 73, 87, 98, 105, 113, 155, 156, 160–162, 176, 177, 225, 235, 253, 254, 276, 292, 301, 306, 307, 309, 312, 323, 326, 341, 359, 385, 387, 388 tensed verb 18, 20, 25, 54, 97, 98, 158, 258, 276, 387, 388 theism 51, 56 thematic relations 218, 219 THEMEs 342 THING 73, 76, 139, 140, 147, 171, 175, 216, 272, 325, 326 Third Principle of Correction Interpretation 376 tiers 62 tone 78, 180, 204, 237, 260, 279, 349 top-down processors 124 topic shifts 291 TPs 237 transducers 8, 10, 73, 81 transfer 7, 29, 30, 43, 47, 54, 55, 79, 81–84, 88, 100, 113, 137, 155, 165, 173, 184, 189, 194, 200, 202, 205, 206, 227, 241, 257, 297, 304, 309, 315, 340, 342, 360, 371 transformational 101, 245, 280 transition theories 33, 64, 65, 69, 70, 89, 100, 101, 110, 132, 388
transitive 73, 75, 85, 114, 233, 317 translation processor 123 translation-equivalent 241, 382 transparency of agreement 158 trigger 10, 45, 46, 62, 70, 89, 96–101, 105, 111, 112, 116, 130, 131, 157, 170, 192, 206, 246, 257, 273, 352 Triggering Learning Algorithm 113, 205 trisyllabic laxing 325, 326 typicality conditions on categories 147, 172 U UG 1, 7, 40–43, 45, 50–55, 57–59, 61, 63, 67, 68, 70–74, 79–81, 85, 89, 92, 93, 98, 99, 101, 105–108, 110, 111, 113, 115–117, 119, 120, 122, 134, 155, 157, 163, 170, 178, 184, 188, 189, 191, 198, 199, 205–210, 213–215, 222, 228, 229, 233, 236–240, 244, 245, 247, 249, 257–260, 269, 272, 277, 286, 297, 323, 324, 342–344, 393 ultimate attainment issue 52, 110, 117 unanalysed knowledge 26, 35 underspecification 114, 369 Uniform Parsers Hypothesis 190, 192, 194, 371 uniformity problem 225 Unitary Base Hypothesis 323 Universal Grammar 1, 3, 7, 40, 45, 50, 56, 67, 71, 73, 74, 84, 108, 183, 207, 213, 221, 236, 257, 259, 391, 393 universal principles 44, 71, 159, 275 universals 7, 39, 45, 50, 51, 59–61, 71–74, 84, 88, 106, 178, 195, 208, 226, 274, 279, 323, 371, 391 unmarked rules 42 Unusualness Constraint 144, 146, 226
V V 19, 25, 74, 76, 81, 154, 161, 166, 172, 210, 234, 265, 277, 286, 299, 303, 307, 322, 324, 328, 343, 360 variability 28, 42, 46, 61, 147–149, 163, 172, 178, 226, 235, 316, 391 variation 40–43, 46, 61, 64, 65, 72, 79, 80, 86, 88, 96, 101, 105, 110, 115, 149, 178, 198, 201, 205, 210, 240, 278, 329, 353, 360 verb 18–20, 25, 26, 35, 41, 54, 57, 58, 62, 70, 72, 73, 77, 80, 85, 87, 97, 98, 99, 104, 105, 114, 115, 143, 147, 156–162, 165–167, 174, 176–178, 180, 223, 224, 228–230, 233, 234, 253, 254, 275, 276–278, 282, 284, 294–299, 303, 304, 306–309, 313, 317, 318, 322–327, 339, 341, 342, 344, 363–366, 379, 382, 385, 387, 388 verb inversion 159 verb object order 58 verb raising 70, 98, 99, 158, 161 Voice Onset Time 196
VOT 82 VP 63, 76, 156, 158, 160, 161, 166, 275–277, 282, 297, 299, 343 Vulnerable Markers Hypothesis 285 W wave 4 Whole Object Constraint 175, 219, 221 word boundaries 32 word class 76, 130, 143 word formation 284, 325 word order 19, 20, 21, 58 word recognition 13, 28, 147, 181, 201, 202, 267, 268, 283, 349 X X-bar 34, 63, 85 Y Y-model 280 [+Focus] feature 280 Z ZISA 162
In the series LANGUAGE ACQUISITION AND LANGUAGE DISORDERS (LALD) the following titles have been published thus far or are scheduled for publication: 1. WHITE, Lydia: Universal Grammar and Second Language Acquisition. 1989. 2. HUEBNER, Thom and Charles A. FERGUSON (eds): Cross Currents in Second Language Acquisition and Linguistic Theory. 1991. 3. EUBANK, Lynn (ed.): Point Counterpoint. Universal Grammar in the second language. 1991. 4. ECKMAN, Fred R. (ed.): Confluence. Linguistics, L2 acquisition and speech pathology. 1993. 5. GASS, Susan and Larry SELINKER (eds): Language Transfer in Language Learning. Revised edition. 1992. 6. THOMAS, Margaret: Knowledge of Reflexives in a Second Language. 1993. 7. MEISEL, Jürgen M. (ed.): Bilingual First Language Acquisition. French and German grammatical development. 1994. 8. HOEKSTRA, Teun and Bonnie SCHWARTZ (eds): Language Acquisition Studies in Generative Grammar. 1994. 9. ADONE, Dany: The Acquisition of Mauritian Creole. 1994. 10. LAKSHMANAN, Usha: Universal Grammar in Child Second Language Acquisition. Null subjects and morphological uniformity. 1994. 11. YIP, Virginia: Interlanguage and Learnability. From Chinese to English. 1995. 12. JUFFS, Alan: Learnability and the Lexicon. Theories and second language acquisition research. 1996. 13. ALLEN, Shanley: Aspects of Argument Structure Acquisition in Inuktitut. 1996. 14. CLAHSEN, Harald (ed.): Generative Perspectives on Language Acquisition. Empirical findings, theoretical considerations and crosslinguistic comparisons. 1996. 15. BRINKMANN, Ursula: The Locative Alternation in German. Its structure and acquisition. 1997. 16. HANNAHS, S.J. and Martha YOUNG-SCHOLTEN (eds): Focus on Phonological Acquisition. 1997. 17. ARCHIBALD, John: Second Language Phonology. 1998. 18. KLEIN, Elaine C. and Gita MARTOHARDJONO (eds): The Development of Second Language Grammars. A generative approach. 1999. 19. BECK, Maria-Luise (ed.): Morphology and its Interfaces in Second Language Knowledge. 
1998. 20. KANNO, Kazue (ed.): The Acquisition of Japanese as a Second Language. 1999. 21. HERSCHENSOHN, Julia: The Second Time Around – Minimalism and L2 Acquisition. 2000. 22. SCHAEFFER, Jeanette C.: The Acquisition of Direct Object Scrambling and Clitic Placement. Syntax and pragmatics. 2000. 23. WEISSENBORN, Jürgen and Barbara HÖHLE (eds.): Approaches to Bootstrapping. Phonological, lexical, syntactic and neurophysiological aspects of early language acquisition. Volume 1. 2001. 24. WEISSENBORN, Jürgen and Barbara HÖHLE (eds.): Approaches to Bootstrapping. Phonological, lexical, syntactic and neurophysiological aspects of early language acquisition. Volume 2. 2001. 25. CARROLL, Susanne E.: Input and Evidence. The raw material of second language acquisition. 2001. 26. SLABAKOVA, Roumyana: Telicity in the Second Language. 2001.