THE EVOLUTION OF LANGUAGE

Proceedings of the 7th International Conference (EVOLANG 7)
Barcelona, Spain, 12-15 March 2008

Editors
Andrew D M Smith University of Edinburgh, UK
Kenny Smith Northumbria University, UK
Ramon Ferrer i Cancho Universitat de Barcelona, Spain
World Scientific
NEW JERSEY · LONDON · SINGAPORE · BEIJING · SHANGHAI · HONG KONG · TAIPEI · CHENNAI
Published by World Scientific Publishing Co. Pte. Ltd.
5 Toh Tuck Link, Singapore 596224
USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601
UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE
British Library Cataloguing-in-Publication Data A catalogue record for this book is available from the British Library.
THE EVOLUTION OF LANGUAGE
Proceedings of the 7th International Conference (EVOLANG 7)

Copyright © 2008 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.
For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.
ISBN-13 978-981-277-611-2
ISBN-10 981-277-611-7
Printed in Singapore by World Scientific Printers
Preface

This volume collects the refereed papers and abstracts of the 7th International Conference on the Evolution of Language (EVOLANG 7), held in Barcelona on 12-15 March 2008. Submissions to the conference were solicited in two forms, papers and abstracts, and this is reflected in the structure of this volume.

The biennial EVOLANG conference is characterised by an invigorating, multi-disciplinary approach to the origins and evolution of human language, and brings together researchers from many fields, including anthropology, archaeology, artificial life, biology, cognitive science, computer science, ethology, genetics, linguistics, neuroscience, palaeontology, primatology, psychology and statistical physics. The multi-disciplinary nature of the field makes the refereeing process for EVOLANG very challenging, and we are indebted to our panel of reviewers for their conscientious and valuable efforts. A full list of the panel can be found on the following page.

Further thanks are also due to:

The EVOLANG committee: Angelo Cangelosi, Jean-Louis Dessalles, Tecumseh Fitch, Jim Hurford, Chris Knight and Maggie Tallerman. A particular debt of gratitude is owed to Jim Hurford, who has once again given generously of his time and expertise in the preparation of the proceedings.

The local organising committee: Sergi Balari, Yolanda Cabré Sans, Joan Castellví, Pere Cornellas, Ramon Ferrer i Cancho, Ricard Gavaldà, Antoni Hernández, Victor Longa, Guillermo Lorenzo, Maria Antònia Martí, Txuss Martín, Josep Quer, Carles Riba, Joana Rosselló, Jordi Serrallonga and Mariona Taulé.

CosmoCaixa and the Museum of Science, for their financial support and for offering us their unique facilities.

The Department of Innovation, Universities and Business of the Catalan government (Generalitat de Catalunya), the Spanish Ministry of Education and Science, the Universitat de Barcelona and the Universitat Politècnica de Catalunya, for their financial support.

The Service of Linguistic Technology (STEL), for computing facilities.

The plenary speakers: Derek Bickerton, Rudolf Botha, Camilo José Cela Conde, Francesco d'Errico, Susan Goldin-Meadow, Simon Kirby, Gary Marcus, Friedemann Pulvermüller and Juan Uriagereka.

Finally, and most importantly, the authors of all the contributions collected here.

Andrew Smith, Kenny Smith and Ramon Ferrer i Cancho
November 2007
Panel of Reviewers

Michael Arbib, Andrea Baronchelli, Mark Bartlett, Tony Belpaeme, Derek Bickerton, Joris Bleys, Richard Blythe, Rudie Botha, Ted Briscoe, Joanna Bryson, Christine Caldwell, Josep Call, Angelo Cangelosi, Ronnie Cann, Andrew Carstairs-McCarthy, Morten Christiansen, Andy Clark, Bernard Comrie, Louise Connell, Fred Coolidge, Christophe Coupé, Tim Crow, Joachim de Beule, Bart de Boer, Dan Dediu, Didier Demolin, Jean-Louis Dessalles, Guy Deutscher, Mike Dowman, Robin Dunbar, Shimon Edelman, Mark Ellison, Wolfgang Enard, Nicolas Fay, Emma Flynn, Bruno Galantucci, Simon Garrod, Les Gasser, Laleh Ghadakpour, Kathleen Gibson, David Gil,
Jonathan Ginzburg, Tao Gong, Nathalie Gontier, Tom Griffiths, Takashi Hashimoto, Bernd Heine, Wolfram Hinzen, Jean-Marie Hombert, Carmel Houston-Price, Jim Hurford, Yuki Ike-Uchi, Gerhard Jäger, Sverker Johansson, Harish Karnick, Simon Kirby, Chris Knight, Kiran Lakkaraju, Simon Levy, Phil Lieberman, Elena Lieven, David Lightfoot, John Locke, Gary Lupyan, Heidi Lyn, Dermot Lynott, Peter MacNeilage, Gary Marcus, Davide Marocco, Brendan McGonigle, April McMahon, James Minett, Padraic Monaghan, Salikoko Mufwene, Vano Nasidze, Chrystopher Nehaniv, Daniel Nettle, Fritz Newmeyer, Jason Noble, Kazuo Okanoya, Gloria Origgi, Pierre-Yves Oudeyer, Asli Özyürek,
Domenico Parisi, Anna Parker, Irene Pepperberg, Simone Pika, Joseph Poulshock, Sonia Ragir, Florencia Reali, Anne Reboul, Luke Rendell, Debi Roberson, Thom Scott-Phillips, Robert Seyfarth, Katie Slocombe, Andrew Smith, Kenny Smith, James Steele, Samarth Swarup, Eörs Szathmáry, Maggie Tallerman, Ian Tattersall, Mónica Tamariz, Carel ten Cate, Peter Todd, Mike Tomasello, Huck Turner, Natalie Uomini, Juan Uriagereka, Robert van Rooij, Arie Verhagen, Marilyn Vihman, Paul Vogt, Bill Wang, Andrew Wedel, Mike Wheeler, Bencie Woll, Liz Wonnacott, Hajime Yamauchi, Henk Zeevat, Jordan Zlatev, Klaus Zuberbühler, Jelle Zuidema
Contents

Preface ... v
Panel of Reviewers ... vii
Part I: Papers

Is Pointing the Root of the Foot? Grounding the "Prosodic Word" as a Pointing Word
Christian Abry and Virginie Ducey ... 3

The Subcortical Foundations of Grammaticalization
Giorgos P. Argyropoulos ... 10

Pragmatics and Theory of Mind: A Problem Exportable to the Origins of Language
Teresa Bejarano ... 18

Two Neglected Factors in Language Evolution
Derek Bickerton ... 26

Expressing Second Order Semantics and the Emergence of Recursion
Joris Bleys ... 34

Unravelling the Evolution of Language with Help from the Giant Water Bug, Natterjack Toad and Horned Lizard
Rudolf Botha ... 42

Linguistic Adaptations for Resolving Ambiguity
Ted Briscoe and Paula Buttery ... 51

Modelling Language Competition: Bilingualism and Complex Social Networks
Xavier Castelló, Víctor M. Eguíluz, Maxi San Miguel, Lucía Loureiro-Porto, Riitta Toivonen, Jari Saramäki and Kimmo Kaski ... 59

Language, the Torque and the Speciation Event
Tim J. Crow ... 67

The Emergence of Compositionality, Hierarchy and Recursion in Peer-to-Peer Interactions
Joachim De Beule ... 75
Causal Correlations between Genes and Linguistic Features: The Mechanism of Gradual Language Evolution
Dan Dediu ... 83

Spontaneous Narrative Behaviour in Homo Sapiens: How Does It Benefit Speakers?
Jean-Louis Dessalles ... 91

What do Modern Behaviours in Homo Sapiens Imply for the Evolution of Language?
Benoît Dubreuil ... 99

The Origins of Preferred Argument Structure
Caleb Everett ... 107

Long-Distance Dependencies are not Uniquely Human
Ramon Ferrer i Cancho, Victor M. Longa and Guillermo Lorenzo ... 115

How Much Grammar Does It Take to Sail a Boat? (Or, What can Material Artifacts Tell Us about the Evolution of Language?)
David Gil ... 123

The Role of Cultural Transmission in Intention Sharing
Tao Gong, James W. Minett and William S-Y. Wang ... 131

The Role of Naming Game in Social Structure
Tao Gong and William S-Y. Wang ... 139

Do Individuals' Preferences Determine Case Marking Systems?
David J. C. Hawkey ... 147

What Impact Do Learning Biases have on Linguistic Structures?
David J. C. Hawkey ... 155

Reanalysis vs Metaphor: What Grammaticalisation CAN Tell Us about Language Evolution
Stefan Hoefler and Andrew D. M. Smith ... 163

Seeking Compositionality in Holistic Proto-Language without Substructure: Do Counter-Examples Overwhelm the Fractionation Process?
Sverker Johansson ... 171

Unravelling Digital Infinity
Chris Knight and Camilla Power ... 179

Language Scaffolding as a Condition for Growth in Linguistic Complexity
Kiran Lakkaraju, Les Gasser and Samarth Swarup ... 187
The Emergence of a Lexicon by Prototype-Categorising Agents in a Structured Infinite World
Cyprian Laskowski ... 195

Evolutionary Framework for the Language Faculty
Erkki Luuk and Hendrik Luuk ... 203

Artificial Symbol Systems in Dolphins and Apes: Analogous Communicative Evolution?
Heidi Lyn ... 211

The Adaptiveness of Metacommunicative Interaction in a Foraging Environment
Zoran Macura and Jonathan Ginzburg ... 219

On the Impact of Community Structure on Self-organizing Lexical Networks
Alexander Mehler ... 227

A Crucial Step in the Evolution of Syntactic Complexity
Juan C. Moreno Cabrera ... 235

Evolution of the Global Organization of the Lexicon
Mieko Ogura and William S-Y. Wang ... 243

From Mouth to Eye
Dennis Philps ... 251

What Use is Half a Clause?
Ljiljana Progovac ... 259

The Formation, Generative Power, and Evolution of Toponyms: Grounding Vocabulary in a Cognitive Map
Ruth Schulz, David Prasser, Paul Stockwell, Gordon Wyeth and Janet Wiles ... 267

On the Correct Application of Animal Signalling Theory to Human Communication
Thomas C. Scott-Phillips ... 275

Natural Selection for Communication Favours the Cultural Evolution of Linguistic Structure
Kenny Smith and Simon Kirby ... 283

Syntax, a System of Efficient Growth
Alona Soschen ... 291

Simple, but not too Simple: Learnability vs. Functionality in Language Evolution
Samarth Swarup and Les Gasser ... 299
Kin Selection and Linguistic Complexity
Maggie Tallerman ... 307

Regularity in Mappings Between Signals and Meanings
Mónica Tamariz and Andrew D. M. Smith ... 315

Emergence of Sentence Types in Simulated Adaptive Agents
Ryoko Uno, Takashi Ikegami, Davide Marocco and Stefano Nolfi ... 323

Desperately Evolving Syntax
Juan Uriagereka ... 331

Constraint-Based Compositional Semantics
Wouter Van Den Broeck ... 338

The Emergence of Semantic Roles in Fluid Construction Grammar
Remi Van Trijp ... 346

Broadcast Transmission, Signal Secrecy and Gestural Primacy Hypothesis
Sławomir Wacewicz and Przemysław Żywiczyński ... 354

Self-Interested Agents can Bootstrap Symbolic Communication if They Punish Cheaters
Emily Wang and Luc Steels ... 362

Coping with Combinatorial Uncertainty in Word Learning: A Flexible Usage-Based Model
Pieter Wellens ... 370

Removing 'Mind-Reading' from the Iterated Learning Model
Simon F. Worgan and Robert I. Damper ... 378

How does Niche Construction in Learning Environment Trigger the Reverse Baldwin Effect?
Hajime Yamauchi ... 386
Part II: Abstracts

Coexisting Linguistic Conventions in Generalized Language Games
Andrea Baronchelli, Luca Dall'Asta, Alain Barrat and Vittorio Loreto ... 397

Complex Systems Approach to Natural Categorization
Andrea Baronchelli, Vittorio Loreto and Andrea Puglisi ... 399

Regular Morphology as a Cultural Adaptation: Non-Uniform Frequency in an Experimental Iterated Learning Model
Arianita Beqa, Simon Kirby and Jim Hurford ... 401

Neural Dissociation between Vocal Production and Auditory Recognition Memory in Both Songbirds and Humans
Johan J. Bolhuis ... 403

Discourse Without Symbols: Orangutans Communicate Strategically in Response to Recipient Understanding
Erica Cartmill and Richard W. Byrne ... 405

Taking Wittgenstein Seriously: Indicators of the Evolution of Language
Camilo J. Cela-Conde, Marcos Nadal, Enric Munar, Antoni Gomila and Víctor M. Eguíluz ... 407

An Experiment Exploring Language Emergence: How to See the Invisible Hand and Why We Should
Hannah Cornish ... 409

The Syntax of Coordination and the Evolution of Syntax
Wayne Cowart and Dana McDaniel ... 411

The Archaeology of Language Origin
Francesco D'Errico ... 413

The Joy of Sacs
Bart De Boer ... 415

How Complex Syntax Could Be
Mike Dowman ... 417

The Multiple Stages of Protolanguage
Mike Dowman ... 419

A Human Model of Color Term Evolution
Mike Dowman, Ying Xu and Thomas L. Griffiths ... 421

Evolution of Song Culture in the Zebra Finch
Olga Feher, Partha P. Mitra, Kazutoshi Sasahara and Ofer Tchernichovski ... 423

Iterated Language Learning in Children
Molly Flaherty and Simon Kirby ... 425

Gesture, Speech and Language
Susan Goldin-Meadow ... 427

Introducing the Units and Levels of Evolution Debate into Evolutionary Linguistics
Nathalie Gontier ... 429
What can the Study of Handedness in Nonhuman Apes Tell Us about the Evolution of Language?
Rebecca Harrison ... 431

Unidirectional Meaning Change with Metaphoric and Metonymic Inferencing
Takashi Hashimoto and Masaya Nakatsuka ... 433

Recent Adaptive Evolution of Human Genes Related to Hearing
John Hawks ... 435

Inhibition and Language: A Pre-Condition for Symbolic Communicative Behaviour
Carlos Hernandez-Sacristan ... 437

Pragmatic Plasticity: A Pivotal Design Feature?
Stefan Hoefler ... 439

Continuity between Non-Human Primates and Modern Humans?
Jean-Marie Hombert ... 441

After all, a "Leap" is Necessary for the Emergence of Recursion in Human Language
Masayuki Ike-Uchi ... 443

Labels and Recursion: From Adjunction-Syntax to Predicate-Argument Relations
Aritz Irurtzun ... 445

Iterated Learning with Selection: Convergence to Saturation
Mike Kalish ... 447

A Reaction-Diffusion Approach to Modelling Language Competition
Anne Kandler and James Steele ... 449

Accent Over Race: The Role of Language in Guiding Children's Early Social Preferences
Katherine D. Kinzler, Kristin Shutts, Emmanuel Dupoux and Elizabeth S. Spelke ... 451

Language, Culture and Biology: Does Language Evolve to be Passed on by Us, and Did Humans Evolve to Let that Happen?
Simon Kirby ... 453

Three Issues in Modeling the Language Convergence Problem as a Multiagent Agreement Problem
Kiran Lakkaraju and Les Gasser ... 456

The Development of a Social Signal in Free-Ranging Chimpanzees
Marion Laporte and Klaus Zuberbühler ... 458
Gestural Modes of Representation: A Multi-Disciplinary Approach
Katja Liebal, Hedda Lausberg, Ellen Fricke and Cornelia Müller ... 460

Extracommunicative Functions of Language: Verbal Interference Causes Categorization Impairments
Gary Lupyan ... 462

Form-Meaning Compositionality Derives from Social and Conceptual Diversity
Gary Lupyan and Rick Dale ... 464

Language as Kluge
Gary Marcus ... 466

Origins of Communication in Autonomous Robots
Davide Marocco and Stefano Nolfi ... 468

Handedness for Gestural Communication and Non-Communicative Actions in Chimpanzees and Baboons: Implications for Language Origins
Adrien Meguerditchian, Jacques Vauclair, Molly J. Gardner, Steven J. Schapiro and William D. Hopkins ... 470

The Evolution of Hypothetical Reasoning: Intelligibility or Reliability?
Hugo Mercier ... 472

Simulation of Creolization by Evolutionary Dynamics
Makoto Nakamura, Takashi Hashimoto and Satoshi Tojo ... 474

Evolution of Phonological Complexity: Loss of Species-Specific Bias Leads to more Generalized Learnability in a Species of Songbirds
Kazuo Okanoya and Miki Takahashi ... 476

Referential Gestures in Chimpanzees in the Wild: Precursors to Symbolic Communication?
Simone Pika and John C. Mitani ... 478

Modeling Language Emergence by Way of Working Memory
Alessio Plebe, Vivian De la Cruz and Marco Mazzone ... 480

Mechanistic Language Circuits: What Can be Learned? What is Pre-Wired?
Friedemann Pulvermüller ... 482

Reflections on the Invention and Reinvention of the Primate Playback Experiment
Greg Radick ... 485

An Experimental Approach to the Rôle of Freerider Avoidance in the Development of Linguistic Diversity
Gareth Roberts ... 487

Prosody and Linguistic Complexity in an Emerging Language
Wendy Sandler, Irit Meir, Svetlana Dachkovsky, Mark Aronoff and Carol Padden ... 489

Communication, Cooperation and Coherence: Putting Mathematical Models into Perspective
Federico Sangati and Jelle Zuidema ... 491
A Numerosity-Based Alarm Call System in King Colobus Monkeys
Anne Schel, Klaus Zuberbühler and Sandra Tranquilli ... 493

On There and Then: From Object Permanence to Displaced Reference
Marieke Schouwstra ... 495

Signalling Signalhood and the Emergence of Communication
Thomas C. Scott-Phillips, Simon Kirby and Graham R. S. Ritchie ... 497

Wild Chimpanzees Modify the Structure of Victim Screams According to Audience Composition
Katie E. Slocombe and Klaus Zuberbühler ... 499

An Experimental Study on the Role of Language in the Emergence and Maintenance of Human Cooperation
John W. F. Small and Simon Kirby ... 501

Replicator Dynamics of Language Processing
Luc Steels and Eörs Szathmáry ... 503

Syntactical and Prosodic Cues in Song Segmentation Learning by Bengalese Finches
Miki Takahashi and Kazuo Okanoya ... 505

Why the Transition to Cumulative Symbolic Culture is Rare
Mónica Tamariz ... 507

A Gradual Path to Hierarchical Phrase-Structure: Insights from Modeling and Corpus-Data
Willem Zuidema ... 509

Author Index ... 511
Papers
IS POINTING THE ROOT OF THE FOOT? GROUNDING THE "PROSODIC WORD" AS A POINTING WORD

CHRISTIAN ABRY
Language Sciences Department, Stendhal, BP 25, FRANCE-38040 Grenoble Cédex

VIRGINIE DUCEY
GIPSA-Lab, Stendhal, BP 25, FRANCE-38040 Grenoble Cédex

Recently, in the Vocalize-to-Localize framework (a functional stance launched in the Interaction Studies 2004-2005 issues we edited; Abry et al., 2004), we addressed the unification of two grounding attempts concerning the syllable and the foot in language ontogeny. Can the movement time of the pointing strokes of a child be predicted from her babbling rhythm? The answer for 6 babies (6-18 months) was a 2.1 pointing-to-syllable ratio. Implications for the grounding of the first words within this Pointing Frame will be examined. More tentatively, we will suggest that babbling for protophonology, together with pointing for protosyntax, pave the way to language.
1. Introduction
While the main scientific endeavour is fission, say first break already known units, as in physics typically, the afterthought of formal constructions is to restart from primitives, e.g. building blocks. This is the foundational Chomsky & Schützenberger free monoid for computational linguistics, then Move and/or Merge in the Minimalist Programme (MP). In physiological behavior, the degrees-of-freedom problem is rather seen developmentally as a problem of breaking early given coordinations (e.g. thumb-sucking in utero, Babkin's reflex, etc.) in order to elaborate new couplings for new skills (hand-to-mouth feeding ... piano playing).
2. Emergence as mergence: Sign+Sign=>Sign and Foot+Foot=>Foot
Regarding the emergence of phonology, some students like Lindblom and ourselves have considered that features, particles, primes, etc., are just byproducts of other mechanisms (for a recent tentative reconciliation with the use
of features within our Perception-for-Action-Control Theory, see Schwartz, Boé & Abry, 2007). But what are the units of the system you start from? The number of segments? The possible onsets and offsets of syllables...? In computational evolutionary phonology, the issue is still between a holistic-formulaic starting point, or a yet undefined layman word unit. This in spite of our linguistic state-of-the-art, since "we still do not have strict definitions of even the most basic units, such as segment, syllable, morpheme, and word", as complained by Joan Bybee (2003, p. 2).

Now instead of fission, can fusion help? In other words, can the compositional making of larger units from smaller bricks be replaced by the blending of already more or less large units, typically two into one unit of the same level (an idea taken earlier in the categorial grammar formalism, still compatible with MP)? Which of course leaves open the evolutionary issue about where they could come from. Let us take an example from a still-in-the-making phonology. In Sign Language, where no stable consensus exists about phonological units, can one use semantic blending and morphological fusion to evidence these components? In ASL, MIND+DROP=>FAINT (we are indebted to Wendy Sandler for this videoclip example). If Sign+Sign=>Sign is semantic blending (snowman), what are the corresponding phonological units? Is there a sign-language-specific "syllable conspiracy", as Sandler claims: Syll+Syll=>Syll? Or a more common foot isochrony Foot+Foot=>Foot? Like one-foot music, musical, musically? Snowman is obviously shorter than snow+man duration. In fact, once measured, the downstroke phase of FAINT (which starts from the head for MIND, with the finger point erased) is just a videoframe longer than the one for DROP (starting lower from the waist). Which is a strong cue of isochrony control for compression into one unit (chunk, template, etc.).
Is that just emergence-supervenience of units due to informational constraints, just language-use, the war of attrition on constructions as form-meaning pairings, in cognitive construction grammars? Said otherwise: data compression for sparse coding? Are there no macroscopic units corresponding to universal control units, macroscopic primitives for making morphogenetic "language bubbles", not acquired simply by perceptuo-motor statistical pattern-finding? Are there phonologically universal babble-syllable constraints in speech acquisition, and more, signs and words in both speech and sign language (even if syllables might not be ubiquitous in both media)? In other words, when in evo-development do you get a tuner for tuning? Who could attune what, along language attunement-imitation, without a specific what-tuner to capture the preferred radio station amid the buzzy broadcasting landscape of speakers?
3. The syllable, then the point: whence the word?
Recently, in the Vocalize-to-Localize framework (a functional stance launched in the Interaction Studies 2004-2005 issues we edited; see Abry, Vilain & Schwartz, 2004), we addressed the unification of two grounding attempts concerning the syllable and the foot in language ontogeny. Both units are highly disputed among phonologists and psycholinguists. But the proposal of a root for proto-syllables in canonical babbling can now be neurally evaluated on the basis of a motor control platform: MacNeilage's Frame/Content theory, starting from the control of the mandible as the carrier articulator. We proposed the same ground of evaluation for the foot as the basic control unit for the phonology of the proto-word. We predicted that, if we measured the babbling rhythm of a baby from the burst of canonical babbling around 6-7 months, we could calculate the range of duration of her pointing arm-strokes, from 9 months upwards. Tested on 6 French children in a longitudinal study, recorded each fortnight between 6 and 18 months, this "astonishing" hypothesis was globally successful (Ducey, 2007), with a mean 2.18 pointing/babbling ratio. Moreover, each child had at her disposal in her repertoire a sufficiently long point to cover a disyllabic utterance. As with linguistic demonstratives, the semantics, pragmatics, and even the syntax of pointing have all received valuable attention and produced results in related fields; so has Sign Language phonology, which meets pointing ubiquitously. But nothing has been said about the proper phonological integrative links of the pointing gesture with speech phonological units smaller or larger than the point, like the syllable, the foot, and the so-called "prosodic word".
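The quantitative prediction just described can be sketched numerically. This is a toy illustration only: the 2.18 ratio is the mean reported above (Ducey, 2007), but the syllable durations and the function name are invented for exposition.

```python
# Toy sketch of the pointing-from-babbling prediction. The 2.18 mean
# pointing/babbling duration ratio is the figure reported in the text
# (Ducey, 2007); the syllable durations below are hypothetical.
MEAN_RATIO = 2.18

def predicted_stroke_ms(syllable_ms):
    """Predicted pointing arm-stroke movement time from babbling rhythm."""
    return MEAN_RATIO * syllable_ms

for syll in (200, 250, 300):  # hypothetical mean syllable durations (ms)
    print(f"syllable {syll} ms -> predicted stroke {predicted_stroke_ms(syll):.0f} ms")
```

On these toy numbers a single stroke spans more than two syllables, consistent with the observation that each child's repertoire contained a point long enough to cover a disyllabic utterance.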
We can now consider that the phonology of the point with the arm-index could give for free the template of the ubiquitous one/two-syllable word foot (instead of an arbitrary FOOTBIN in Optimality Theory, where a one-syllable/moraic foot is considered "degenerate" or "subminimal"). Grounding the phonology of the point motorically, in the neural arm-index control, thus gives for free the template of the two-syllable word as a coordination of the hand and the mouth in language semiotics and phonetics. This result offers in addition considerable insights in line with the parallel development of the syntactic use of THat-demonstratives and WHat-interrogatives through the grammatization process in the world's languages (Diessel, 1999). It is in favor of an early demonstrative site, later attuned to language-specific morphonology: see English (the) house vs. Swedish huset, French la maison vs. Romanian domul; and even more elaborated compounding, with what could be
tagged "double filled sites": French cette maison-ci vs. Swedish det här huset, or Afrikaans hierdie huis, etc. This is just one of the issues that the developmental framework summarised below (Fig. 1) has allowed us to address so far, in between the Vocalize-to-Localize (2003) seminar and the 2007 VOCOID (VOcalization, COmmunication, Imitation, and Deixis, in infant and adult human and non-human primates) meeting, both international meetings we organized in Grenoble.
4. Beyond the presented Framework (Fig. 1)

Beyond reinforcing the very general claim that "pointing is the royal road to language for babies" (as recalled by the late George Butterworth in Kita, Pointing, 2003), we can add to our prediction of pointing stroke duration distributions from individual babbling rhythm distributions another replicated prediction: namely, that the emergence of two-word utterances can be calculated from the beginning of the coproduction of a word together with a non-redundant pointing (a result found in Susan Goldin-Meadow's group, and replicated with Jana Iverson in Iverson & Goldin-Meadow, 2005). Since this is not a pure slot-grammar story (POINT+Word gives Word+Word, but the POINT is still there in the predicate-argument structure), the rationale behind this development beyond the first-year word remains rather mysterious (personal conversation with Susan Goldin-Meadow and Elena Lieven). Finally, we will add work in progress on two possible neural circuits found in adults, which could be relevant for language acquisition of the word-foot metric unit: namely, the one we dubbed the THAT-PATH, for pointing with the eye, the arm and the voice (Lœvenbruck et al., 2005, 2007), and ultimately the verbal working memory network we dubbed the STABIL-LOOP (Abry, Vilain & Schwartz, 2004), for stabilizing the linguistic word forms (Abry et al., 2003; Sato et al., 2004, 2006). Working memory was already proposed by Francisco Aboitiz and Ricardo Garcia (1997) as a masterpiece in primate evolution toward language, but with little concern about language (universal) preferred forms before matching for recall. We will insist here on the fact that, in our view, this STABIL-LOOP system can stabilize both word order (basic syntax and compounds) and word form structure (morphonology).
5. Summary

Beyond the fission/fusion metaphors, several of these empirical findings from ontogeny could help in building an evo-devo story of language, with caveats:

(i) Syllables are definitely not built from segments; rather, segments are a late by-product of new degrees of freedom, making the carried lip and tongue articulators more and more independent from the carrier jaw (rhythm control).

(ii) Neither are words built from syllables; rather, they are chunked from the babbling flow, in the pointing frame (discrete stroke control).

(iii) Syntax does not emerge with 2-word utterances; syntactic demonstrative (argumentative-referencing) pointing is there from the first word, and still there when 2 words appear, depending on the preceding date of emergence of the skill of pointing to the argument while predicating about a different referent from the pointed one (e.g. saying
Figure 1. A Framework for two Frames. At about one year, the Speech Frame will be embedded into the Sign Frame: one-two... Syllables in a Foot template for the first "Prosodic Words". For the Speech Frame, after Canonical Babbling, say "Syllable" rhythm emergence, two additional controls have to be mastered: Closure control for the "Consonant", and Coarticulation (Coproduction) for the "Vowel" Postural control, within the "Consonant". For the Sign Frame, three maturating brain streams become recruited: occipito-parietal event detection (When), which enters the dorsal (Where) and ventral (What) paths. Their outcomes are Objecthood and Agentivity (Who system), while the ventro-parietal How system affords Shape Affordance, before the objecthood Color What system. Among the corresponding [...] babbling cycles of a child, one can predict the range of durations of her pointing strokes: in-between 2-3 syllables in a point, that is a universal trend for the word ... point.
References
Aboitiz, F., & Garcia, R. (1997). The evolutionary origin of the language areas in the human brain: A neuroanatomical perspective. Brain Research Reviews, 25, 381-396.

Abry, C., Sato, M., Schwartz, J.-L., Lœvenbruck, H., & Cathiard, M.-A. (2003). Attention-based maintenance of speech forms in memory: The case of verbal transformations. Behavioral and Brain Sciences, 26:6, 728-729.

Abry, C., Vilain, A., & Schwartz, J.-L. (2004). [...]

[...] the verbal transformation effect. Perception & Psychophysics, 68:3, 458-474.

Schwartz, J.-L., Boé, L.-J., & Abry, C. (2007). Linking the Dispersion-Focalization Theory (DFT) and the Maximum Utilization of the Available Distinctive Features (MUAF) principle in a Perception-for-Action-Control Theory (PACT). In M.-J. Solé, P. Beddor & M. Ohala (Eds.), Experimental Approaches to Phonology (pp. 104-124). Oxford: Oxford University Press.
THE SUBCORTICAL FOUNDATIONS OF GRAMMATICALIZATION

GIORGOS P. ARGYROPOULOS
Language Evolution and Computation Research Unit, University of Edinburgh
40 George Square, Edinburgh, EH8 9LL, Scotland, UK
[email protected]

The present paper raises the so-far-unaddressed question of the neurolinguistic processes underlying grammaticalization operations. Two adaptive mechanisms are presented, based on current research on the subcortical contributions to aspects of higher cognition: the cerebellar-induced Kalman gain reduction in linguistic processing, and the basal ganglionic re-regulation of cortical unification operations.
1. Introduction
The neuroanatomy of either the domain-general cognitive phenomena underlying grammaticalization, e.g., "ritualization" (Haiman, 1994) and "automatization" (Givón, 1979; Bybee, 1998), or the particular psycholinguistic processes, e.g., Pickering and Garrod's (2004) dialogical "routinization", has hardly attracted any attention in the literature. Haiman's (1994, p. 25) comment that "the physiology of ritualization in human beings is unknown" is rather suggestive. The desideratum, then, is to move from the sine qua non of the neural grounding of such putative domain-general cognitive phenomena to a neurolinguistics of grammaticalization, by introducing I-language adaptation processes (both representational fine-tuning and executional optimization) in accordance with changing E-language properties.
2. The Explanandum of Grammaticalization

Grammaticalization, "an evolution whereby linguistic units lose in semantic complexity, pragmatic significance, syntactic freedom, and phonetic substance" (Heine & Reh, 1984, p. 15), is a manifestation of the "Reducing Effect" of repetition in linguistic behaviour (Bybee & Thompson, 2000): "univerbation" (Lehmann, 1995), i.e., the gain in syntagmatic bondedness (e.g., hac hora (Latin) > ahora (Spanish)); "phonetic attrition" (Givón, 1979), i.e., the minimization of articulatory gestures (e.g., going to > gonna); and
desemanticization, i.e., the loss of (lexical) meaning of a particular item (e.g., the future marker "will" loses the meaning of desire), are the fundamental aspects of this process. Because of its desemanticization, the particular item occurs in a greater contextual variety, inviting additional inferences and inducing its "context-induced reinterpretation" (Heine, Claudi, & Hünnemeyer, 1991). As a result, such an item behaviorally deviates from its particular category, i.e., it is decategorialized (e.g., the V > P cline in English).
3. From Grammaticalization to the Cerebellum and the Basal Ganglia

Automatization, however, the cognitive basis of grammaticalization, is known to rely on the basal ganglia (BG) and the cerebellum (CB) (e.g., Thach, Mink, Goodkin, & Keating, 2000): signals from the cerebral cortex are optimized on the basis of their reward value (reinforcement learning) and their accuracy (supervised learning) through the BG and CB loop circuits, respectively (Doya, 1999). The Cerebellar CorticoNuclear MicroComplex (CNMC) (Ito, 1984), i.e., the CB adaptive unit that learns based on error signals, becomes an internal model, an "emulator" (Grush, 2004), with signal-transfer characteristics identical to those of the copied cortical system (figure 1). Maximized reliance of the CB Kalman Filter (Paulin, 1989) on the predictions of an accurate internal model, i.e., low gain of the Kalman regulator (KG), drastically economizes on attentional-executional resources. On the other hand, the BG "sculpting process" (Graybiel, 2000) induces the context-sensitive fluent gating, in a "winner-takes-all" fashion, of competing motor actions, via inhibition-disinhibition processes.
Figure 1. A simple cerebellar feedforward emulator (e.g., Wolpert, Miall, & Kawato, 1998). The predictions of the internal model are constantly updated on the basis of the error signals arising from the discrepancy with the actual sensory feedback.
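The gain logic appealed to here can be made concrete with a toy scalar Kalman filter (a minimal sketch of the standard textbook recursion; the function name, noise values and step count are my own assumptions, not anything from the paper): as the internal model's process noise shrinks, i.e., as the emulator becomes accurate through practice, the steady-state gain falls, and the state estimate leans on prediction rather than on sensory feedback.

```python
# Toy scalar Kalman filter: illustrates how an accurate internal model
# (low process noise) drives the steady-state Kalman gain down.
# All names and numbers are hypothetical, for illustration only.

def steady_state_gain(process_var, sensory_var, steps=200):
    """Iterate the variance/gain recursion and return the (near) steady-state gain."""
    p = 1.0  # initial estimate variance
    k = 0.0
    for _ in range(steps):
        p_pred = p + process_var                 # predict: model uncertainty inflates p
        k = p_pred / (p_pred + sensory_var)      # gain: weight given to sensory feedback
        p = (1.0 - k) * p_pred                   # update: posterior variance
    return k

# An unpractised routine (noisy internal model) vs. an automatized one:
k_novel = steady_state_gain(process_var=1.0, sensory_var=1.0)
k_routine = steady_state_gain(process_var=0.01, sensory_var=1.0)
assert k_routine < k_novel  # routinization -> low gain -> reliance on prediction
```

On this reading, “KG reduction” is simply the filter discounting sensory feedback once the emulator's predictions are reliable, which is the economy the paper attributes to routinized processing.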
4. The Neurolinguistic Grounding of Grammaticalization

I propose that the neurolinguistic basis of the Reducing Effect in grammaticalization is the CB-induced KG reduction in multilevel linguistic processing, and that the one for the formation-deformation of probabilistic categories is the BG adaptive regulation of unification operations.
4.1. Cerebellar-induced Kalman Gain Reduction in Linguistic Processing
The CB, as a neural analog of a dynamical state estimator (Paulin, 1989), provides a highly plausible basis for Pickering and Garrod’s (2007) Kalman-filter processor (figure 2). Suggestively, CB error-signaling is involved in sentence processing (Stowe, Paans, Wijers, & Zwarts, 2004). Lack of performance optimization (an optimization interpretable as KG reduction) in CB patients on linguistic tasks is well established (Fiez, Petersen, Cheney, & Raichle, 1992), while in CB aphasiology the notion of “neurofunctional redundancy” has been invoked for the CB (emulated) linguistic representations: CB aphasia is significantly milder than the classical aphasic syndromes, owing to maximal prefrontal cortical compensation (Fabbro et al., 2004).
Figure 2. The CB as a domain-general Kalman filter (Paulin, 1989) meets Pickering and Garrod (2007): minimization of the Kalman gain via routinization corroborates the reliance on a top-down (expectation-based) processing modality; “shallow processing” (Barton & Sanford, 1993) and “good-enough representations” (Ferreira, Bailey, & Ferraro, 2002) are suggestive cases.
4.1.1. Chunking and phonetic attrition

Univerbation is a case of chunking (e.g., Haiman, 1994; Bybee, 1998), i.e., the creation of compound behavioural units whose interior exhibits minimal attentional and executional costs. Chunking is a well-established CB-induced cognitive function: CB deficits produce a lack of practice-induced facilitation (e.g., LaForce & Doyon, 2001) and a decomposition of motor behaviour (e.g., Thach et al., 2000). In the same spirit, phonetic attrition is the linguistic instance of the CB-induced minimization of articulatory stiffness in motor behaviour (e.g., Wolpert, Miall, & Kawato, 1998). Suggestively, Ackermann and Hertrich (2000) emphasize the CB’s role in the acceleration of orofacial gestures. The “Probabilistic Reduction Hypothesis” (Gregory, Raymond, Bell, Fosler-Lussier, & Jurafsky, 1999) precisely describes the articulatory reduction of predictable (emulated) linguistic items in speech production.

4.1.2. Semantic bleaching and proceduralization of conceptual representations
Semantic bleaching has been attributed to habituation processes: the organism ceases to exhibit the same response strength to frequently occurring stimuli (Haiman, 1994). A strong neural candidate is the attenuation of the actual sensory consequences as compared with the CB predictions (Blakemore, Wolpert, & Frith, 2000). Gating of sensory information heavily involves the BG (see section 4.2). “Shallow processing” (Barton & Sanford, 1993) and “good-enough representations” (Ferreira, Bailey, & Ferraro, 2002) capture aspects of the minimized attentional costs in semantic processing that a routinization-induced low-KG modality may achieve. To the extent that processing efficiency increases, the semantic representations of words and constructions are underspecified, and ultimately bleached. However, semantic bleaching expands the contexts of occurrence of linguistic items, inviting non-conventional inferences (Heine et al., 1991). While such higher cognitive inferential processes should heavily involve the cognitively demanding exploration of the temporoparietal cortex (the putative conceptual repository), grammaticalization occurs only with the “proceduralization” of the conceptual representations that such non-conventionalized inferences invoke. Procedural encoding provides the “necessary processing constraint on the interpretation of an associated conceptual representation” (Nicolle, 1998, p. 23). Characteristically, while it “performs the same role in constraining or guiding the interpretation of the utterance that an increase in the number of lexical items can have” (LaPolla, 2003, p. 139), procedural encoding is “automatically recovered (in addition to being merely activated on decoding)” (Nicolle, 1998, p. 23).
Proceduralization reflects KG minimization in semantic processing (figure 3): the “cognitive cerebellum” may (redundantly) emulate the subconscious “mental background”, e.g., the rules of a game, constraining the conscious, cortical “mental foreground”, e.g., planning a winning strategy (Thach, 1998). A CNMC might connect to the cerebral loop as a reliable copy of the thought model in the temporoparietal areas, with the thought process being alternatively conducted by the frontal areas acting on the CNMC rather than on the temporoparietal areas, adaptively avoiding the conscious effort needed for the exploration of cortical loci (Ito, 2000).

Figure 3. Routinization-induced proceduralization of conceptual encoding meets the “cerebellarization” of cognitive repertoires.
4.2. Striatal Regulation of Cortical Unification Operations
The fuzziness of syntactic categoriality, emphasized by grammaticalization theorists (e.g., Givón, 1979), has recently attracted researchers in computational and psycholinguistic probabilistic modeling (see Zuraw, 2003 for a review), encouraging the definition of categoriality on the basis of the particular constructions in which each item occurs. In Pulvermuller’s (2002) neuronal syntax, lexical categories are defined by the set of complements they require, i.e., by their “sequence regularities” (ibid.). An efficient parser thus gates candidates for unification on the basis of the context-sensitive inhibitory strengths of their connections to their competitors; this is directly reflected in Vosse and Kempen’s (2000) model, and is implementable by Pulvermuller’s (2002) “striatal regulation of cortical activity”. Characteristically, Walenski, Mostofsky, and Ullman (2007) report particularly speeded processing of procedural (both linguistic and non-linguistic) knowledge in Tourette’s syndrome subjects, attributing it to their BG
abnormalities in the inhibition of frontal cortical activity. Grossman, Lee, Morris, Stern, and Hurtig (2002) found a correlation between sentence comprehension and Stroop task performance in Parkinsonians, while Hochstadt, Nakano, Lieberman, and Friedman (2006) attributed their compromised capacity for parsing relative clauses to “deficits in cognitive set-switching” or “underlying inhibitory processes”. Inhibition and reinforcement underlie probabilistic representation: BG patients exhibit deficient probabilistic category learning (Knowlton et al., 1996), in the acquisition phase of which striatal activation is involved in normal individuals (Poldrack, Prabhakaran, Seger, & Gabrieli, 1999). Thus, grammaticalization-induced decategorialization, which becomes manifest as alterations in E-language distributional patterns, is efficiently monitored by BG reinforcement learning, via the dopamine-mediated regulation of the inhibitory strengths among the syntactic variants that compete for unification with a particular linguistic item (figure 4).
[Figure 4 diagram: a grammaticalization-induced change in the E-language distributional patterns for a linguistic item (competing variants such as A/*/D and B/*/G, with shifting percentages) is mapped, via dopamine-mediated reinforcement learning, onto the I-language inhibitory relations, i.e., the striatal re-regulation of cortical unification operations.]
Figure 4. t(1)-t(2): a member of category B frequently co-occurs with the sequence B/*/G, which triggers the strengthening of its probabilistic representation in the frontostriatal circuit, and thus the strength of the inhibitory signals sent to the competing alternatives: a gradual “obligatorification” (Lehmann, 1995). t(2)-t(3): a member of category A initiating the sequence A/*/D becomes optional, i.e., outcompeted in the winner-takes-all BG selections for cortical linguistic unifications.
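The selection dynamics caricatured in figure 4 can be sketched as a winner-takes-all competition (an illustrative toy with hypothetical frequencies and parameters of my own choosing; it is neither Vosse and Kempen's actual model nor a claim about striatal physiology): candidates for unification mutually inhibit one another in proportion to their strength, so the variant with the strongest (frequency-reinforced) representation suppresses its rivals entirely, i.e., becomes obligatory.

```python
# Toy winner-takes-all gating among competing unification variants.
# Frequencies and parameter values are hypothetical illustrations only.

def winner_takes_all(strengths, self_excitation=1.2, inhibition=0.5, steps=100):
    """Iterate mutual-inhibition dynamics until (near) convergence."""
    acts = list(strengths)
    for _ in range(steps):
        total = sum(acts)
        # each candidate excites itself and is inhibited by its competitors
        acts = [max(0.0, self_excitation * a - inhibition * (total - a)) for a in acts]
        s = sum(acts)
        if s > 0:
            acts = [a / s for a in acts]  # keep total activity bounded
    return acts

# Hypothetical E-language frequencies for three competing variants:
variants = ["A/*/D", "A/*/", "B/*/G"]
final = winner_takes_all([0.25, 0.30, 0.45])
# the most strongly reinforced variant is selected ("obligatorification"),
# while its competitors are suppressed to zero
assert variants[final.index(max(final))] == "B/*/G"
```

Raising a variant's initial strength here plays the role the paper assigns to dopamine-mediated reinforcement: it tips the gating competition until the alternatives are fully inhibited.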
5. Conclusion

I proposed two fundamental neurolinguistic mechanisms grounding grammaticalization: a) the cerebellar-induced Kalman gain reduction in linguistic processing, and b) the basal ganglionic adaptive regulation of cortical unification operations.
Acknowledgments

My special thanks to Prof. Jim Hurford, Dr. Thomas Bak, Prof. Mike Paulin, Dr. Holly Branigan, Dr. Patrick Sturt, Dr. Andrew Smith, and Dr. Anna Parker. I gratefully acknowledge the Pavlos and Elissavet Papagiannopoulou Foundation for its financial support of my PhD studies.
References

Ackermann, H., & Hertrich, I. (2000). The contribution of the cerebellum to speech processing. Journal of Neurolinguistics, 13, 95-116.
Barton, S. B., & Sanford, A. J. (1993). A case-study of anomaly detection: shallow semantic processing and cohesion establishment. Memory and Cognition, 21, 477-487.
Blakemore, S.-J., Wolpert, D., & Frith, C. (2000). Why can’t you tickle yourself? NeuroReport, 11, 11-16.
Bybee, J. L. (1998). A functionalist approach to grammar and its evolution. Evolution of Communication, 2, 249-278.
Bybee, J. L., & Thompson, S. (2000). Three frequency effects in syntax. Berkeley Linguistics Society, 23, 65-85.
Doya, K. (1999). What are the computations of the cerebellum, the basal ganglia and the cerebral cortex? Neural Networks, 12, 961-974.
Fabbro, F., Tavano, A., Corti, S., Bresolin, N., De Fabritiis, P., & Borgatti, R. (2004). Long-term neuropsychological deficits after cerebellar infarctions in two young adult twins. Neuropsychologia, 42, 536-545.
Ferreira, F., Bailey, K. G. D., & Ferraro, V. (2002). Good-enough representations in language comprehension. Current Directions in Psychological Science, 11, 11-15.
Fiez, J. A., Petersen, S. E., Cheney, M. K., & Raichle, M. E. (1992). Impaired non-motor learning and error detection associated with cerebellar damage. Brain, 115, 155-178.
Givón, T. (1979). On Understanding Grammar. New York: Academic Press.
Graybiel, A. M. (2000). The basal ganglia. Current Biology, 10, 509-511.
Gregory, M. L., Raymond, W. D., Bell, A., Fosler-Lussier, E., & Jurafsky, D. (1999). The effects of collocational strength and contextual predictability in lexical production. Proceedings of the Chicago Linguistic Society, 99, 151-166.
Grossman, M., Lee, C., Morris, J., Stern, M. B., & Hurtig, H. I. (2002). Assessing resource demands during sentence processing in Parkinson’s disease. Brain and Language, 80, 603-616.
Grush, R. (2004). The emulation theory of representation: motor control, imagery, and perception. Behavioral and Brain Sciences, 27, 377-435.
Haiman, J. (1994). Ritualization and the development of language. In W. Pagliuca (Ed.), Perspectives on Grammaticalization (pp. 3-28). Amsterdam: John Benjamins.
Heine, B., & Reh, M. (1984). Grammaticalization and Reanalysis in African Languages. Hamburg: Helmut Buske.
Heine, B., Claudi, U., & Hünnemeyer, F. (1991). From cognition to grammar: Evidence from African languages. In E. C. Traugott and B. Heine (Eds.), Approaches to Grammaticalization, Vol. 2 (pp. 149-187). Amsterdam: Benjamins.
Hochstadt, J., Nakano, H., Lieberman, P., & Friedman, J. (2006). The roles of sequencing and verbal working memory in sentence comprehension deficits in Parkinson’s disease. Brain and Language, 97, 243-257.
Ito, M. (1984). The Cerebellum and Neural Control. New York: Raven Press.
Ito, M. (2000). Neural control of cognition and language. In A. Marantz, Y. Miyashita, and W. O’Neil (Eds.), Image, Language, Brain (pp. 149-162). Cambridge, MA: MIT Press.
Knowlton, B. J., Squire, L. R., Paulsen, J. S., Swerdlow, N. R., Swenson, M., & Butters, N. (1996). Dissociations within non-declarative memory in Huntington’s disease. Neuropsychology, 10, 538-548.
LaForce, R., & Doyon, J. (2001). Distinct contribution of the striatum and the cerebellum to motor learning. Brain and Cognition, 45, 189-250.
LaPolla, R. J. (2003). Why languages differ: Variation in the conventionalization of constraints on inference. In D. Bradley, R. J. LaPolla, B. Michailovsky, and G. Thurgood (Eds.), Language Variation: Papers on Variation and Change in the Sinosphere and in the Indosphere in Honour of James A. Matisoff (pp. 113-144). Canberra: Pacific Linguistics, Australian National University.
Lehmann, C. (1995). Thoughts on Grammaticalization. Munich: Lincom Europa. (First published as akup 48, Institut für Sprachwissenschaft, Universität zu Köln, 1982).
Nicolle, S. (1998). A relevance theory perspective on grammaticalization. Cognitive Linguistics, 9(1), 1-35.
Paulin, M. G. (1989). A Kalman filter theory of the cerebellum. In M. A. Arbib and S.-I. Amari (Eds.), Dynamic Interactions in Neural Networks: Models and Data (pp. 239-259). New York: Springer.
Pickering, M. J., & Garrod, S. (2004). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27, 169-225.
Pickering, M. J., & Garrod, S. (2007). Do people use language production to make predictions during comprehension? Trends in Cognitive Sciences, 11(3), 105-110.
Poldrack, R. A., Prabhakaran, V., Seger, C. A., & Gabrieli, J. D. (1999). Striatal activation during acquisition of a cognitive skill. Neuropsychology, 13(4), 564-574.
Pulvermuller, F. (2002). The Neuroscience of Language: On Brain Circuits of Words and Serial Order. Cambridge: Cambridge University Press.
Stowe, L. A., Paans, A. M. J., Wijers, A. A., & Zwarts, F. (2004). Activations of “motor” and other non-language structures during sentence comprehension. Brain and Language, 89, 290-299.
Thach, W. T. (1998). What is the role of the cerebellum in motor learning and cognition? Trends in Cognitive Sciences, 2(9), 331-337.
Thach, W. T., Mink, J. W., Goodkin, H. P., & Keating, J. G. (2000). Combining versus gating motor programs: Differential roles for cerebellum and basal ganglia. In M. S. Gazzaniga (Ed.), Cognitive Neuroscience: A Reader (pp. 366-375). Oxford: Blackwell.
Vosse, T., & Kempen, G. (2000). Syntactic structure assembly in human parsing: a computational model based on competitive inhibition and a lexicalist grammar. Cognition, 75, 105-143.
Walenski, M., Mostofsky, S. H., & Ullman, M. T. (2007). Speeded processing of grammar and tool knowledge in Tourette’s syndrome. Neuropsychologia, 45, 2447-2460.
Wolpert, D. M., Miall, R. C., & Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences, 2(9), 338-347.
Zuraw, K. (2003). Probability in language change. In R. Bod, J. Hay and S. Jannedy (Eds.), Probabilistic Linguistics (pp. 139-176). Cambridge, MA: MIT Press.
PRAGMATICS AND THEORY OF MIND: A PROBLEM EXPORTABLE TO THE ORIGINS OF LANGUAGE

TERESA BEJARANO
Department of Philosophy, Logic and Philosophy of Science, University of Sevilla (Spain)

According to pragmatics, the speaker’s intention in declarative speech is to complete or correct the hearer’s belief. But the age at which children begin to produce this type of communication is prior to their success in the ‘false belief’ test. Here, after proposing a solution to the problem’s hard version, i.e., to early declarative replies, I will pose it at the level of language origin. A threat of vicious circle and a basic dilemma will be encountered.
1. The problem
The pragmatics derived more or less directly from Grice states that in declarative (or, otherwise called, predicative) communication the intention of the speaker is to complete, correct or update (or, in the case of a lie, to alter) the belief that he, the speaker, assumes the hearer to have. Therefore, the grasp of alien beliefs is a fundamental requirement for the child to produce this type of communication. But the age at which children begin to produce it is far prior to their success in the false-alien-belief grasping test. (This test presents a child with the following scenario: Maxi puts his cake in the kitchen cupboard and leaves the room. While he is away, his mother moves the cake to a drawer. Maxi returns. Then the experimenter asks: ‘Where will he look for his cake?’) This problem was detected years ago. Risjord (1996) chose to reject pragmatic statements about predicative communication. Breheny (2006) defends the distinction between an initial linguistic competence, which a child who fails the Maxi test would have, and an elaborated competence, which would agree with the statement derived from pragmatics. The issue of initial competence is explored in Liszkowski (2006): at 12 months old, some of children’s pointing gestures intend to provide information (telling an adult the location of something the adult is looking for) and therefore require children to have already attributed ‘a state of unawareness of a certain visual field’ to the addressee. This
attribution is clearly less demanding than the ability to grasp alien beliefs: according to Tomasello, Call, and Hare (2003), chimpanzees can attribute a current, and also an immediately past, visual perception to a fellow (although I would add that they cannot do it if the fellow is looking at them). But according to Breheny or Liszkowski, there still remains an even harder problem. It is in the earliest predicative replies that Theory of Mind and Gricean pragmatics seem to clash unavoidably. This is the issue on which I want to focus my attention. I will first propose a solution at the ontogenetic level, and then pose the problem at the level of the historic origin of language. While the child’s learning data cannot in any way be projected onto historic origin, some problems which, like this one, have been posed in ontogenesis are, in my view, more usefully exportable.
2. Second-person belief: An easier path for alien belief grasping

As seen, up to now the age at which children begin to pass Maxi-style tests has been taken as an immovable premise. Certainly, the data are clear and certain: samples of that test have been given to children from many different cultures, with the result always the same (Wellman et al., 2001). However, I propose that the test constitutes only one road, and a particularly difficult one, to alien belief grasping. Reception via language is an easier road. Many authors propose a strict parallelism between alien belief grasping and the mastering of mental-state terms (Bartsch & Wellman, 1995) or of subordinate syntax (de Villiers, in press). Certainly, Maxi’s belief has no linguistic expression outside complements embedded under belief verbs. Nevertheless, that is not at all applicable to all alien beliefs (Frith & de Vignemont, 2005: “The distinction between second-person and third-person perspective is overlooked in the mind-reading literature”). Let’s suppose that a speaker approaches me and says ‘The cat is on the mat’ while I can see that there is nothing on the mat. Such a false belief comes to me by means of that simple assertion, and there is no need for me to ‘redescribe’ it (or, more concretely, to make it explicit as a complement embedded under the verb think or the behavioural and jointly mental verb say). The very message that I hear expresses for me my interlocutor’s false belief (cf. Harris, 1996). Given that the understanding of a simple sentence is achieved by the child much earlier than age 4, the alleged clash between pragmatics and Theory of Mind has vanished.
3. Applying the easy way for alien belief grasping to historic origin
The second level on which we must treat the problem is that of historic origin. Let’s go back to the previous example. When I heard ‘The cat is on the mat’, I fulfilled the requirement needed to produce predicative communications myself (in this case ‘It is not on the mat’, or ‘There is nothing on the mat’). But if we are going to focus on the historic origin of predicative communication, that example involves a clear threat of an infinite regress or vicious circle. We cannot place as a requirement the understanding of a predicative communication. (Certainly, we often use predications when we are not caring about alien beliefs: emotional unloading, a desire to show off one’s education, etc. can often suffice. But could these uses be the first ones?) Certainly, interrogatives are also a completely adequate means for a speaker to communicate the insufficiency (not the falsehood in this case, but only the insufficiency) of their own belief. However, it is not at all clear that turning to interrogation could solve the situation. Firstly, because interrogation, both total and partial, requires that the speaker assumes the existence of differences between their own mind (not only visual field) and the alien mind, i.e., between the scarce knowledge that the speaker has about the subject and the superior knowledge that he assumes the hearer to have. This requirement, compared with that of alien belief grasping, could be just as demanding. Secondly, because we know that, at least in children, interrogative speech appears after predicative speech. So what could get us out of this vicious circle? We need a message whose reception is able to reveal to the audience the false belief fostered by the speaker, but whose communicative functions are neither declarative nor interrogative. Could that happen? I believe it could. A vocative or a petition can reveal their speaker’s false belief. Somebody calls somebody else, who the hearers know is not present.
That is enough for the hearers to grasp the caller’s false belief. The same result follows if an object which the hearers know is not available is requested. This is what we needed in order to solve the problem on the historic level. But we have not reached this point by speculating with the problem’s needs. On the contrary, this type of access to alien false belief is highly frequent in children. Children receive messages of all kinds, with predicative and interrogative functions as well as with petition or call functions. However, it is very frequent that their earliest predications respond to a petition. I will give one example amongst many others. An adult and a child are playing with wooden blocks. Later, the adult asks the child, who has the box
with the blocks in his hands, ‘Give me more blocks! More!’ At that moment the child, who still has the box in his hands, sees that it is empty and says (in Spanish) ‘Más (= more) no’. As you can see, predication cannot be any simpler. The child repeats the keyword of the adult’s petition, and adds the easiest predicate (or rhema) of all: ‘no’. This ‘no’ rejects both the alien speech act and the alien belief involved in this act.a This, I stress, is an example which has been truly observed in children. But my point is that also at the historical origin this is the way in which the first grasping of alien beliefs would have occurred. Even a language which is still unable to carry out predicative and interrogative communication could prompt the hearer to grasp the speaker’s false belief. The threat of the vicious circle has vanished.

4. There was no need for previous syntax, but a sign with a precise referential link was required
We must highlight another requisite for the grasping of alien beliefs, one about which we have not yet talked. This requisite, which is undoubtedly fulfilled on the ontogenetic level, becomes more problematic when we project it onto historical origin. In order to see what this requisite is, let’s start with the following example. In a chimpanzee clan a youngster calls her mother. It is known that when they hear a youngster’s shout, clan members direct their sight to the youngster’s mother. Now, we must suppose that the clan adults have seen that the mother is trapped at a considerable distance. The question is, of course, whether we are authorised to attribute to those adult chimpanzees the grasping of the youngster’s false belief. We wonder about this when deciding on a translation of the youngster’s shout. Does it really correspond to the vocative ‘Mum!’? Could it not correspond as faithfully (or as unfaithfully) to a petition for maternal care? The pragmatic meaning of the youngster’s shout is clear, but its semantic value is not at all. This should not surprise us in the slightest. The semantics of language is shaped by syntax. There is no linguistic meaning which could escape from the
a There, logicians would say that if somebody requests an object, we should infer that he believes that such an object is available in their surroundings. Thus, it would be necessary in this case to add an inferential process to the linguistic comprehension process. That, I stress, is what logicians say. Given that logic only contemplates what is explicit, it is clear that the explicitness of an alien belief involves an additional cognitive step. However, explicitness, necessary for logical formalisation, is not at all necessary for comprehension. The child can grasp his mother’s belief perfectly as he hears his mother’s message. So, the logicians’ verdict (that there would be a greater effort on the part of the hearer when facing the petition message) does not count for our proposals.
obligation of falling into one of the ‘parts of speech’. Certainly, language puts semantic options at our disposal; more concretely, most languages offer us the verb ‘run’ and also the noun ‘runner’, for example.b However, there is no linguistic sign which in its specific use does not fall into one specific option. If we want to refer to a scene without using signifiers which are ‘parts of speech’, then we can only resort to finger-pointing, or to drawing the scene. If we remain within the borders of language, it will be impossible. The semantics of our language is configured for and by syntax, and so it can only offer us ‘parts of speech’. Therefore, the treason of any linguistic translation of the youngster’s shout is completely unavoidable. But if that treason and that mistake are absolutely unavoidable in any linguistic translation, they are equally real and serious. We must reject them. The youngster’s shout signifies both calling their mother and a petition for care, and because it means both things at the same time, it does not properly mean one thing or the other. That shout is prior to such differentiation. We are now in a position to oppose the conclusion that adult chimpanzees would be grasping the youngster’s false belief. But it is not chimpanzees that we are interested in. Let’s go back to the historical origin of alien belief grasping. What we can add now is a new requirement. We mentioned earlier that the earliest language (a non-predicative and non-syntactic language) would have been able to reveal the emitter’s false belief to the hearers of a message. Petition/call pre-syntactic messages would have been enough, just as on the ontogenetic level the petition for more blocks is enough for the child to grasp the adult’s false belief. That was, I stress, our proposal.
But now we must specify that that language, as simple as we conceive it, would already have to possess a well-adjusted referential connection between sign and referent in the world.
b Those semantic-syntactic options do not necessarily have to coincide with the noun/verb difference seen in the languages around us. But in one way or another, semantic-syntactic options seem to be an absolute linguistic universal (Hurford, 2007).

5. What would a pre-syntactic linguistic sign be like? Trying not to take for granted our own schemes

How would signs originally have arrived at that referential connection which discards the ambiguity between calling their mother and a petition for care, and which consequently allows the hearer to grasp the speaker’s false belief? Or, outside this example, let’s focus on discarding the ambiguity between a petition for a hammer and the command to hit with whatever we randomly find, or between a petition for a rope and the command to tie something with whatever object can serve this purpose.c The point I want to stress is that if the original signs belonged to a language without syntax, then those signs would not at all have guaranteed membership of any of the ‘parts of speech’. Were there true meanings of objects and actions, or (cf. Hurford, 2007) of categories and egocentrically distal locations, in such a language? That is an arguable matter nowadays. My bet (cf. Bejarano, in press, chapter 12) is the following. Conceptual semantics is always a reflection of syntactic language. Certainly, prelinguistic children discriminate between different agents and also between different categories of events (Pulverman et al., 2006). Likewise, in many animal species two separable ‘where’ and ‘what’ neural pathways exist. In short, I completely agree that babies and animals with brains possess concepts. However, conceptual semantics involves an additional ability. Unlike in perceptive compositionality, in syntactic compositionality the elements are focused on separately: agent / action, or location / category, etc. (cf. subitizing versus the human process of counting, an analogous contrast between two types of compositionality). This is what, in my view, prelinguistic beings lack. But if we choose the hypothesis I have just exposed, then a problem for my proposal arises. Here are the two horns of the dilemma. On the one hand, in order to pay attention to the first access to false alien belief, we need to pay attention to a non-predicative language, which would contain communicative functions of petition or call (i.e., functions which try to immediately make the world adapt to the speaker’s wishes). That language would probably remain confined to a pre-syntactic state and, consequently, its signs would not be included in any ‘part of speech’.
But, on the other hand, for the reception of such a language to provoke alien belief grasping, its messages would have to be free from the above-mentioned ambiguity. How could we obviate the danger of ambiguity if we do without syntax and without the semantics shaped by syntax? This is the key question for the proposal.
c The ambiguity normally referred to is not this one, but that of Quine’s ‘Gavagai’. But the Quinean one, ‘rabbit’ / ‘left hind-foot of a rabbit’, is much more superficial. It is worth pointing out how, for the alarm calls of vervets, the double interpretation ‘leopard’ / ‘run for the outer branches’ has been repeatedly stressed. On the other hand, this type of ambiguity tends to be forgotten in accounts of the origin of linguistic signs. An example of this tendency is Burling (2005), although he seems to heal his forgetfulness on p. 130, for example.
6. Protodeclaratives in their disambiguating role
Although this whole problem is more virulent in the context of historic origin, I believe that a glance at the child could be fruitful. In the child at the holophrastic stage we find a very peculiar communicative function which basically disappears after this stage. Very frequently, the child points with his finger at an object which he is clearly not demanding, and while he looks alternately at the object and at the hearers, he produces the object’s name. (This corresponds to the first of the two types of declarative holophrase, i.e., to that which appears in the holophrastic stage, cf. Goldin-Meadow, 1999.) These productions of the child are undoubtedly a magnificent resource for learning.d But they are also the very exact solution that we needed. The protodeclarative holophrase guarantees a referential connection between sign and object. At the ontogenetic level, that mission does not cease to appear. But it is mostly at the level of historic origin that that mission, and that victory against ambiguity, is highly important. Thanks to the protodeclarative, ‘rope’ would have become just rope, and ‘mum’, just mum.e So, when a hearer receives a message in which the speaker requests an object which is not available or calls an individual who is absent, those messages, despite all their primitivism, could reveal to the hearer the speaker’s false belief. (At the origin, was the protodeclarative a resource exclusively for children? We do not know. But we could think that the child’s inability to participate in technical tasks would facilitate the bleaching of children’s messages, both in their production and in their understanding.) But this importance that we have attributed to the protodeclarative in the historic origin of alien belief grasping, this importance on the road to predicative and syntactic language, forces us also to value the importance of a factor which
It is not only that through it children check their lexical acquisition. On top of that, adults will probably respond with a complete sentence about the object. Some of their words will probably be new to the child. But (here is the crucial point) such words will be received by children in the best imaginable circumstances for their learning: children know that the adult message is connected with the referred-to object. So protodeclarative holophrases do not only give the child the pleasure of feeling integrated into the linguistic community; they are an instrument through which that integration is consolidated. This would have been the earliest ‘bleaching’ and grammaticalization. In protodeclarative holophrases, a sign with no communicative usefulness of its own is achieved. This type of holophrase, as opposed to the one accompanying imperative pointing, does not demand (and, also as opposed to Liszkowski’s silent pointing, does not inform). But thanks to this uselessness, the sign becomes apt to be combined with others. Likewise, in protodeclarative holophrases the (originally omnipresent) intonation of call/petition disappears, and, consequently, the sign becomes apt to be produced with different intonations.
is fundamental for the appearance of protodeclarative holophrases. These holophrases would have no reason to exist if signs were not an object of imitative, cultural learning. For an innate shout, a protodeclarative would be an absurd and unthinkable mechanism. With the protodeclarative holophrase the child practices both meaning and articulatory-phonetic pattern. The conclusion we arrive at regarding all this is that the learnt and imitated character which is typical of linguistic signs, this feature, I stress, of arbitrariness and artificiality, becomes crucial for the transition from the pre-syntactic pre-language to a finally predicative and syntactic language.

References
Bartsch, K. & Wellman, H. (1995). Children talk about the mind. Oxford U. P.
Bejarano, T. (2007). En busca de lo exclusivamente humano. A. Machado Libros, S. A.
Breheny, R. (2006). Communication and folk psychology. Mind and Language, 21, 74-107.
Burling, R. (2005). The talking ape. Oxford U. P.
de Villiers, J. (in press). The interface of language and Theory of Mind. Lingua.
Frith, U. & de Vignemont, F. (2005). Egocentrism, allocentrism, and Asperger syndrome. Consciousness and Cognition, 14, 719-738.
Goldin-Meadow, S. (1999). The role of gesture in communication and thinking. Trends in Cognitive Sciences, 3, 419-429.
Harris, P. (1996). Desires, beliefs and language. In P. Carruthers & P. K. Smith (Eds.), Theories of theories of mind (pp. 200-222). Oxford U. P.
Hurford, J. (2007). The origin of noun phrases. Lingua, 117, 527-542.
Liszkowski, U. (2006). Infant pointing at 12 months. In N. J. Enfield & S. C. Levinson (Eds.), Roots of human sociality (pp. 153-178). Berg.
Pulverman, R., Hirsh-Pasek, K., Golinkoff, R. M., Pruden, S. & Salkin, S. (2006). Conceptual foundations for verb learning. In Hirsh-Pasek & Golinkoff (Eds.), Action meets word (pp. 134-159). Oxford U. P.
Risjord, M. (1996). Meaning, belief and language acquisition. Philosophical Psychology, 9, 465-475.
Tomasello, M., Call, J. & Hare, B. (2003). Chimpanzees understand psychological states - the question is which ones and to what extent. Trends in Cognitive Sciences, 7, 153-156.
Wellman, H. M., Cross, D. & Watson, J. (2001). Meta-analysis of theory-of-mind development. Child Development, 72, 655-684.
TWO NEGLECTED FACTORS IN LANGUAGE EVOLUTION DEREK BICKERTON University of Hawaii
How language evolved has been a focus of attention for researchers in a dozen different disciplines over the last two decades. The fact that no viable theory of language evolution has yet emerged is unquestionably due in part to the difficulty of the problems involved, but these problems have been exacerbated by neglect of two crucial factors. One is the precise nature of the differences between language and nonhuman communication systems (henceforth NCSs); the other is the ecological gap between human ancestors and other ape species. Most work on language evolution, whether by linguists (Hurford 2007) or non-linguists (Johansson 2005, Pollick and de Waal 2007), takes as a usually unquestioned assumption that both NCSs and language are best examined under the rubric of communication. This is natural; both do communicate, and to regard both as primarily “means of communication” seems to offer an enticing prospect of comparative studies that will shed valuable light on how language evolved. One consequence is a persistent search among NCSs for “precursors of” or “stepping-stones towards” language. However, it has often been observed that communication is not the only or even necessarily the first function of human language. It has been less frequently observed that the same is true of NCSs. Most NCS signals are simply hypertrophied versions, culled by natural selection, of behaviors primarily designed not to communicate information to the recipient but to confer fitness on the sender. For instance, the placatory noises and postures produced by lower-status social animals in confrontations with higher-status conspecifics do not convey any information that benefits the latter; such signals exist only because they prolong the life (hence the procreative potential) of their sender. Contrast this with what happens in language.
If I explain to someone that a certain type of footprint belongs to a dangerous predator, the fitness benefit to me, as sender of the message, is (apart from what I might gain from future reciprocity) zero. However, the receiver directly benefits in terms of potentially life-saving (hence procreation-extending) information. In short, NCSs primarily
benefit the sender, while language primarily benefits the receiver (the latter fact having already been widely noted as presenting problems for evolutionary accounts of language). Once we view so-called “NCSs” as vehicles for the enhancement of individual (sender) fitness rather than the transmission of information per se, their features, often puzzling when NCSs are seen as communication, begin to make more sense. For instance, one factor that militates against any “progressive” view of NCS-to-language development is the absence of any evolutionary development in the complexity or scope of NCSs. NCS signals, regardless of species, have only three domains of reference: mating, social interaction and survival (Hauser 1996). While NCSs of “higher” species may have more signals than those of “lower” species, this difference stems from a broader behavioral range rather than any specifically cognitive increment. Also there is considerable overlap between systems in different phyla (some fish and insects have more signals than some mammals), and the difference between the NCSs with the most and the fewest signals falls within the lower double digits (Wilson 1972). Why would a primate species with much greater cognitive power (thus presumably much more to communicate) have only a few more signals than some humble amphibian? What we have here can hardly be described as “stepping stones” to anything, certainly not towards a more comprehensive or more developed communicative system. Communication is not the primary goal of NCSs. NCSs are sets of signals, each designed to promote an individual’s (inclusive) fitness, and selected not as a step up some communicative ladder but in response to the needs and fitness requirements of a particular species in a particular niche. A very relevant example is found in “functional reference”. Examples of such reference, such as vervet alarm calls, are often hailed as precursors of words.
But if this is so, it becomes hard to explain why functional reference is almost entirely absent in the species closest to humans, while it occurs not merely among vervets but also among species with little claim to intelligence, such as chickens. Great apes lack signals with functional reference because such signals (predator warnings and some food calls) are not required by the niches they occupy. In contrast, while language as a whole may be adaptive, not a single unit of human language has, in and of itself, any impact on the fitness of the sender, or any linkage to the specific types of fitness-significant events to which NCS signals are linked. (It is, perhaps, significant that the few types that might look like exceptions in language, exclamations like “Help!” and the like, typically occur in isolation, unlike the vast majority of words, just as NCS signals do; also, like NCS signals, such words cannot combine to form larger wholes.)
The inability of NCS signals to combine into larger, more precisely meaningful sequences is another NCS characteristic that would be arbitrary and counterintuitive if such signals were primarily communicative and located on some kind of continuum leading to language. Lack of signal combinability in NCSs does not result from limits on the cognitive capacity or processing skills of particular species. It follows logically from the fact that all signals, being directly adaptive, are of their nature welded to highly specific situations, and meaningless outside those situations. Moreover, these adaptive responses are complete in themselves, so for NCS signals to modify or combine with one another is impossible, and would be dysfunctional if it were possible. Claims that the responses of Diana monkeys to calls by Campbell’s monkeys (Zuberbuhler 2003) are precursors of syntax disregard the fact that the second signal does not modify but simply cancels the first signal (something that never occurs in language) and that no human language involves interspecies communication. Moreover, the Campbell’s monkeys have no intention of communicating with the Diana monkeys; the Diana monkeys respond adaptively to Campbell alarm calls as they would to any non-communicative aspect of the environment. There are, of course, both cognitive and purely physiological prerequisites for (as opposed to precursors of) language. But these are, without exception, necessary but not sufficient prerequisites, and few if any owe their existence to NCSs. For instance, the ability to distinguish between different predator species underlies NCSs with more than a single alarm call, but that ability did not come from any NCS development; rather, NCSs capitalized on pre-existing capacities for categorization based on sensory data.
In other words, prerequisites for language grew not out of prior modes of communication, but directly from exigencies placed on various species by the particular niches they found or developed. Consequently, the difference between NCSs and language is no mere matter of degree. The two have different functions and work in different ways; hence the gap between language and NCSs is not merely wider than, but different from, what a majority of researchers has assumed, rendering overly optimistic every proposal for bridging the gap that has so far been made. Does this mean we can rule out any connection between NCSs and language? Certainly not. The idea that language sprang up beside a pre-human NCS as a separate system, originating from and based upon some prior “language of thought”, is not an unattractive one; it has attracted me (Bickerton 1990) and continues to attract others (e.g. Chomsky 2005). However, that idea is based on another error, one which there is insufficient space to discuss here: belief in continuity between human and non-human cognition. In fact, language had somehow to be conjured out of a pre-existing NCS, since there was nowhere else to get it from. But such a development could not have come
from any straightforward development in NCSs; it could only have been due to some rogue factor that served as a wedge for the introduction of properties that would radically alter the pre-human NCS and eventually cause language to separate from it. The necessity for a scenario along these lines should have been apparent from the fact that no trace of any development leading to language can be found in any other living species, no matter how closely related to us. Consequently, language is likeliest to have arisen from some set of circumstances affecting human ancestors but no other species. In order to discover such circumstances, researchers have to pay attention to what can be determined, from the fossil, archaeological, and climatic records, about the unique course of human evolution. But for some reason, researchers have been extremely unwilling to consider these records. Hurford (2007) is undoubtedly the most thoroughgoing attempt so far to examine the circumstances in which language emerged, yet it contains only a couple of fleeting references to pre-human species, and even these involve issues (sexual dimorphism in erectus and vocal behavior in habilines) that are both controversial and peripheral to the main issues in language evolution. Two factors may have encouraged the neglect of the paleoanthropological record: the often sketchy, vague, and ambiguous nature of the evidence that record provides, and the genetic closeness between humans and apes. The latter strongly suggests that apes might serve as a rough model of what human ancestors might have been like, and a living species certainly makes a more attractive model than a sprinkling of bones. However, some things have to be weighed before we commit ourselves to the straight-line, ape-to-human model of evolution that most researchers have adopted. First, genetic closeness does not entail behavioral closeness.
Genes are not destiny; genes plus environments are destiny. In different environments and ecologies, the same genes may express themselves in very different ways, and in no case are the links between genes and behavior simple and direct. Second, while the record of human evolution leaves much to be desired, some facts are well established, and these cannot simply be ignored. The most minimal survey of the fossil and archaeological records should suffice to show that any straight-line model of human evolution is not just inadequate but misleading. The habitat of great apes is tropical forest (with small numbers of chimpanzees living on its edges). The habitat of human ancestors was first mosaic woodland and later open savanna. These habitats and their increasing distance from ape habitats dictated very different patterns of behavior with respect to social life, foraging and predation.
With regard to predation, human ancestors were exposed to a wide variety of predators that included eagles (Berger and Clarke 1995) and even giant weasels (Anderson 2004), as well as many members of the Canidae, Felidae and Hyaenidae families larger and more dangerous than their modern counterparts. For at least the first half of the period since the last common ancestor of humans and chimpanzees/bonobos, proto-humans were considerably smaller than modern humans and lacked any artifacts usable in self-defense. Consequently they would have spent more time watching for and evading predators than modern apes, and thus less time on the intense social interaction found among those relatively predator-free species. With regard to foraging, the terrains inhabited by proto-humans were more hostile than those inhabited by apes, and destined to become still more so as drying trends continued through the closing years of the Pliocene. While apes are (and presumably were) able to subsist almost entirely on fruit, nuts and leaves, proto-humans would have had to become omnivores, with a consequent diversification of foraging strategies. While apes have (and presumably had) relatively small day-ranges, the day-ranges of proto-humans must have exceeded these by several hundred percent. Time spent searching for food and returning to safe havens at night would also have reduced the time available for social interaction, Machiavellian strategizing and the like. The straight-line, ape-to-human model of human evolution implicitly assumes the following:
1. Social intelligence is the most developed form of intelligence among apes.
2. Language must owe its origins to the most developed form of ape intelligence.
3. Hence, language must have developed from a pre-human social intelligence intermediate between that of apes and humans.
Of these assumptions, the first is based on sound and extensive evidence. However, the second and third have no empirical support, and the third depends crucially on what the paleontological evidence shows to be highly improbable: that if x represents the level of ape social intelligence and 3x represents the level of human social intelligence, the level of social intelligence in the species that first developed language must have been 2x. Regardless of whether the major driving force is increased tactical deception (Whiten & Byrne 1988), the need for a grooming substitute in larger groups (Dunbar 1993), or understanding the intentions of others (Tomasello et al. 2005), it is reasonable to suppose that increases in social intelligence arise
(perhaps can only arise) from a more intense and complex social life. However, all available evidence suggests that the social lives of proto-humans could hardly have been more complex than those of apes, and quite likely were less complex, since anti-predation strategies and increased foraging ranges would have sharply reduced the time available for socializing. Moreover, while the intention-reading and social-strategizing so often cited as driving social intelligence increments are in turn driven by within-group competition, the food-poor, predator-infested environments typically inhabited by proto-humans placed a premium on co-operation, and made within-group competition a risky, sometimes dangerous pursuit. I am not suggesting that proto-humans ceased to be competitive; clearly they did not. But competition had to be reined in if groups were to survive the hard times at the latter end of the Pliocene. Moreover, one of the things evolutionary theory will eventually have to explain is the high level of cooperation among humans as compared with its quite low level among our closest relatives. If it were possible to link the origins of language with the origins of cooperation, we could safely leave increased social intelligence and theory of mind to develop after, not before, the birth of language. The bottom line here is that if we want to understand how language arose, the most important thing to remember is that not even the first step towards language took place anywhere except in the human lineage, not even in our most closely related species. It follows that any search for language origins should look not at things we share with other primates, but at things we don’t share: at the differences between our ancestors and the other great apes that arose as a result of the very different niches these species occupied over the last few million years.
It has been suggested that the considerable behavioral differences between chimpanzees and bonobos stemmed from a single ecological factor-the availability of edible herbs in bonobo but not in chimpanzee habitats (Wrangham & Peterson 1996). Why would we not expect the far wider ecological differences between apes and proto-humans to yield far greater behavioral differences? It lies beyond the scope of this presentation to show how cooperation, language and distinctively human cognition could all have arisen from a single source, or even to deal in any detail with the processes involved (for a full account, see Bickerton, in press). However, that source most likely would have been displacement, rather than the learning or arbitrariness so often claimed. Arbitrariness already exists in NCSs-it would be hard to imagine non-arbitrary alarm calls-and to learn rather than innately develop signals still wedded to specific situations would not in and of itself cause any significant change in either the structure or the functions of an NCS. Displacement, however,
breaches the most significant limitation of NCS signals: that they can relate only to things within the immediate sensory range of sender and receiver at the moment of signaling, and therefore don’t even have the possibility of becoming symbolic units. However, displaced signals, even if in the beginning these too were tied to specific situations, open the possibility not only of reference unlimited by space and time, but also of creating true symbols: signals that, unlike NCS signals, do not express feelings or indicate (re)actions, but refer to objective aspects of the environment just as words do. Only some internal change in a proto-human NCS could have triggered such a development, and only some difference in proto-human niches could in turn have caused such a change. But if we persist in ignoring the real differences between language and NCSs and ignoring the paleontological record, we simply close off the most promising avenue towards discovering how language began.
References
Anderson, K. I. (2004). Elbow-joint morphology as a guide to forearm function and foraging behaviour in mammalian carnivores. Zoological Journal of the Linnean Society, 142(1), 91-104.
Berger, L., & Clarke, R. B. (1995). Eagle involvement in accumulation of the Taung child fauna. Journal of Human Evolution, 29, 275-299.
Bickerton, D. (in press). Adam’s Tongue: How humans made language, how language made humans. New York: Farrar, Straus & Giroux.
Chomsky, N. (2005). Some simple evo-devo theses: how true might they be for language? Paper presented at the Morris Symposium on Language Evolution, SUNY at Stony Brook.
Dunbar, R. I. M. (1993). Co-evolution of neocortical size, group size and language in primates. Behavioral and Brain Sciences, 16, 681-735.
Hauser, M. D. (1996). The evolution of communication. Cambridge, MA: MIT Press.
Hurford, J. (2007). The origins of meaning. Oxford University Press.
Johansson, S. (2005). Origins of language: constraints on hypotheses. Amsterdam: Benjamins.
Pollick, A. S., & de Waal, F. B. M. (2007). Ape gestures and language evolution. Proceedings of the National Academy of Sciences, 104(19), 8184-8189.
Tomasello, M., Carpenter, M., Call, J., Behne, T. & Moll, H. (2005). Understanding and sharing intentions: the origins of cultural cognition. Behavioral and Brain Sciences, 28, 675-735.
Whiten, A., & Byrne, R. W. (1988). Tactical deception in primates. Behavioral and Brain Sciences, 11, 233-244.
Wilson, E. O. (1972). Animal communication. In W. S.-Y. Wang (Ed.), The emergence of language: Development and evolution, 3-15. New York: Freeman.
Wrangham, R., & Peterson, D. (1996). Demonic males: Apes and the origins of violence. Boston: Houghton Mifflin.
Zuberbühler, K. (2005). Linguistic prerequisites in the primate heritage. In M. Tallerman (Ed.), Language origins: Perspectives and evolution, 262-282. Oxford University Press.
EXPRESSING SECOND ORDER SEMANTICS AND THE EMERGENCE OF RECURSION
JORIS BLEYS
Artificial Intelligence Laboratory, Vrije Universiteit Brussel, Pleinlaan 2, Brussels, 1050, Belgium
jorisb@arti.vub.ac.be
Although most previous model-based research has not moved beyond first-order semantics, human languages are clearly capable of expressing second-order semantics: the meanings expressed in a sentence do not only consist of conjunctions of first-order predicates, but also of predicates that take other predicates as an argument. In this paper we report on multi-agent language game experiments in which agents handle second-order semantics. We focus our discussion on how this type of research is able to provide fundamental insights into how human-language-like properties could once have emerged. For recursion, this might have happened as a side-effect of agents trying to reuse previously learned language structure as much as possible.
1. Introduction
Although research on the emergence of communication systems with features similar to those of human natural language has shown important progress, the complexity of the meanings considered so far remains limited. Experiments either use simple categories (Steels & Belpaeme, 2005), conjunctive combinations of categories (Wellens, 2008) or predicate-argument expressions (Batali, 2002; De Beule, 2008). Natural languages are clearly capable of expressing second-order semantics (Dowty, Wall, & Peters, 1981). For example, the adverb “very” in “very big” modifies the meaning of the adjective; it is not a simple conjunction of the predicates ‘very’ and ‘big’. Moreover, the same predicate (e.g. ‘big’) can often be used in different ways, for example to further restrict the set of possible referents of a noun (as in “the big ball”), to state a property of an object (as in “the ball is big”), to reify the predicate itself and make a statement about it (as in “big says something about size”), to compare the elements of a set (as in “this ball is bigger than the others”), etc. The specific usage of a predicate in a particular utterance is clearly conveyed by the grammar, so any theory of the origins and evolution of grammar must address second-order semantics. The present paper reports progress on how a communication system could emerge to express second-order semantics, building further on previously presented research (Steels, 2000; Steels & Bleys, 2005, 2007) in which we adopted
the language game paradigm. In the current paper we do not consider the problem of how second-order semantics could emerge (discussed in more detail by Van den Broeck (2008)), but rather focus on the question of whether recursive structure could arise and how.
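For readers unfamiliar with the paradigm, a language game is a routinized interaction between two agents, repeated many times across a population until their conventions align. The loop below is a deliberately minimal naming-game sketch, not the experimental setup of this paper; the function and lexicon names are illustrative assumptions:

```python
import random

# Minimal naming-game sketch: two agents converge on a shared lexicon.
# Illustrative only; not the experimental setup described in the paper.
random.seed(0)  # deterministic run for this sketch

def play_game(speaker, hearer, world):
    """One game: the speaker picks a topic and names it (inventing a
    word if needed); the hearer looks the word up; on failure, the
    hearer adopts the speaker's word for that topic."""
    topic = random.choice(world)
    word = speaker.setdefault(topic, "word-%d" % len(speaker))
    guess = next((t for t, w in hearer.items() if w == word), None)
    if guess == topic:
        return True
    hearer[topic] = word  # alignment through learning
    return False

world = ["block", "pyramid", "ball"]
lex_a, lex_b = {}, {}
results = [play_game(lex_a, lex_b, world) for _ in range(200)]
# Once every topic has come up at least once, all later games succeed.
assert all(results[-20:])
```

The experiments in this paper play the same kind of repeated, locally scored game, but over grammatical structure rather than bare word-object pairs.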
2. Grounded semantic constraint networks

The semantics of the utterances in the following experiment are not represented in a standard logic, but in an alternative formalism, Incremental Recruitment Language (IRL). In this view the meaning of a sentence is a semantic constraint network that the speaker wants the hearer to resolve in order to achieve the communicative goal selected by the speaker. The basic nodes of these networks are primitive constraints, which are provided by the experimenter. Each primitive constraint has a number of arguments which can be bound to a certain variable. Variables are denoted using a question-mark prefix. If a variable appears as an argument to more than one constraint, the value for this variable is constrained by more than one constraint.

[Figure 1: the constraints visible in the network include (equal-to-context ?s1), (prototype ?proto1 [block]) and (unique-element ?o1 ?s2).]

Figure 1. On the left: a hypothetical world. On the right: a valid semantic constraint network to identify the topic (marked in grey for clarity).
In production the speaker has to find a semantic constraint network that is suitable to achieve the communicative goal (e.g. identifying the topic) it selected. An example of such a network, shown on the right in Figure 1, is able to identify the block in the world depicted on the left. The more complex the world (for example by adding a second block), the more complex the semantic constraint network will need to be in order to achieve this goal (for example by adding another filtering operation). This network needs to be encoded in a serial utterance which has to be decoded by the hearer in such a way that when it runs the constraint propagation algorithm over the network, it is able to achieve the communicative goal of the speaker. More details, for instance on the exact inner workings of any primitive constraints reported in this paper, or on how semantic constraint networks are constructed, can be found in Steels (2000), Steels and Bleys (2005) and Van den Broeck (2008).
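To make the constraint-resolution idea concrete, the sketch below evaluates a Figure-1-style network against a toy world. This is not the IRL implementation; the function names and the (type, id) object representation are illustrative assumptions:

```python
# Sketch of resolving a Figure-1-style semantic constraint network
# against a toy world. Not the IRL implementation; all names are
# illustrative. Objects are represented as (type, id) pairs.

def equal_to_context(world):
    """Bind ?s1: the full set of objects in the current context."""
    return set(world)

def filter_set_prototype(source, proto):
    """Bind ?s2: keep only the objects matching the prototype."""
    return {obj for obj in source if obj[0] == proto}

def unique_element(source):
    """Bind ?o1: succeeds only if exactly one object remains."""
    return next(iter(source)) if len(source) == 1 else None

world = [("block", 1), ("pyramid", 2), ("ball", 3)]
s1 = equal_to_context(world)            # (equal-to-context ?s1)
s2 = filter_set_prototype(s1, "block")  # uses (prototype ?proto1 [block])
o1 = unique_element(s2)                 # (unique-element ?o1 ?s2)
assert o1 == ("block", 1)  # the topic is identified
```

Adding a second block to the world would make `unique_element` fail, which is exactly why a more complex world forces a more complex network (e.g. a further filtering operation), as noted above.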
3. Mapping semantic constraint networks onto language
The next question we have to answer is how the agents are supposed to encode such a semantic constraint network over a serial interface. We use Fluid Construction Grammar (FCG) as our substrate for this mapping. FCG is a computational formalism inspired by the general theory of construction grammar, which states that each linguistic rule should be a pairing of syntax and semantics (Goldberg, 2003). In the experiments we report in this paper, the semantics of a rule consists of different parts of the semantic constraint network and/or the variable equalities between these parts. The syntactic side is governed by syntactic categories and/or word-order constraints and/or simple forms (words).
[Figure 2: the constraints visible in the network include (select-elements ?o1 ?s2 ?sel1) and (selector ?sel1 [random]).]

Figure 2. On the left: an example of a semantic constraint network. On the right: a complete production/parse tree to encode/decode this semantic constraint network. Each unit contains information on both semantics (meaning) and syntax (form). The bottom layer contains the semantic entities (entity units), the middle layer contains the primitive constraints that use these entities directly (functional units) and the top layer contains all other operations (contextual unit).
As shown in Figure 2, our basic approach is to divide the semantic constraint network into three layers of units:a (a) entity units containing the semantic entities,b (b) functional units which make direct use of such a semantic entity, and (c) contextual units which contain any remaining operations of the semantic constraint network that do not make direct use of any semantic entity. This program would be useful in a world in which there is more than one block, but the goal is to identify any block. As a rule of thumb, the reader can assume that each rule introduces one new unit in the production/parse tree. In production, syntactic information is added to the production tree: which words will be used to express certain semantic entities, to which syntactic category each unit belongs, and which word order should be applied when this tree is transformed into an utterance. During interpretation, each word in the utterance introduces a new semantic entity. Based on the lexical

a Ideally, the agents should come up with this division themselves as they try to reuse as much linguistic knowledge as possible. This is part of our future research agenda.
b At this moment the agents assume that each semantic entity is captured in exactly one unit.
categories of these entities, the hearer is able to add the layer of functional units. Finally, the information on the word-order constraints, augmented with the information of the syntactic categories, allows the agent to select the right contextual rule, which adds extra primitive constraints to the network but, more importantly, also connects all primitive constraints by introducing variable equalities (for example between the first argument of FILTER-SET-PROTOTYPE and the second argument of SELECT-ELEMENTS) into the network shown in Figure 2. We have devised learning operators allowing both speaker and hearer to learn these divisions, which are reported in more detail in Steels and Bleys (2007). For the sake of clarity we choose to focus on the construction of the syntactic category system, and we abstract away from implementation details which might distract the reader from the main hypothesis we are proposing, namely that recursive rules might emerge as a side-effect of agents trying to reuse as much of their previously constructed language knowledge as possible.
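The interpretation process just described can be caricatured as a three-stage pipeline. The sketch below is illustrative only: the lexicon, the entity-to-constraint table and the fixed contextual rule are all hypothetical stand-ins, not how FCG actually applies rules:

```python
# Caricature of the three-stage interpretation pipeline:
# words -> semantic entities -> functional layer -> contextual layer.
# All table entries are hypothetical; FCG does this by rule unification.
LEXICON = {"a": "[random]", "ball": "[block]"}       # word -> semantic entity
FUNCTIONAL = {"[block]": "filter-set-prototype",     # entity -> primitive constraint
              "[random]": "selector"}

def decode(utterance):
    entities = [LEXICON[w] for w in utterance.split()]
    functional = [FUNCTIONAL[e] for e in entities]
    # The contextual rule contributes the remaining constraints and,
    # crucially, the variable equalities that tie the network together.
    contextual = ["equal-to-context", "select-elements"]
    return entities, functional, contextual

entities, functional, contextual = decode("a ball")
assert "filter-set-prototype" in functional
```

The point of the caricature is the layering: lexical knowledge alone yields only disconnected fragments, and it is the contextual rule that turns them into a single resolvable network.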
4. Three steps towards the emergence of recursive rules

The construction of the system of syntactic categories is fairly simple: the agents try to reuse any syntactic category which would allow the reuse of a previously learned rule.c If no such syntactic category is found, the agent decides to construct a new syntactic category. This basic mechanism results in a one-to-one mapping between syntactic and semantic categories at the lexical level, but at all other levels syntactic categories are only invented when needed, and one cannot easily reconstruct a similar mapping unless one takes into account the linguistic development of each agent.
4.1. Starting from scratch

The first time an agent has to express/interpret a semantic constraint network (similar to the one depicted in Figure 1, which could represent the semantics of a sentence like “ball”), it has no syntactic categories, and hence it needs to invent two new syntactic categories. One specifies the syntactic association between the entity unit and the functional unit (e.g. noun), and the other one specifies the association between the functional unit and the contextual unit (e.g. noun-constituent). This kind of process is schematically shown on the left-hand side of Figure 3. Let’s suppose the agent now has to express/interpret a variation of this semantic constraint network in which the semantic entity is a prototype of a pyramid instead of one of a block. This provides a first opportunity for the agents to reuse a syntactic category, because if the syntactic category of the entity unit for the pyramid were identical to the one of the entity unit of the block (e.g. noun),

c Technically, a syntactic category in a rule is first a variable which will get a binding if any other rule requires a specific syntactic category, using the unification engine of FCG. This variable will be replaced by the value of this binding before it is added to the actual rule-set of the agent.
Figure 3. On the left: invention of new syntactic categories A, B and C. On the right: reuse of a syntactic category (A) as expected by the rule (B → A).
it would allow reuse of all the other syntactic categories (and rules) it constructed for the previous semantic constraint network. This process is schematised on the right hand side of Figure 3 and typically occurs at the level of syntactic categories linking entity units and functional units.
4.2. Substituting a primitive constraint

Let’s now consider a semantic constraint network in which a primitive constraint that does not take a semantic entity as direct argument (e.g. UNIQUE-ELEMENT) is substituted by one that does (e.g. SELECT-ELEMENTS), as illustrated in Figure 2. Let’s suppose this network corresponds to the semantics of a sentence like “a ball”. The contextual rule of the previous example is now useless, as it contains a primitive constraint, namely UNIQUE-ELEMENT, that is not even part of the semantic constraint network at hand. The agents have to invent a new contextual rule, but not all hope is lost, because they can reuse every other previously introduced category (and rule) if they incorporate the syntactic category they previously used to associate the functional unit with the contextual unit (e.g. noun-constituent). This process is shown in Figure 4, and typically occurs at the level of syntactic categories linking functional units and contextual units.
Figure 4. Invention of a new syntactic category (D) while reusing a previously learned syntactic category (B).
4.3. Adding a primitive constraint

The final semantic constraint network we have to consider is one that is achieved by starting from the one we introduced in Figure 2 and adding an extra primitive constraint, FILTER-SET-CATEGORY, in between two existing ones, namely
FILTER-SET-PROTOTYPE and SELECT-ELEMENTS, which could represent the semantics of a sentence like “a big ball”. Using the same learning strategy as introduced in Section 4.2, the agents could learn a new contextual rule which combines three subunits into one new unit, as shown in the middle of Figure 5.
Figure 5. To the left and middle: reuse of two previously known syntactic categories (B and D) and invention of a new one (F), in a similar fashion as in Section 4.2. To the right: another solution which additionally is capable of reusing the contextual rule introduced in Section 4.2 by adding a truly recursive rule (B → F B).
But the agents can do better by exploiting another learning strategy which allows agents to combine any number of units into one unit.ᵈ In the example shown on the right of Figure 5, the agents are able to come up with a rule that allows them to reuse the contextual rule introduced in Section 4.2. This particular rule combines two units, one belonging to syntactic category F (e.g. adjective-constituent) and the other to B (e.g. noun-constituent). The syntactic category to which this new combination unit should belong, determined by the deduction mechanism introduced at the beginning of Section 4, is syntactic category B (e.g. noun-constituent), as any other category would block the reuse of the contextual rule. As this syntactic category is equal to one of the rule’s constituents, this new rule is truly recursive.
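The payoff of assigning the combined unit to category B can be shown with a toy recognizer for the resulting grammar B → noun | F B. The category labels follow Figure 5; the code itself is only our illustrative sketch, not part of the agents' implementation.

```python
def parse_B(tokens):
    """Recognize the recursive rule B -> noun | F B, where F covers
    adjective-constituents and the base case is a bare noun-constituent."""
    if tokens == ["noun"]:
        return True                       # B -> noun
    if tokens and tokens[0] == "adjective":
        return parse_B(tokens[1:])        # B -> F B (the recursive case)
    return False

# Because the rule is truly recursive, any number of modifiers is accepted:
# "ball", "big ball", "big big ball", ...
```

Note that the contextual rule only ever has to expect a single B constituent, which is exactly why the recursive assignment permits its reuse.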
5. Multi-agent simulation

Figure 6 shows the results of a multi-agent simulation in which our hypothesis is implemented. The complexity of the semantic constraint networks increases over time (depicted by the learning stage). Each increase in complexity introduces a period of stress in the communication system that is subsequently resolved (as shown by the communicative successᵉ). Invention and agreement on the level of the word-semantic entity associations, and on the level of specific word order constraints, is shown in the overshoot and stabilisation of the lexicon size and the number of functional and contextual rules, respectively.

ᵈ At a semantic level, the rule introducing this unit should also take care of the necessary variable equalities between the primitive constraints in the two subunits it combines.
ᵉ A game counts as communicatively successful when the hearer is able to achieve the communicative goal selected by the speaker.
The most important observation lies in the transitions where the complexity of the semantic constraint network is increased but the language does not need to be expanded in order to deal with this extra complexity. The transition to learning stage 4 (between 7k and 8k games), in which an extra filtering operation is added to the semantic constraint networks the agents have to express, exhibits this phenomenon.
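The stress-and-recovery dynamics can be illustrated with a minimal naming-game-style loop. Everything here (two fixed agents, a one-word-per-meaning lexicon, the function and variable names) is our simplifying assumption and far cruder than the actual experiment; it only shows why success dips when the meaning space grows and then climbs back towards 1.

```python
import random

def run_games(n_games, meanings, seed=0):
    """Toy language game: a game succeeds when speaker and hearer share a
    word for the topic; on failure the hearer adopts the speaker's word,
    so success recovers after every expansion of the meaning space."""
    rng = random.Random(seed)
    speaker, hearer = {}, {}              # meaning -> word
    wins = 0
    for _ in range(n_games):
        topic = rng.choice(meanings)
        word = speaker.setdefault(topic, f"w{len(speaker)}")
        if hearer.get(topic) == word:
            wins += 1
        else:
            hearer[topic] = word          # alignment step on failure
    return wins / n_games
```

On this sketch each new meaning fails exactly once before the agents align, so the success score recovers quickly, which is the qualitative pattern the figure shows after each learning-stage transition.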
Figure 6. Graph showing the basic measures for our multi-agent simulation: communicative success (left axis), and lexicon size, functional rules, contextual rules and learning stage (right axis). Bottom axis shows number of interactions; left axis shows scores between 0 and 1; right axis shows number of rules.
6. Conclusion

We presented our main hypothesis, namely that hierarchical rules can become recursive as a side-effect of language users who try to reuse as much of their previously gained linguistic knowledge as possible. We supported the plausibility of our hypothesis by giving a clear analysis of how hierarchical rules can become recursive and by showing the results of a multi-agent experiment, which demonstrated that the rules learned by a population of agents are truly recursive. Although our simulation results depend on many other factors, such as the increase of semantic complexity and the specific structure of the semantic constraint networks we have used, we have provided a proof of concept of our main hypothesis, showing that reuse is adequate for the emergence of a recursive syntactic category system for hierarchical rules.
Acknowledgements

I am indebted to my supervisor, Professor Luc Steels, as he provided the main idea and hypothesis put forward in this paper. I am also grateful to all members of the Artificial Intelligence Laboratory at the Vrije Universiteit Brussel and the Sony Computer Science Laboratory in Paris for their insightful discussions and comments. I would like to thank the anonymous referees for their thoughtful reviews. Joris Bleys is financed through a fellowship of the Institute for the Promotion of Innovation through Science and Technology in Flanders (IWT-Vlaanderen).

References

Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In T. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models. Cambridge University Press.
De Beule, J. (2008). The emergence of compositionality, hierarchy and recursion in peer-to-peer interactions. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), The evolution of language: Evolang 7 (pp. 75-82). World Scientific.
Dowty, D. R., Wall, R., & Peters, S. (1981). Introduction to Montague semantics. D. Reidel Publishing Company.
Goldberg, A. E. (2003). Constructions: a new theoretical approach to language. Trends in Cognitive Sciences, 7(5), 219-224.
Steels, L. (2000). The emergence of grammar in communicating autonomous robotic agents. In W. Horn (Ed.), ECAI 2000 (pp. 764-769). Amsterdam: IOS Press.
Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case study for colour. Behavioral and Brain Sciences, 28, 469-529.
Steels, L., & Bleys, J. (2005). Planning what to say: Second order semantics for Fluid Construction Grammars. In A. B. Diz & J. S. Reyes (Eds.), Proceedings of CAEPIA '05. Lecture Notes in AI. Berlin: Springer Verlag.
Steels, L., & Bleys, J. (2007). Emergence of hierarchy in Fluid Construction Grammar.
In Proceedings of the SLEA workshop at the 9th European Conference on Artificial Life.
Van den Broeck, W. (2008). Constraint-based compositional semantics. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), The evolution of language: Evolang 7 (pp. 338-345). World Scientific.
Wellens, P. (2008). Coping with combinatorial uncertainty in word learning: A flexible usage-based model. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), The evolution of language: Evolang 7 (pp. 370-377). World Scientific.
UNRAVELLING THE EVOLUTION OF LANGUAGE WITH HELP FROM THE GIANT WATER BUG, NATTERJACK TOAD AND HORNED LIZARD
RUDOLF BOTHA

University of Stellenbosch, Stellenbosch 7600, South Africa, and Institute of Linguistics, Utrecht University, Janskerkhof 13, 3521 BL Utrecht, the Netherlands
1. On a field that is “going places”
Consider the following comments on the state in which the field of language evolution finds itself at present:
... the field of language evolution finally emerged from its long hiatus as a legitimate area of scientific enquiry during the last decade of the twentieth century. (Christiansen and Kirby 2003: 300)
... the scientific study of language evolution seems to be coming of age. (Fitch 2005: 194)
... the evolution of language represents an exciting and rapidly growing field. (Fitch 2005: 222)
... what is perhaps most noticeable is the growing maturity of this rapidly developing field. (Tallerman 2005: 1)
But the field has certain advantages. Whether we like it or not, it's going places. (Bickerton 2007a: 524)

These comments paint a distinctively positive picture of the present state of the field. Such comments don't come as bald pronouncements, of course: they are normally accompanied by supporting observations and they are often qualified in explicit terms. The qualifications refer to what are portrayed as obstacles, challenges or unsolved problems. Some of these are quite formidable on construals such as that by Derek Bickerton (2007a, 2007b). But the gist of comments such as those quoted above is that in modern work on language evolution significant progress has been
made in unravelling this phenomenon. If this is so, a natural question is: what needs to be done to achieve further significant progress? Various answers to this question can be found in the literature, many of which reflect what the above-mentioned obstacles, challenges or unsolved problems are taken to be. Thus, the steps that are needed for making the desired further progress include the following:

Proceeding from a sound understanding of what language is. (Bickerton 2003: 79-80; Christiansen and Kirby 2003: 14)
Giving up key dogmas in linguistics and other fields such as archaeology. (Newmeyer 2003: 73-75)
Proceeding from more adequate analyses of the aspects of language (structure) whose evolution is at issue. (Bickerton 1998: 252ff.; Bickerton 2003: 87-88)
Extending the body of (robust) evidence bearing on core questions about language evolution. (cf. Botha 2003: 4-5 for references)
Making sure that accounts of language evolution are compatible with tenets of evolutionary theory. (Tallerman 2005: 2)
Developing further an integrated theoretical framework (both conceptual and mathematical) capable of encompassing the many complex issues that an eventual theory of language evolution must resolve. (Fitch 2005: 222)
Avoiding the unproductive bickering that often still mars the discussion. (Fitch 2005: 222)

I will argue in my paper that many of the required steps identified in the literature reflect a fundamental condition which can be stated as follows:

(2) Accounts of language evolution need to be underpinned in a more overt and systematic way by pertinent theory.

This condition, I will show, is necessary for constraining the arbitrariness of such accounts. And in support of (2), I will cite potentially interesting accounts which lack overt theoretical sustenance. These accounts include claims which in skeletal form can be stated as follows:
(3) (a) Phenomenon A is a precursor to language/a component of language.
(b) Phenomenon B is a remnant of A.
(c) Phenomenon C is a linguistic fossil.
(d) Feature D of language/of a precursor to language is an exaptation.
(e) Feature E of language shows signs of complex design.
(f) Process/Mechanism F played a role in the evolution of language.
(g) G was a stage in the evolution of language.
(h) Property H represents the hallmark of full/modern language.
(i) Property I is not unique to language.
(j) Data about phenomenon J provide evidence for/against facet K, L or M of language evolution.

Since it is not possible to discuss all of these claims in the available time, I will focus on (3)(d), (f), (g) and (j), building on work that has been reported elsewhere (Botha 2003, 2006a, 2006b, 2007, to appear a, to appear b, to appear c, to appear d). For purposes of illustration, I present below the outlines of my discussion of one of these skeletal claims, namely (3)(d): Feature D of language/of a precursor to language is an exaptation.

2. Enter the bugs and the beasts
On various recent accounts of language evolution, some feature of language has the evolutionary status of an exaptation. Thus, included in a recent OUP volume, there are various chapters - e.g., Arbib (2005), McDaniel (2005), and Carstairs-McCarthy (2005) - in which “processes of exaptation are explicitly discussed or implicitly assumed”, according to the editor, Maggie Tallerman (2005: 5). For instance, on Michael Arbib’s (2005: 34) account of the brain mechanisms that gave humans a language-ready brain, these mechanisms evolved from the mirror system for grasping in the common ancestor of monkey and human. That is, as phrased by Tallerman (2005: 5), “A brain area known as F5 in non-human primates is known to regulate the manual grasping of objects, and this area is known to be homologous with (i.e. it has the same evolutionary origin as) part of the human Broca’s area: thus, neural structure with one function may have been exapted for (aspects of) language”. The question, now, is: what are the requirements which an exaptationist account of a feature of language needs to meet? This question cannot be answered in a satisfactory way without drawing on a non-ad hoc theory of what exaptations, as
distinct from other products of evolutionary processes, are. On Stephen Jay Gould’s (1991: 43) theory, an exaptation is “... a feature now useful to an organism, that did not arise as an adaptation for its present role, but was subsequently co-opted for its present function”. In the case of a first type of exaptation, the co-opted features arose initially through natural selection; in the case of a second type, the co-opted features were by-products of adaptive processes (Gould and Vrba 1982: 5-7; Gould 1991: 53). Elaborating on these definitions, Buss et al. (1998) and Andrews et al. (2002) show that an exaptationist account of the evolution of a feature needs to satisfy two categories of requirements. The first bears on the content of such an account. Thus, according to Buss et al. (1998: 542), an exaptationist account needs to specify:

(4) (a) The features that got co-opted, indicating whether they were originally adaptations or by-products.
(b) The causal mechanisms - e.g., natural selection - responsible for the co-opting.
(c) The new (biological) function of the co-opted features, i.e., the manner in which they contribute to the solution to an adaptive problem.

Requirements of the second category incorporate the evidentiary and other scientific standards that the claims making up an exaptationist account need to meet. Considered of particular importance is the requirement that, for a feature to be able to be assigned the status of an exaptation, it must first be shown not to be an adaptation (Danemiller 2002: 512; Delporte 2002: 514; Jones 2002: 251). In this connection, Buss et al. (1998: 542) observe that “... the mere assertion that this or that characteristic is an exaptation encounters the same problem that Gould (1991) leveled against adaptationists - the telling of ‘just-so’ stories”. So, judged by the requirements at issue, how good are exaptationist accounts such as those included in the volume introduced by Tallerman (2005)?
The short answer is that these accounts turn out to be incomplete. That is, these accounts are found not to:

(5) (a) include all the claims which exaptationist accounts should express in terms of requirements (4)(a)-(c);
(b) give explicit consideration to the evidentiary standards that evolutionary accounts need to meet in order to qualify as “exaptationist”;
(c) include evidence for all the claims expressed explicitly or implicitly by them, e.g., the claim that the feature at issue originated as an exaptation and not an adaptation or a by-product.
Another observation may be added here: these accounts typically are not framed in terms of clearly defined technical notions such as “exapt/exaptation” and “co-opt”; instead, they tend to use vague notions such as “evolved from”, “provide the scaffolding for”, “seized upon”, “is a natural precursor”, “overlaid on”, “(developed) on top of” and the like. This further observation, along with those stated as (5)(a)-(c), applies more widely than just to the expositions included in the Tallerman volume. So much is clear from an examination of other publications by Arbib (Arbib 2005, 2006; Rizzolatti and Arbib 1998) and Carstairs-McCarthy (Carstairs-McCarthy 1999; Botha 2003: 81-91). In addition, various exaptationist accounts by other scientists have also been found to be incomplete and/or inexplicit in the relevant ways (Botha 2003: 68-81). The findings (5)(a)-(c) may be questioned from a number of perspectives, including the following one:

(6) Requirements such as those considered above for exaptationist accounts may sound fine at the level of metascientific analysis but they are unrealistically restrictive at the level of “real work”.

Defenders of this consideration would find themselves, to their dismay (no doubt), having to contend with creatures such as giant water bugs, natterjack toads, and horned lizards. That is, (6) is at odds with “real work” that has been done on the origin of the behaviour of giant water bugs referred to as “flash flood escape” (Lytle and Smith 2004), on the origin of salinity tolerance in natterjack toads (Gomez-Mestre and Tejedo 2005), and on the origin of the long horns of many species of horned lizards (Young et al. 2004; Agosta and Dunham 2004; Brodie et al. 2004a; Brodie et al. 2004b; Fouts 2004; Christy 2004). That is, in the course of some “real work”, scientists have in a deliberate way constructed and appraised exaptationist accounts of the origins of these phenomena.
In so doing, these scientists have considered in an explicit way the components that such accounts should have, the evidentiary standards that such accounts should meet, and the pertinence and accuracy of the evidence that has been furnished, both for and against. But there is another consideration which might be invoked in attempting to set aside findings (5)(a)-(c):

(7) Exaptationist accounts of features of language cannot be expected to meet the requirements that apply to exaptationist accounts of non-linguistic biological phenomena.

This consideration may be referred to as the “Reversed Bifurcation Thesis”, since it expresses an idea which, in a sense, is the opposite of the “Bifurcation
Thesis” attributed by Noam Chomsky (1980: 16) to Donald Hockney. Rephrased suitably, this thesis says that:

(8) Theories in the domain of language and mind need to meet requirements dismissed (as too strict) in the natural (and biological) sciences.

(Incidentally, the Bifurcation Thesis was rejected by Chomsky and Hockney.) The Reversed Bifurcation Thesis might appear “obvious” at first blush. But, to give it substance, it needs to be formulated more precisely and it requires proper justification. Specifically, this thesis - or some other version of it - cannot be simply stipulated: principled reasons for adopting it need to be given. Otherwise, the thesis would be no more than a protective device, making it impossible to subject exaptationist accounts of features of language to critical appraisal. In justifying the thesis, questions such as the following would have to be addressed:

(9) (a) What are the minimum requirements which such accounts need to meet?
(b) Why should these requirements be considered sufficient for work in a “legitimate area of scientific enquiry”, in a field that “seems to be coming of age” or in one that is characterized by “growing maturity”?
(c) To what extent would exaptationist accounts be non-arbitrary if they met these requirements?

In sum: exaptationist accounts of the evolution of properties of language need to be underpinned by a theory of exaptation which is principled and from which, therefore, appropriate standards of adequacy for such accounts can be derived. Only such a theory can determine what kinds of evidence in principle do or do not bear on the claims expressed by such accounts.

References
Agosta, S.J. and Dunham, A.E. (2004). Comment on “How the horned lizard got its horns”. Science, 306, 230.
Andrews, P.W., Gangestad, S.W. and Matthews, D. (2002). Adaptationism - how to carry out the exaptationist program. Behavioral and Brain Sciences, 25, 489-553.
Arbib, M.A. (2005). From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105-167.
Arbib, M.A. (2006). The Mirror System Hypothesis on the linkage of action and languages. In M.A. Arbib (Ed.), Action to language via the mirror neuron system (pp. 3-47), Cambridge University Press.
Bickerton, D. (1998). Catastrophic evolution: the case for a single step from protolanguage to full human language. In J.R. Hurford, M. Studdert-Kennedy, and C. Knight (Eds.), Approaches to the evolution of language (pp. 341-358), Cambridge: Cambridge University Press.
Bickerton, D. (2003). Symbol and structure: a comprehensive framework for language evolution. In M.H. Christiansen and S. Kirby (Eds.), Language evolution (pp. 77-93), Oxford: Oxford University Press.
Bickerton, D. (2007a). Language evolution: A brief guide for linguists. Lingua, 117, 510-526.
Bickerton, D. (2007b). Review of M. Tallerman (Ed.), Language origins: perspectives on evolution. Oxford: Oxford University Press, 2005. Journal of Linguistics, 43, 259-264.
Botha, R. (2003). Unravelling the evolution of language. Amsterdam: Elsevier.
Botha, R. (2006a). On the Windows Approach to the evolution of language. Language & Communication, 26, 129-143.
Botha, R. (2006b). Pidgin languages as a putative window on language evolution. Language & Communication, 26, 1-14.
Botha, R. (2007). On homesign systems as a potential window on language evolution. Language & Communication, 27, 41-53.
Botha, R. (to appear a). Prehistoric shell beads as a window on language evolution. Language & Communication, 27.
Botha, R. (to appear b). On modeling prelinguistic evolution in early hominins. Language & Communication, 28.
Botha, R. (to appear c). On musilanguage/‘Hmmmmm’ as an evolutionary precursor to language. Language & Communication, 28.
Botha, R. (to appear d). Theoretical underpinnings of inferences about language evolution: the syntax used at Blombos Cave. In R. Botha and C. Knight (Eds.), The Cradle of Language, Oxford: Oxford University Press.
Brodie III, E.D., Young, K.V., and Brodie Jr., E.D. (2004a). How did the horned lizard get its horns? Science, 305, 1909-1910.
Brodie III, E.D., Young, K.V., Brodie Jr., E.D. (2004b). Response to comment on “How the horned lizard got its horns”. Science, 306, 230.
Buss, D.M., Haselton, M.G., Shackelford, T.K., Bleske, A.L., and Wakefield, J.C. (1998). Adaptations, exaptations and spandrels. American Psychologist, 53, 533-548.
Carstairs-McCarthy, A. (1999). The origins of complex language: an inquiry into the evolutionary beginnings of sentences, syllables, and truth. Oxford: Oxford University Press.
Carstairs-McCarthy, A. (2005). The evolutionary origin of morphology. In M. Tallerman (Ed.), Language origins: perspectives on evolution (pp. 166-184), Oxford: Oxford University Press.
Chomsky, N. (1980). Rules and representations. New York: Columbia University Press.
Christiansen, M.H. and Kirby, S. (2003). Language evolution: consensus and controversies. Trends in Cognitive Sciences, 7, 300-307.
Christy, J.H. (2004). Letter on “How did the horned lizard get its horns?”. Science, 305, 1909.
Danemiller, J.L. (2002). Lack of evidentiary criteria for exaptations. Behavioral and Brain Sciences, 25, 511-512.
Delporte, P. (2002). Phylogenetics and the exaptationist programme. Behavioral and Brain Sciences, 25, 514.
Fitch, W.T. (2005). The evolution of language: a comparative review. Biology and Philosophy, 20, 193-230.
Fouts, W.R. (2004). Letter on “How did the horned lizard get its horns?”. Science, 305, 1909.
Gomez-Mestre, I. and Tejedo, M. (2005). Adaptation or exaptation? An experimental test of hypotheses on the origin of salinity tolerance in Bufo calamita. Journal of Evolutionary Biology, 18, 847-855.
Gould, S.J. (1991). Exaptation: a crucial tool for evolutionary psychology. Journal of Social Issues, 47, 43-65.
Gould, S.J. and Vrba, E.S. (1982). Exaptation: a missing term in the science of form. Paleobiology, 8, 4-15.
Jones, O.D. (2002). Allocating presumptions. Behavioral and Brain Sciences, 25, 251.
Lytle, D.A. and Smith, R.L. (2004). Exaptation and flash flood escape in the giant water bugs. Journal of Insect Behavior, 17, 169-178.
McDaniel, D. (2005). The potential role of production in the evolution of syntax. In M. Tallerman (Ed.), Language origins: perspectives on evolution (pp. 153-165), Oxford: Oxford University Press.
Newmeyer, F.J. (2003). What can the field of linguistics tell us about the origins of language? In M.H. Christiansen and S. Kirby (Eds.), Language evolution (pp. 58-76), Oxford: Oxford University Press.
Rizzolatti, G. and Arbib, M.A. (1998). Language within our grasp. Trends in Neurosciences, 21, 188-194.
Tallerman, M. (2005). Introduction: language origins and evolutionary processes. In M. Tallerman (Ed.), Language origins: perspectives on evolution (pp. 1-10), Oxford: Oxford University Press.
Young, K.V., Brodie Jr., E.D., Brodie III, E.D. (2004). How the horned lizard got its horns. Science, 304, 65.
LINGUISTIC ADAPTATIONS FOR RESOLVING AMBIGUITY
TED BRISCOE AND PAULA BUTTERY
Computer Laboratory and Research Centre for English and Applied Linguistics, University of Cambridge

We motivate a model of human parsing and ambiguity resolution on the basis of psycholinguistic and typological data. Analysis of spoken and written corpora suggests that ambiguity is a factor in the choice of relativization strategy for English and supports the model’s predictions. Within an evolutionary account of language, we predict that languages will adapt over time so that prosodic and syntactic systems are organised to minimize processing cost according to this model.
1. Introduction

We present evidence that, for English, ambiguity is an active factor in the choice of relativization strategy and that, in speech, prosody plays a role in the resolution of ambiguity over the internal role of the relativized constituent. The evidence is based on (semi-)automatic analysis and comparison of automatically-parsed written and spoken portions of the British National Corpus (BNC, Leech, 1992) and of the prosodically-transcribed Spoken English Corpus (SEC, Taylor & Knowles, 1988). The results are evaluated with respect to a model of parsing complexity and syntactic disambiguation (Briscoe, 1987, 2000) building on Combinatory Categorial Grammar (Steedman, 2000), and this model is in turn motivated by an evolutionary account of linguistic coevolutionary adaptation of the syntactic and phonological/prosodic systems to a solution which minimizes processing cost. To our knowledge this is the first work which investigates linguistic adaptations aimed at reducing ambiguity while making testable predictions about linguistic organization.
2. Psycholinguistic Data

It is well known that subject relative clauses (SRCs), where the relativized constituent is internally a subject (see (1a)), are less complex than non-subject ones (NSRCs, such as (1b)).

(1) a The guy who/that likes me just smiled
b The guy who/that/∅ I like e just smiled
This is explained by sentence complexity metrics which incorporate some notion of locality between ‘filler’ and ‘gap’ (Gibson, 1998; Hawkins, 1994, 2004). We use filler to refer to the relative pronoun, if present, or the nominal head modified by the NSRC. We use gap to refer to the canonical position of the filler in the NSRC - e.g. who/guy and e respectively in (2b). In addition, NSRCs exhibit unbounded dependencies, which are also known to be both potentially highly ambiguous (Church, 1980) and psycholinguistically complex (Gibson, 1998). (2a) and (2b) illustrate that NSRCs can contain multiple ambiguous gaps (e?) with unbounded material between filler and gap, and between ambiguous gaps.
(2) a The guy who I think you want e? to succeed e? just smiled
b The guy who I want e? to think that the boss will succeed e? just smiled

The psycholinguistic consensus is that there is a parsing preference for early potential gaps, because reading times after potential gap positions are slowed if the gap is filled locally or if the filler is semantically implausible (Stowe, 1986). Gibson (1998) argues that a locality-based complexity metric predicts this result if the human parser chooses the least complex analysis, when lexical frequency or semantic plausibility considerations do not dictate otherwise. (3a) is a mild garden path, probably because want occurs five times more often with VPinf than with NP+VPinf complementation.ᵃ Certainly, if we substitute ask, as in (3b), which exhibits a far stronger preference for NP+VPinf complementation, then the effect disappears.

(3) a The guy who I wanted to give the present to Sue refused
b The guy who I asked to give the present to Sue refused

In (4a) and (4b), there are clear garden path effects for most readers when the actual gap at the end of the RCs is incorrectly filled by three books.

(4) a I gave the guy who I wanted to give the books to three books
b I wouldn’t give the guy who was reading three books

Once again, the frequency-based lexical preference for no direct object with want, and the fact that read is used transitively almost twice as often as intransitively, might explain these preferences, overriding any (default) structural preference for the first possible gap. However, as succeed occurs about 4.5 times more often intransitively than transitively, frequency effects in (2a) between succeed and want are in conflict. Early resolution of the ambiguity at the point of the first potential

ᵃ This and the following estimates of the relative frequency of subcategorization frames are based on the VALEX lexicon (Korhonen, Krymolowski, & Briscoe, 2006).
gap, and before the second verb has been processed, therefore predicts at least an initial preference for the late gap attachment, but the preferred interpretation is for the early gap, with succeed interpreted intransitively as ‘win’. The lack of an apparent garden path effect here is unexplained under the Gibson/Stowe account.
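The interplay of structural locality and lexical frequency described above can be sketched as two toy scoring functions. The frequency ratios are the ones quoted in the text (VALEX-derived); the function names, the tie-breaking convention and the crude locality measure are purely our assumptions, not the authors' parsing model.

```python
# Approximate subcategorization-frame odds quoted in the text.
FRAME_ODDS = {
    "want":    {"VPinf": 5.0, "NP+VPinf": 1.0},  # want: VPinf ~5x more frequent
    "read":    {"trans": 2.0, "intrans": 1.0},   # read: mostly transitive
    "succeed": {"trans": 1.0, "intrans": 4.5},   # succeed: mostly intransitive
}

def preferred_frame(verb, frame_a, frame_b):
    """Pick the subcategorization frame the verb occurs with more often;
    ties fall back to the frame listed first (a stand-in for the
    lexically driven component of gap resolution)."""
    odds = FRAME_ODDS[verb]
    return frame_b if odds[frame_b] > odds[frame_a] else frame_a

def locality_cost(filler_pos, gap_pos):
    """Crude locality metric: number of words intervening between the
    filler and a candidate gap (real metrics weight this material)."""
    return gap_pos - filler_pos - 1
```

On this sketch, the frequency preference for want with VPinf explains the mild garden path in (3a), while locality alone would rank an early gap cheaper than a late one, matching the default early-gap preference the frequency effects can override.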
3. Typological Data

Moving from psycholinguistic preferences of interpretation to typology, Hawkins (1994, p. 323f) explains the crosslinguistic lack of initial subordinators in prenominal RCs by arguing that the advantage of marking the onset of the embedded clause is offset by the remaining ambiguity over whether the embedded clause is a sentential complement or RC. Kuno (1974) considers the unattested strategy of marking both boundaries of RCs with subordinators and suggests this is dispreferred because it leads to patterns of unbounded nested dependencies similar to those in centre-embedded constructions. In the CCG model, placement of a single subordinator at the opposite end of the RC to the modified head creates equivalent complexity via creation of an additional unbounded dependency, if the subordinator must be syntactically linked to the head (i.e. has a CCG category like (N/N)/(S|XP)). Thus under our account of complexity (or that of Gibson or Hawkins), this is a non-optimal strategy for resolving such potential ambiguity. In English, this strategy applied to (4b) might look like (5a), where an additional subordinator tath occurs at the right boundary of the RC.
(5) a I wouldn’t give the guy who was reading tath three books
    b I wouldn’t give the guy who was reading three books tath another

If tath is the mirror image of that and has CCG category (S|XP)\(N/N), then this blocks any local ambiguity concerning the correct role of three books as illustrated in (5b), but it also increases the syntactic complexity of RCs potentially unboundedly by introducing an additional syntactic dependency between it and the head of the relative, guy here. Thus, there is a trade-off between resolving ambiguities syntactically and the overall syntactic complexity of RC constructions.

4. The Role of Prosody
In both Japanese prenominal RCs and English postnominal RCs there is evidence that in speech the RC boundary at the opposite end to the head is often marked by a prosodic boundary (PB, often a major tone group / intonational phrase boundary, but possibly a minor / intermediate one; Venditti, Jun, & Beckman, 1996). Assuming that the human speech processor generates a metrical analysis of the input independently of the parser, but that the latter can take account of extrasyntactic information, the alignment of PBs with syntactically unmarked RC boundaries provides an efficient means for languages to mark the other RC boundary. Warren (1999) reviews psycholinguistic evidence that PBs are exploited by the human
parser to resolve syntactic indeterminacies, and Nagel, Shapiro, and Nawy (1994) argue that actual gaps are always marked by PBs. Thus, (5a) and (5b) would both be resolved in speech by the occurrence of a PB, as indicated by (||) in (6a) and (6b).
(6) a I gave the guy who I wanted to give the books to || three books
    b I wouldn’t give the guy who was reading || the book
A potential problem, though, as Straub, Wilson, McCollum, and Badecker (2001) show, is that intonational / major PBs occur at the end of NSRCs and not medially, as would be required in one interpretation of (3) and in (4). However, Cooper and Paccia-Cooper (1980) and Warren (1985) provide some evidence from sentence production experiments that minor / intermediate boundaries, marked principally by syllable-lengthening, occur on the predicate preceding medial gaps in NSRCs, as in (7a) versus (7b).

(7)
a The guy who I want | to succeed || just smiled
b The guy who I want to succeed || just smiled
c The guy who I wanna succeed || just smiled
The lack of the medial PB when the actual gap is later licenses optional cliticization of to, or reduction to wanna, as in (7c), but blocks it in (7a) in the metrical framework assumed here, subsuming this well-known phenomenon under a more general account of ambiguity resolution.
5. The Model

We can account for the data discussed above in a model which integrates CCG with a (1,1)-bounded-context parser which embodies default structural preferences for late closure and late gaps via a preference for shift over reduce whenever both parsing actions are possible in the current context, but which uses lexical frequency, semantic plausibility or prosodic information to override this preference at the point when the parsing indeterminacy arises. Figure 1 illustrates the state of the parser at the onset of the shift-reduce conflict for (7). The relative pronoun in cell 2 can be combined with the constituent in cell 1 (forward composition), but the lookahead item can also be combined (forward composition) with the constituent in cell 1, so shift is preferred. However, either a lexical preference for the (S/NP)/NP category for you want and/or a PB marked by lengthening of want could override the default parse action and force the early gap interpretation. The complexity and ambiguity metric is given in Figure 2. For the configuration in Figure 1, ignoring earlier material, the cost associated with cell 1 is 4 (3 shifts and one reduce to reach this state), and that with cell 2 is 2 (reset after the previous reduce action to 1, multiplied by the 2 CCG categories).
Stack cells:
  2: (who)  S/(S/NP)
  1: (I want)  (S/NP)/NP, S/NP
Lookahead: to  VP/VP
Input buffer: succeed

Figure 1. Shift-reduce conflict for (7).
After each parse action (Shift, Reduce, Halt):

1. Assign any new Stack entry in the top cell (introduced by Shift or Reduce) a cost of 1 multiplied by the number of CCG categories for the constituent it represents
2. Increment every Stack cell’s cost by 1 multiplied by the number of CCG categories for the constituent it represents
3. Push the sum of the current costs of each Stack cell onto the Cost-record

When the parser halts, return the sum of the Cost-record, which gives the total cost for a derivation.

Figure 2. The Cost Algorithm
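The cost algorithm of Figure 2 can be rendered as a short program. The following is a minimal sketch, not the original implementation described in Briscoe (1987, 2000): the action-trace input and the field names are illustrative assumptions, and step 2 is read as applying to the cells other than the newly assigned top cell (one possible reading of the algorithm).

```python
def run_cost(actions):
    """Sketch of the Figure 2 cost algorithm.

    actions: list of ("shift", n_cats) or ("reduce", n_cats) pairs, where
    n_cats is the number of CCG categories of the constituent that the
    action places in the top stack cell. A reduce combines the top two cells.
    """
    stack = []        # each cell: {"n_cats": ..., "cost": ...}
    cost_record = []
    for action, n_cats in actions:
        if action == "reduce":
            # Reduce replaces the top two cells with one new cell.
            stack.pop()
            stack.pop()
        # Step 1: the new top cell gets cost 1 x its number of CCG categories.
        stack.append({"n_cats": n_cats, "cost": 1 * n_cats})
        # Step 2: increment every other cell's cost by 1 x its number of
        # categories (assumed reading: the new cell was costed in step 1).
        for cell in stack[:-1]:
            cell["cost"] += 1 * cell["n_cats"]
        # Step 3: push the summed cost of all cells onto the cost record.
        cost_record.append(sum(cell["cost"] for cell in stack))
    # On halting, the total derivation cost is the sum of the record.
    return sum(cost_record)
```

Under this reading, costs grow with the number of open stack cells (dependency length) and with the number of CCG categories per constituent (ambiguity), as the metric requires.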
Similarly to the metrics of Hawkins (1994, 2004) and Gibson (1998), the cost metric represents the load on working memory during language processing and predicts that costs increase with the length of grammatical dependencies and with the degree of ambiguity (i.e. the number of putative dependencies within a sentence) up to the point where extrasyntactic information can be deployed to resolve them (see Briscoe, 1987, 2000 for more details). However, the parser’s default preferences (contra Gibson) select analyses which increase stack-depth and hence complexity. That is, in the absence of extrasyntactic information that a potential gap is the actual gap, the parser delays attachment. This strategy actually reduces processing cost provided that language is organized to override parsing defaults when they lead to the wrong analysis. So the model places adaptive pressure on grammatical systems to evolve in such a way that PBs (and/or lexical and semantic information) are available at the onset of ambiguities which require non-default interpretations. The method of integration of PBs into the analysis makes different predictions from that of Steedman (2000), as it relates PBs to parse actions, not to CCG categories. For instance, the ‘adverbial’ category Steedman associates with PBs would not block combination of you want and to... in Figure 1, as required for
the analysis of you want simply to.... Our model predicts that the placement of PBs is mediated as much by ambiguity resolution as by structural and informational mapping constraints per se, and thus departs from the dominant tradition (e.g. Selkirk, 1984), which Steedman follows. Where such constraints underdetermine the location of PBs, it makes more fine-grained and correct predictions (see also the experiments reported in Snedeker and Trueswell (2003)).
6. Corpus/Usage-based Predictions
Our model predicts a complexity hierarchy of (SRCs < NSRCs) < (unambiguous NSRCs < ambiguous NSRCs) ∧ (short NSRCs < long NSRCs), and thus that in speech NSRCs will mark an actual gap with a PB, particularly if it is ambiguous and not resolvable given effects of local semantic plausibility, lexical frequency or parsing preferences, and that in writing the lack of PBs may lead to avoidance of ambiguous NSRCs unresolved semantically or contextually. These predictions differ from and are more fine-grained than the general observation that written language is more complex than speech (e.g. Biber, 1988). We tested them by automatically extracting RCs from parsed versions of the BNC and SEC corpora, by automatically categorizing wh-RCs into SRCs/NSRCs, and manually analysing samples of that(-less) RCs, as well as the correlation of PBs with gaps in NSRCs in the SEC. We found a lower ratio of RCs overall and of NSRCs to SRCs in speech (1:4.34 wrt. vs. 1:6.24 spk., signif. χ² p ≈ 0), as might be expected. However, the proportion of ambiguous to unambiguous (single verb) NSRCs was identical (1:7.88 wrt. vs. 1:7.89 spk., non-signif. χ² p ≈ 1) and, though longer NSRCs containing longer intervening NPs, parentheticals, and so forth occur in writing (e.g. (8a)), the mean word length of RCs was not significantly different (6.35 wrt., 6.19 spk., z-score p = 0.8). These results suggest that there is no reduction in the complexity of NSRCs in speech compared to writing.

(8) a The business that JR, director...of restructuring at M, sees e as promising
    b where there are limited reserves | of some non-renewable resource | as . . .

Ambiguous medial gaps in NSRCs in the SEC are not marked with PBs where this would lead to the wrong interpretation (35 were found, 32 have no following PB, 3 are marked by minor PBs, but these occur in wh-adverbial RCs like (8b) in which the CCG analysis predicts early ‘non-configurational’ attachment to the verb; e.g. Pickering & Barry, 1991).
Actual but ambiguous medial gaps are marked with minor PBs, and RC-final ambiguous gaps are marked with major ones (40 were found, 39 were followed by PBs in the annotation, leaving one putative counterexample which may be an annotation error). These results suggest that ambiguity reduction and prosodic disambiguation play a role in the form of NSRCs observed in speech and writing. The fact that syntactically ambiguous
NSRCs occur with equal frequency in writing and speech suggests that in writing there must be a greater reliance on contextual or semantic resolution of ambiguity in the absence of PBs, and this needs investigating further.
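The χ² comparisons of ratios reported above are standard 2×2 contingency tests. A sketch of such a test in pure Python (an assumed helper for illustration, not the authors' analysis code):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 d.f., no continuity correction) for the
    2x2 table [[a, b], [c, d]], e.g. NSRC vs. SRC counts in writing vs. speech.
    """
    observed = [[a, b], [c, d]]
    n = a + b + c + d
    row = [a + b, c + d]          # row totals
    col = [a + c, b + d]          # column totals
    # Expected count under independence: row_total * col_total / n
    return sum(
        (observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
        for i in range(2)
        for j in range(2)
    )
```

With 1 degree of freedom, a statistic above 3.84 corresponds to p < 0.05, so a very large statistic yields p ≈ 0 and a statistic near zero yields p ≈ 1, matching the significance patterns reported.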
7. Discussion and Conclusions

Language interpretation involves a decoding step, based on properties of meaning conveyed grammatically, and an inferential step which further constrains and refines meaning by integrating contextual and background knowledge. In general, there is a trade-off between these two steps, where more coding usually leads to increased articulatory or production costs, while less coding increases ambiguity and requires a greater degree of inference. We have argued that enriched syntactic encoding in RC constructions to remove some ambiguities would lead to increased processing complexity. However, a strategy of parallel encoding of the same information in the prosodic system (required independently as a component of speech processing) achieves the same effect with little additional cost during the decoding step. It only requires that the parser have access to the location of PBs at the onset of a syntactic ambiguity. This allows parsing to proceed nearly deterministically, reducing the costs of ambiguity without increasing the need for inference. The evidence reviewed here from psycholinguistic work, typological work and the novel corpus-based investigations we report suggests that human language processing does incorporate default syntactic ambiguity resolution strategies, that these can be overridden by extrasyntactic information, including PBs at the onset of ambiguities, and that language usage does support the model in that PBs do occur in speech in the predicted locations, and written and spoken usage does reflect the predicted cost hierarchy. Briscoe (2000) demonstrates that if a cost algorithm very similar to that of Figure 2 is incorporated into a simulation of language evolution, then languages adapt to reduce syntactic complexity in a manner which predicts many well-known typological universals. A version extended with this cost metric would show that languages will adapt to align prosodic and syntactic information to reduce ambiguity.
There may be other models that make similar predictions, but these will need to emphasise the causal role of ambiguity much more than other extant models.
References

Briscoe, E. (1987). Modelling human speech comprehension: a computational approach. Wiley.
Briscoe, E. (2000). Grammatical acquisition: inductive bias and coevolution of language and the language acquisition device. Language, 76(2), 245-296.
Church, K. (1980). On memory limitations in natural language processing. Indiana University Linguistics Club.
Cooper, W., & Paccia-Cooper, J. (1980). Syntax and speech. Harvard University Press.
Gibson, E. (1998). Linguistic complexity: locality of syntactic dependencies. Cognition, 68, 1-76.
Hawkins, J. (1994). A performance theory of order and constituency. Cambridge: Cambridge University Press.
Hawkins, J. (2004). Efficiency and complexity in grammars. Oxford: Oxford University Press.
Korhonen, A., Krymolowski, Y., & Briscoe, E. (2006). A large subcategorization lexicon for natural language processing applications. In Proceedings of the Fourth International Conference on Language Resources and Evaluation. Genova, Italy.
Kuno, S. (1974). The position of relative clauses and conjunctions. Linguistic Inquiry, 5, 117-136.
Leech, G. (1992). 100 million words of English: the British National Corpus. Language Research.
Nagel, H., Shapiro, L., & Nawy, R. (1994). Prosody and the processing of filler-gap sentences. Journal of Psycholinguistic Research, 23, 473-485.
Pickering, M., & Barry, G. (1991). Sentence processing without empty categories. Language and Cognitive Processes, 6, 229-259.
Selkirk, E. (1984). Phonology and syntax: the relation between sound and structure. MIT Press.
Snedeker, J., & Trueswell, J. (2003). Using prosody to avoid ambiguity: effects of speaker awareness and referential context. Journal of Memory and Language, 48, 103-130.
Steedman, M. (2000). The syntactic process. MIT Press.
Stowe, L. (1986). Evidence for on-line gap location. Language and Cognitive Processes, 1, 227-245.
Straub, K., Wilson, C., McCollum, C., & Badecker, W. (2001). Prosodic structure and wh-questions. Journal of Psycholinguistic Research, 30, 379-394.
Taylor, L. J., & Knowles, H. (1988). Manual of information to accompany the SEC corpus.
Venditti, J., Jun, S., & Beckman, M. (1996). Prosodic cues to syntactic and other linguistic structures in Japanese, Korean and English. In J. Morgan & K. Demuth (Eds.), Signal to syntax. Lawrence Erlbaum.
Warren, P. (1985).
The temporal organisation and perception of speech. Unpublished doctoral dissertation, Department of Linguistics, University of Cambridge. Warren, P. (1999). Prosody and language processing. In S. Garrod & M. Pickering (Eds.), Language processing. Psychology Press.
MODELLING LANGUAGE COMPETITION: BILINGUALISM AND COMPLEX SOCIAL NETWORKS
XAVIER CASTELLÓ, VÍCTOR M. EGUÍLUZ, and MAXI SAN MIGUEL
IFISC, Institut de Física Interdisciplinària i Sistemes Complexos (CSIC-Universitat de les Illes Balears), Palma de Mallorca, E-07122, Spain
[email protected]
LUCÍA LOUREIRO-PORTO
Dep. Filologia Espanyola, Moderna i Llatina, Universitat de les Illes Balears, Palma de Mallorca, E-07122, Spain
[email protected]
RIITTA TOIVONEN, J. SARAMÄKI and K. KASKI
Laboratory of Computational Engineering, Helsinki University of Technology, Helsinki, P.O. Box 9203, 02015 HUT, Finland
[email protected]

In the general context of dynamics of social consensus, we study an agent based model for the competition between two socially equivalent languages, addressing the role of bilingualism and social structure. In a regular network, we study the formation of linguistic domains and their interaction across the boundaries. We also analyse the dynamics on a small world network and on a network with community structure. In all cases, a final scenario of dominance of one language and extinction of the other is obtained (dominance-extinction state). In comparison with the regular network, smaller times for extinction are found in the small world network. In the network with communities, instead, the average time for extinction does not give a characteristic time for the dynamics, and metastable states are observed at all time scales.
1. Introduction

Language competition occurs today worldwide. Different languages coexist within many societies, and the fate of a high number of them in the future is worrying: most of the 6000 languages spoken today are in danger, with around 50% of them facing extinction in the current century. Even more striking is the distribution of speakers, since 4% of the languages are spoken by 96% of the world population, while 25% have fewer than 1000 speakers. New pidgins and creoles are also emerging, but their number is relatively small compared with the language loss rate (Crystal, 2000). In this scenario, and beyond Weinreich's Languages in
Contact (Weinreich, 1953), numerous sociolinguistic studies have been published in order to: (1) reveal the level of endangerment of specific languages (Tsunoda,
2005); (2) find a common pattern that might relate language choice to ethnicity, community identity or the like (O’Driscoll, 2001); and (3) claim the role played by social networks in the dynamics of language competition, which has given rise to a monographic issue of the International Journal of the Sociology of Language (De Bot & Stoessel, 2002a). The Abrams and Strogatz model for the dynamics of endangered languages (Abrams & Strogatz, 2003) has triggered a coherent effort to understand the mechanisms of language dynamics outside the traditional linguistic research. Their study considers a two-state society, that is, one in which there are speakers of either a language A or a language B. This seminal work belongs to the general class of studies of population dynamics based on nonlinear ordinary differential equations for the populations of speakers of different languages. In addition, other studies implement discrete agent based models with speakers of many languages (Stauffer & Schulze, 2005) or few languages (Stauffer, Castelló, Eguíluz, & San Miguel, 2007), as reviewed in (Schulze & Stauffer, 2006). Language competition, then, belongs to the general class of processes that can be modelled by the interaction of heterogeneous agents, as an example of collective phenomena in problems of social consensus (San Miguel, Eguíluz, Toral, & Klemm, 2005). In this respect, a specific feature of language dynamics is that agents can share two of the social options that are chosen by the agents in the consensus dynamics. In the present work, these are the bilingual agents, that is, agents that use both language A and B, who have been claimed to play a relevant role in the evolution of multilingual societies (Wang & Minett, 2005). In this work we are interested in the emergent phenomena appearing as a result of a self-organised dynamics in the case of two equally prestigious competing languages.
With the aim of elucidating possible mechanisms that could stabilise the coexistence of these languages, we wish to discuss the role of bilingual individuals and, following Milroy, the effects of social structure (Milroy, 2001) in the process of language competition. To this end, and along the lines of the original proposal by Minett and Wang (Wang & Minett, 2005), we study an agent based model that incorporates bilingual agents on different networks: a regular network, a small world network, and a social type network with community structure (Toivonen et al., 2006). We compare the results obtained with our work on the agent-based version of Abrams-Strogatz two-state model (Stauffer et al., 2007). This way we will provide a quantitative analysis that is wanting in the field of sociolinguistics, as noted by de Bot and Stoessel (De Bot & Stoessel, 2002b).
2. The Bilinguals Model

We consider a model of two socially equivalent (i.e. equally prestigious) competing languages, in which an agent i sits in a node within a network of N individuals
and has ki neighbours. It can be in three possible states: A, an agent using language A; B, an agent using language B; and AB, a bilingual agent using both A and B. The state of an agent evolves according to the following rules: at each iteration we first choose one agent i at random, and then we compute the local densities of language users of each linguistic community in the neighbourhood of agent i: σ_i^l (l = A, B, AB; i = 1, ..., N; Σ_l σ_i^l = 1). The agent i changes its state of language useᵃ according to the following transition probabilities:ᵇ

p_{i, A→AB} = (1/2) σ_i^B ,   p_{i, B→AB} = (1/2) σ_i^A    (1)

p_{i, AB→B} = (1/2) (1 − σ_i^A) ,   p_{i, AB→A} = (1/2) (1 − σ_i^B)    (2)

Equation (1) gives the probabilities for an agent to move away from a monolingual community to the bilingual community AB. They are proportional to the density of monolingual speakers of the other language in its neighbourhood. On the other hand, equation (2) gives the probabilities for an agent to move from the bilingual community towards one of the monolingual communities. Such probabilities are proportional to the density of speakers of the adopting language including bilinguals (1 − σ_i^j = σ_i^l + σ_i^AB, l, j = A, B; l ≠ j). It is important to note that a change from being monolingual A to monolingual B, or vice versa, always implies an intermediate step through the bilingual community. The transition probabilities (1) and (2) are fully symmetric under the exchange of A and B, which is consistent with the fact that both languages are socially equivalent in terms of prestige. We recover the agent-based version of the Abrams-Strogatz two-state model when bilinguals are not present. In this model, an agent essentially imitates the language use of a randomly chosen neighbour. The corresponding transition probabilities are the following:
p_{i, A→B} = (1/2) σ_i^B ,   p_{i, B→A} = (1/2) σ_i^A    (3)
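The update rules (1) and (2) can be sketched directly in code. The following is a minimal illustration, not the authors' simulation code: passing the neighbourhood densities as plain arguments is an assumed representation.

```python
import random

def update_state(state, sigma_A, sigma_B):
    """One asynchronous update of a randomly chosen agent.

    state: "A", "B" or "AB". sigma_A / sigma_B: local densities of monolingual
    A and B users in the agent's neighbourhood (sigma_AB = 1 - sigma_A - sigma_B).
    """
    r = random.random()
    if state == "A":
        # Eq. (1): become bilingual with probability (1/2) * sigma_B.
        return "AB" if r < 0.5 * sigma_B else "A"
    if state == "B":
        return "AB" if r < 0.5 * sigma_A else "B"
    # Eq. (2): a bilingual adopts monolingual A or B with probability
    # proportional to the density of users of the adopting language,
    # bilinguals included: 1 - sigma_B = sigma_A + sigma_AB, etc.
    p_to_A = 0.5 * (1.0 - sigma_B)
    p_to_B = 0.5 * (1.0 - sigma_A)
    if r < p_to_A:
        return "A"
    if r < p_to_A + p_to_B:
        return "B"
    return "AB"
```

Replacing the first two branches with direct A↔B switches (and removing the AB state) recovers the agent-based Abrams-Strogatz rule (3).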
For a quantitative description of the emergence and dynamics of linguistic spatial domains we use the ensemble average interface density ⟨ρ⟩ as an order parameter. This is defined as the density of links joining nodes in the network which are in different states (San Miguel et al., 2005). The ensemble average, indicated as ⟨·⟩, denotes an average over realizations of the stochastic dynamics starting from

ᵃ Note that we always refer to language use rather than competence. Therefore, an agent using two languages might stop using one of them as this becomes less spoken in its social vicinity.
ᵇ Non-equivalent languages were considered in the original version of the model (Wang & Minett, 2005). The prefactor 1/2 corresponds to the special case of equivalence between A and B.
different random distributions of initial conditions. During the time evolution, the decrease of ρ from its initial value describes the ordering dynamics, whereby linguistic spatial domains, in which agents are in the same state, grow in time. The minimum value ρ = 0 corresponds to a stationary configuration in which all the agents belong to the same linguistic community.
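The per-configuration interface density underlying ⟨ρ⟩ is simple to compute; a sketch, with the edge-list representation assumed for illustration:

```python
def interface_density(edges, state):
    """Interface density rho: the fraction of network links joining nodes in
    different states. edges: iterable of (i, j) node pairs; state: dict
    mapping each node to "A", "B" or "AB".
    """
    edges = list(edges)
    crossing = sum(1 for i, j in edges if state[i] != state[j])
    return crossing / len(edges)
```

On a 4-cycle split into two A-nodes and two B-nodes, ρ = 0.5; when all agents share one state (consensus), ρ = 0, the stationary configuration described above.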
3. Results

3.1. Regular and small world networks

The bilinguals model has been extensively studied in two-dimensional regular networks and small world networks (Castelló, Eguíluz, & San Miguel, 2006). In two-dimensional regular networks, and starting from a randomly distributed state of the agents, spatial domains of each monolingual community are formed and grow in size (Fig. 1). This is known in the physics literature as coarsening. Meanwhile, domains of bilingual agents are never formed. Instead, bilingual agents place themselves in a narrow band between monolingual domains (Fig. 1). Finally, a finite size fluctuation drives the system to a dominance-extinction state, where all the agents become monolingual in the dominant language, while the other monolingual community, together with the bilingual agents, faces extinction. The average interface density ⟨ρ⟩ decays as a power law, ⟨ρ⟩ ~ t^(−γ), γ ≈ 0.45 (Castelló et al., 2006). This indicates that the growth law found for the bilinguals model is compatible with the well-known exponent 0.5 associated with domain growth driven by mean curvature and surface tension reduction, as observed in the SFKI (spin flip kinetic Ising) model (Gunton, San Miguel, & Sahni, 1983). The characteristic time τ to reach extinction of one of the languages scales with the system size N as a power law. A very different behaviour is found for the agent based Abrams-Strogatz model, where bilingual agents are not present: coarsening is slower (⟨ρ⟩ ~ (ln t)^(−1)) and driven by interfacial noise.
Figure 1. Formation and growth of monolingual domains. Starting from random initial conditions, snapshots of a typical simulation of the dynamics in a two-dimensional regular network of 2500 individuals. t=O, 2, 20, 200 from left to right. Grey: monolinguals A, black: monolinguals B, white: bilinguals.
To study the effect of long range social interactions, which are one of the basic characteristics of social networks, we next consider a small world network (Watts & Strogatz, 1998). There, τ ~ ln N (Castelló et al., 2006). For the agent based Abrams-Strogatz model, the long range connections inhibit the formation and growth of monolingual domains by producing long-lived metastable states. Metastable states are those where we find dynamical coexistence of the two languages during a long but finite time, after which the system drops to a dominance-extinction state. However, in the bilinguals model, when we move towards a small world network by adding long range connections to the two-dimensional regular network, bilingual agents destroy the metastable states of dynamical coexistence and slow down coarsening, although domains keep growing in size. In addition, they speed up the decay to extinction of one of the languages due to finite size fluctuations (Castelló et al., 2006).
3.2. Social type network with community structure
Community structure is a prominent characteristic of real social networks which may crucially affect social dynamics, and in particular language competition. A combination of random attachment with search for new contacts in the neighbourhood has proved fruitful in generating cohesive structures (Toivonen et al., 2006). We choose this model because it produces well-known features of social networks, such as assortativity, broad degree distributions, and community structure. The most important result regarding this topology (Castelló et al., 2007) is the behaviour of the characteristic time to reach a dominance-extinction state. To this end, we analyse the fraction f(t) of runs still alive at any time t, i.e. the fraction of runs which have not reached the dominance-extinction state. We average over different realizations of the network, and several runs in each. For the agent based Abrams-Strogatz model, the fraction of alive runs decreases exponentially. Results are more interesting for the bilinguals model: f(t) appears to have power law behaviour, f(t) ~ t^(−α), α ≈ 1.3. Since the exponent α < 2, the average decay time for the bilinguals model does not give a characteristic time scale; alive realizations which have not reached the dominance-extinction state are found at any time scale. The difference between the agent based Abrams-Strogatz model and the bilinguals model is better understood by looking at snapshots of the dynamics (Fig. 2), which show the characteristic behaviour of each of the models, starting from random initial conditions (t = 0). In the former (left), the homogeneous domains of nodes with the same option appear to follow the community structure, but a particular community (topological region) may change the language adopted by the community rather quickly (t = 50, 60, 70).
At variance with this behaviour, in the bilinguals model (right) spatial linguistic domains grow and homogenise steadily in a community without much fluctuation. For this dynamics, communities that have adopted a given language, and which are poorly linked to the rest of the network, take a long time to be invaded by the other language, acting therefore as topological traps. As an example of this, we show two long lived trapped metastable states at t = 430 and t = 1000, where the interface density stayed relatively stable for a prolonged period (≈ 100 and 1000 time steps, respectively). Again, these different behaviours reflect in the community structure two different interfacial dynamics: interfacial noise driven dynamics for the agent based Abrams-Strogatz model, and curvature driven dynamics for the bilinguals model, with agents in the AB state at the interfaces.
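The survival statistic f(t) used above can be estimated directly from a set of simulation runs. A sketch, with the list-of-extinction-times representation assumed for illustration:

```python
def alive_fraction(extinction_times, t):
    """Fraction f(t) of simulation runs still alive at time t.

    extinction_times: one entry per run, giving the time at which the run
    reached the dominance-extinction state, or None if it never did within
    the simulated horizon.
    """
    alive = sum(1 for t_ext in extinction_times if t_ext is None or t_ext > t)
    return alive / len(extinction_times)
```

Evaluating this over a grid of t values (and averaging over network realizations) yields the exponential decay seen for the Abrams-Strogatz model and the power law f(t) ~ t^(−α) seen for the bilinguals model.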
"-
Figure 2. Snapshots of a single run of the dynamics, with nodes in state A in black, B in grey, and AB in white circled in black. Simulations start from random initial conditions. Left: Abrams-Strogatz agent-based model. Right: Bilinguals model (example of a simulation leading to metastable states).
4. Conclusion and Further Research

We have analysed the bilinguals model (in comparison to the agent-based version of the Abrams-Strogatz model) in different topologies. Although the final state
of the system is always a homogeneous state where one of the languages faces extinction, the transient towards this final state depends crucially on the network structure. This complements, from an agent based modelling point of view, the importance of social networks in the processes of language contact already claimed in (Milroy, 1987). Within the limitations and assumptions of the model, our analysis suggests that the small world phenomenon, which is characteristic of current interconnected societies, might be an ingredient which accelerates language extinction. This effect might be related to an overall globalisation process in which not only languages, but also whole cultures, tend to homogeneity rather than diversity. However, the study of the dynamics in the social type network with communities, which is the network that mimics the features of real social networks most closely, shows that there exist metastable states at all time scales. This indicates that, in the presence of bilingual individuals, minority languages might survive within some communities for very long periods of time when the social network displays community structure. We are currently studying, in a two-dimensional regular network, the case of competition between non-equivalent languages (s ≠ 0.5), and the effect of perturbing the linear transition probabilities of the model. In general, when one of the languages has a higher prestige, and starting from random initial conditions, the system always evolves towards the extinction of the less prestigious language. This happens in both of the models studied in this paper. Regarding the perturbation of the linear transition probabilities, when we perturb them in such a way that agents change their language with a probability larger than in the linear case, we obtain coexistence for some range of the exponent we use as a parameter (we call it volatility), even in the case of non-equivalent languages.
We call this regime the high volatility regime (i.e., agents easily change language use). On the other hand, when perturbing the model in such a way that agents change their language with a probability smaller than in the linear case, we obtain flatter linguistic borders and slower growth of linguistic domains, and the times to extinction in both models increase significantly (the low volatility regime; i.e., agents have larger inertia to change language use). We are also currently analysing a model we have proposed in which language is taken as a property of the social interaction (link) instead of a feature of the agent (node), offering a new perspective on what is usually assumed in agent based models of language competition. A new interpretation of where we find language in the network, and the emergence of different degrees of bilingualism at the interfaces between monolingual domains, are some of the novelties that we learn from this new approach to language competition.
Acknowledgements

We acknowledge financial support of the European Commission through the NEST-Complexity project PATES (043268).
References
Abrams, D. M., & Strogatz, S. H. (2003). Modelling the dynamics of language death. Nature, 424, 900.
Castelló, X., Eguíluz, V. M., & San Miguel, M. (2006). Ordering dynamics with two non-excluding options: bilingualism in language competition. New Journal of Physics, 8, 308-322.
Castelló, X., Toivonen, R., Eguíluz, V. M., Saramäki, J., Kaski, K., & San Miguel, M. (2007). Anomalous lifetime distributions and topological traps in ordering dynamics. Europhysics Letters, 79, 66006.
Crystal, D. (2000). Language death. Cambridge: Cambridge University Press.
De Bot, K., & Stoessel, S. (Eds.). (2002a). International journal of the sociology of language (Vol. 153). Berlin and New York: Mouton de Gruyter.
De Bot, K., & Stoessel, S. (2002b). Introduction: Language change and social networks. International Journal of the Sociology of Language, 153, 1-7.
Gunton, J. D., San Miguel, M., & Sahni, P. (1983). Phase transitions and critical phenomena. In C. Domb & J. L. Lebowitz (Eds.), (Vol. 8, pp. 269-446). London: Academic Press.
Milroy, L. (1987). Language and social networks (2nd ed.). Oxford and New York: Basil Blackwell.
Milroy, L. (2001). Theories on maintenance and loss of minority languages. In J. Klatter-Folmer & P. van Avermaet (Eds.), (pp. 39-64). Münster and New York: Waxmann.
O’Driscoll, J. (2001). A face model of language change. Multilingua, 20, 245.
San Miguel, M., Eguíluz, V., Toral, R., & Klemm, K. (2005). Binary and multivariate stochastic models of consensus formation. Computing in Science and Engineering, 7, 67-73.
Schulze, C., & Stauffer, D. (2006). Recent developments in computer simulations of language competition. Computing in Science and Engineering, 8, 60-67.
Stauffer, D., Castelló, X., Eguíluz, V. M., & San Miguel, M. (2007). Microscopic Abrams-Strogatz model of language competition. Physica A, 374, 835-842.
Stauffer, D., & Schulze, C. (2005). Microscopic and macroscopic simulation of competition between languages. Physics of Life Reviews, 2, 89-116.
Toivonen, R., Onnela, J., Saramaki, J., Hyvonen, J., KertCsz, J., & Kaski, K. (2006). Physica A , 371(2). Tsunoda, T. (2005). Language endangerment and language revitalisation. Berlin and New York: Mouton de Gruyter. Wang, W. S.-Y., & Minett, J. W. (2005). The invasion of language: emergence, change and death. Trends in Ecology and Evolution, 20, 263-269. Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ’small-world’ networks. Nature, 393,440-442. Weinreich, U. (1953). Languages in contact. New York Linguistic Circle of NY.
LANGUAGE, THE TORQUE AND THE SPECIATION EVENT

TIMOTHY J CROW
SANE POWIC, University Department of Psychiatry, Warneford Hospital, Oxford OX3 7JX, UK

Five to six million years ago, 3.5 megabases of DNA were duplicated from the long arm of the X to create a hominid-specific stratum on the Y short arm. This "saltation" is a candidate for the speciation event for Australopithecus. Within the transposed block a gene pair - Protocadherin X and Protocadherin Y - has been subject to accelerated evolution, with 16 amino acid changes in the Y protein and five in the X. The latter include the introduction of two sulphur-containing cysteines that are likely to have changed the function of the molecule. The sequence changes on the X, affecting both males and females, are seen as secondary to chromosomal rearrangements on the Y (four deletions and a paracentric inversion), the latter representing the initiating events in successive speciations, and the former representing the sexually selected phase of accommodation that establishes a new mate recognition system, with asymmetry (the cerebral torque) the autapomorphy for Homo sapiens.
1.1. Darwinian Gradualism and Huxley’s Doubt
Shortly after publication of The Origin, T. H. Huxley wrote to Darwin that he hoped Darwin had not loaded himself 'with an unnecessary difficulty in adopting "Natura non facit saltum" so unreservedly'. Thus was initiated a division of opinion amongst evolutionists between those who follow Darwin in strict gradualism, and those, following Huxley, who take a saltational view of transitions and a more discrete and categorical view of the nature of species. What is a species? According to the Biological or Isolation Species Concept, a species is a 'group of actually or potentially interbreeding natural populations that are reproductively isolated from other such groups' (Mayr 1963). A more specific and functional definition is that a species is 'the most inclusive population of individual biparental organisms which share a common fertilisation system' (Paterson 1985). Thus, according to Paterson, a species is defined by a 'specific mate recognition system'. I will argue that this concept is crucial to resolving Huxley's doubt.
1.2. Saltations Modulated By Sexual Selection
In the history of challenges to the gradualist version (e.g. Bateson 1894, De Vries 1901), the most forceful dissident was Goldschmidt (1940), who formulated the concept of the "Hopeful Monster", the outcome of a 'macromutation', doomed to maladaptive failure but that just conceivably might succeed. But the greater the magnitude of the saltational change, the less likely it is to have survival value, and the greater the difficulty in identifying a mate. Thus a proposal which challenged Darwinian gradualism and appeared to conflict with the principle of natural selection has been widely disregarded. But here Darwin's (1871) juxtaposition of The Descent of Man and the theory of sexual selection suggests a new possibility. Darwin made no specific proposal, but a role for sexual selection in modifying a primary change in a sexually dimorphic feature to establish a new species boundary has been argued in relation to Hawaiian Drosophilid species by Kaneshiro (1980) and Carson (1997). The general principle is that change that differentiates the sexes is species-specific, can be saltational, at least in one sex, rapid in either sex, and often idiosyncratic in defiance of the environmental adaptivity of natural selection. Chromosomal theories of speciation (White 1978, King 1993) have been criticised on the grounds that rearrangements are frequent, often without phenotypic effect, and sometimes present as polymorphisms within a population. Here it is argued that it is not chromosomal change in general that plays a role in speciation but change on the sex chromosomes. Such changes are associated with sexual dimorphisms that are species-specific and necessary to the construction of a mate recognition system. Furthermore, non-recombining regions of X-Y homology can account (as in the case of lateralization in humans, see below) for quantitative differences in a characteristic between males and females, such as are plausible substrates for sexual selection.
The Y chromosome in mammals (and the W in birds) is not necessary for survival. There are large inter-specific differences: while the X is the most stable chromosome across species [Ohno's law (1967)], the Y is by far the most variable. The mammalian Y therefore can be seen as a test-bed of evolutionary change. One possibility is that the primary change in speciation takes place on the Y, and if located in a region of homology with the X, correlated but independent change in the two sexes could explain the type of runaway sexual
selection envisaged by Fisher (1930). Thus a primary and saltational change in one sex is selected by the members of the other sex to initiate a phase of sexual selection and define a new mate recognition system. The sequence can be represented: Y Chr rearrangement → female selection of mutant males → X Chr sequence change → new mate recognition system. The mate recognition system in man is defined by the dimension of asymmetry (see e.g. Peters et al., 2006). The clue to the location of the gene comes from sex chromosome aneuploidies. Individuals who lack an X chromosome [XO, Turner's syndrome] have non-dominant hemisphere [spatial] deficits on cognitive testing. Individuals with an extra X [XXY, Klinefelter's, and XXX syndromes] have verbal or dominant hemisphere deficits (Table 1). Thus an asymmetry determinant is present on the X chromosome. A hormonal explanation will not account for the similarity of the changes in XXY individuals, who are male, and XXX individuals, who are female. That the gene is present also on the Y chromosome (Crow 1993) is demonstrated by the verbal deficits and/or delays that are observed in XYY individuals (Geerts et al. 2003).

Table 1. Neuropsychological impairments associated with sex chromosome aneuploidies
The hypothesis is further strengthened by evidence that Turner's and Klinefelter's syndrome individuals have corresponding deviations in anatomical asymmetry (Rezaie et al. 2004). A role for an X-Y homologous gene is consistent with the presence of a sex difference - brain growth is faster (Kretschmann et al. 1979) and lateralisation to the right is stronger (Crow et al. 1998) in females. Females have greater mean verbal fluency and acquire words earlier (Maccoby & Jacklin 1975) than males.
1.3. The Xq21.3/Yp Duplication

When we come to consider where such a gene might be located, an important lead is a major chromosomal rearrangement: a 3.5 Mb contiguous block of sequences from the X chromosome long arm was duplicated onto the Y chromosome short arm. That event, now dated at 6 million years (Williams et al.
2006), is therefore a candidate for the transition from a great ape hominid precursor to Australopithecus. The homologous block thus created was subsequently subject to four deletions, and was split by a paracentric inversion (by a recombination, presently undated, of LINE-1 elements (Schwartz et al. 1998, Skaletsky et al. 2003)) to give two blocks of homology in Yp11.2 (Figure 1). Two regions on the human Y chromosome short arm thus share homology with a single region on the human X chromosome long arm (Xq21.3) (Lambson et al. 1992, Sargent et al. 1996). Genes within this region are therefore present on both the X and Y chromosomes in Homo sapiens but on the X alone in other great apes and primates.
Figure 1. The Xq21.3/Yp duplication. The first event was a duplication of 3.5 megabases from the Xq21.3 region of the X long arm to the Y chromosome short arm, now dated at 6 million years, i.e. coincident with the separation of the chimpanzee and hominid lineages. The second event inverted most of the duplication and some of the pre-existing Y short arm but has not been dated. An explanation for the retention of the duplicated block on Yp can be sought in its gene content.
Three genes are known to be expressed from the region (Figure 2): PABPC5, a poly(A)-binding protein whose Y gametologue has been lost during hominid evolution; TGIF2LX and Y, homeobox-containing genes with testis-specific expression; and ProtocadherinX (PCDH11X) and ProtocadherinY (PCDH11Y). PCDH11X and Y (each comprising seven extracellular cadherin motifs, a short transmembrane region and an intracellular cytoplasmic tail), which code for cell adhesion molecules of the cadherin superfamily, are of note because both forms of the gene have been retained and are highly expressed in both fetal and adult brain (Yoshida et al. 1999, Blanco et al. 2000), including the germinal layer of the cortex (T.H. Priddle, personal communication). The protein products of this gene pair are thus expected to play a role in intercellular communication, perhaps acting as axonal guidance factors and influencing the connectivity of the cerebral cortex.
Figure 2. Alignment of the homologous regions on the X long arm and Y chromosome short arm to show four deletions (a to d) on the Y and the three genes within the block.

The Y chromosome is described as consisting of strata revealing points in evolution at which blocks were added to the chromosome from the autosomes or the X (Skaletsky et al. 2003). For the reasons given above, non-recombining regions of X-Y homology have particular significance because they are relevant to sexual dimorphisms and the formation of mate recognition systems. Because it is the most recently added component, this block on the Y might be referred to as the 'hominid strip'.
Figure 3. Exon structure, ectodomain repeats and cytoplasmic functional motifs of the long form of ProtocadherinX.
The structure of the ProtocadherinX/Y molecule (Figure 3) reveals points of interest: 1) the ectodomain comprises seven Protocadherin repeats, structures that interact with the same features on the surface of another cell to generate adhesive forces; 2) a beta-catenin binding site is consistent with the report (Chenn et al. 2003) that this molecule interacts with one that is involved in gyrification of the human cerebral cortex; 3) the protein phosphatase 1a binding site indicates a role in axo-dendritic synapse formation; 4) the dodecapeptide repeat motif is specific to this molecule. The protein coded by this gene is the only Protocadherin that includes both beta-catenin and protein phosphatase 1a binding sites. It is plausible that it acts to form synapses and as an axonal guidance factor. Three arguments suggest that the Xq21.3/Yp translocation was relevant to hominid evolution: 1) the timing of the original duplication relative to the chimpanzee-hominid bifurcation; 2) the sequence changes in Protocadherin Y, and particularly Protocadherin X, since the duplication; 3) the case for an X-Y homologous determinant of cerebral asymmetry. While argument 1 relates to the Australopithecus speciation event, argument 3 relates to the sapiens event. The implication is that the presence of the homologous block on the Y chromosome created a field for genetic innovation in the hominid lineage; it could perhaps be referred to as the 'hominid strip'.

References
Bateson, W. (1894). Materials for the Study of Variation. New York: MacMillan.
Blanco, P., Sargent, C.A., Boucher, C., Mitchell, M., & Affara, N. (2000). Conservation of PCDHX in mammals; expression of human X/Y genes predominantly in the brain. Mammalian Genome, 11, 906-914.
Carson, H.L. (1997). Sexual selection: a driver of genetic change in Hawaiian Drosophila. Journal of Heredity, 88, 343-352.
Crow, T.J. (1993). Sexual selection, Machiavellian intelligence and the origins of psychosis. Lancet, 342, 594-598.
Crow, T.J., Crow, L.R., Done, D.J., & Leask, S.J. (1998). Relative hand skill predicts academic ability: global deficits at the point of hemispheric indecision. Neuropsychologia, 36(12), 1275-1282.
Darwin, C. (1859). On the Origin of Species by Means of Natural Selection: or, The Preservation of Favoured Races in the Struggle for Life. London: John Murray.
Darwin, C. (1871). The Descent of Man, and Selection in Relation to Sex (facsimile of original published in 1981 by Princeton University Press, New Jersey). London: John Murray.
De Vries, H. (1901). Die Mutationstheorie. Leipzig: Verlag von Veit.
Fisher, R.A. (1930). The Genetical Theory of Natural Selection. Oxford: Oxford University Press.
Geerts, M., Steyaert, J., & Fryns, J.P. (2003). The XYY syndrome: a follow-up study on 38 boys. Genetic Counselling, 14(3), 267-279.
Goldschmidt, R. (1940). The Material Basis of Evolution (reprinted 1982). New Haven: Yale University Press.
Kaneshiro, K.Y. (1980). Sexual isolation, speciation and the direction of evolution. Evolution, 34, 437-444.
King, M. (1993). Species Evolution: The Role of Chromosome Change. Cambridge: Cambridge University Press.
Kretschmann, H.F., Schleicher, A., Wingert, F., Zilles, K., & Loeblich, H.-J. (1979). Human brain growth in the 19th and 20th century. Journal of the Neurological Sciences, 40, 169-188.
Lambson, B., Affara, N.A., Mitchell, M., & Ferguson-Smith, M.A. (1992). Evolution of DNA sequence homologies between the sex chromosomes in primate species. Genomics, 14, 1032-1040.
Maccoby, E.E., & Jacklin, C.N. (1975). The Psychology of Sex Differences. Oxford: Oxford University Press.
Mayr, E. (1963). Animal Species and Evolution. Cambridge, MA: Harvard University Press.
Ohno, S. (1967). Sex Chromosomes and Sex-Linked Genes. Berlin: Springer-Verlag.
Paterson, H.E.H. (1985). The recognition concept of species. In E. S. Vrba (Ed.), Species and Speciation (pp. 21-29). Pretoria: Transvaal Museum Monograph 4.
Peters, M., Reimers, S., & Manning, J.T. (2006). Hand preference for writing and associations with selected demographic and behavioral variables in 255,100 subjects: The BBC internet study. Brain and Cognition, 62, 177-189.
Rezaie, R., Roberts, N., Cutter, W.J., Murphy, D.C.M., Robertson, D.M.W., Daly, E.M. et al. (2004). Anomalous asymmetry in Turner's and Klinefelter's syndromes - further evidence for X-Y linkage of the cerebral dominance gene. American Journal of Medical Genetics (Neuropsychiatric Genetics), 130B(1), 102-103.
Sargent, C.A., Briggs, H., Chalmers, I.J., Lambson, B., Walker, E., & Affara, N.A. (1996). The sequence organization of Yp/proximal Xq homologous
regions of the human sex chromosomes is highly conserved. Genomics, 32, 200-209.
Schwartz, A., Chan, D.C., Brown, L.G., Alagappan, R., Pettay, D., Disteche, C., McGillivray, B. et al. (1998). Reconstructing hominid Y evolution: X-homologous block, created by X-Y transposition, was disrupted by Yp inversion through LINE-LINE recombination. Human Molecular Genetics, 7, 1-11.
Skaletsky, H., Kuroda-Kawaguchi, T., Minx, P.J., Cordum, H.S., Hillier, L., Brown, L.G., Repping, S. et al. (2003). The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 423(6942), 825-837.
White, M.J.D. (1978). Modes of Speciation. San Francisco: W. H. Freeman.
Williams, N.A., Close, J., Giouzeli, M., & Crow, T.J. (2006). Accelerated evolution of Protocadherin11X/Y: a candidate gene-pair for cerebral asymmetry and language. American Journal of Medical Genetics (Neuropsychiatric Genetics), 141B, 623-633.
Yoshida, K., & Sugano, S. (1999). Identification of a novel protocadherin gene (PCDH11) on the human X/Y homology region in Xq21.3. Genomics, 62, 540-543.
THE EMERGENCE OF COMPOSITIONALITY, HIERARCHY AND RECURSION IN PEER-TO-PEER INTERACTIONS
JOACHIM DE BEULE
Artificial Intelligence Lab, Free University of Brussels, Brussels, 1050, Belgium
[email protected]

It is argued that compositionality, hierarchy and recursion, generally acknowledged to be universal features of human languages, can be explained as emergent properties of the complex dynamics governing the establishment and evolution of a language in a population of language users, mainly on an intra-generational time scale, rather than being the result of a genetic selection process leading to a specialized language faculty that imposes those features upon language, or being mainly a cross-generational cultural phenomenon. This claim is supported with results from a computational language game experiment in which a number of autonomous software agents bootstrap a common compositional and recursive language.
1. Introduction
Compositionality, hierarchy and recursion are universal features of language. By allowing the combination of words into hierarchical phrases, which can then recursively be combined into larger phrases, these features make it possible to make infinite use of finite means in language. Moreover, because they introduce regularities into a language, they also make a language easier to learn. In short, they may increase a language's fitness as well as that of individual language users. The question remains, then, how they are selected for. The mechanism explored in this paper focuses on the increased usability aspect. Language is compositional, hierarchical and recursive because it serves a purpose, and if a feature of language is productive and allows for more effective communication, then individual language users will prefer it over less effective means of communication (Croft, 2007). The effectiveness of an element of language cannot, of course, be isolated from its learnability and the fact that the entire language community should agree upon it. Hence, as in nativism (Hauser, Chomsky, & Fitch, 2002), the capacity for e.g. recursion is assumed, but it need not be language specific, thereby rendering obsolete the problematic question of how it could have evolved for language. Moreover, this capacity need not be part of a universal grammar imposing itself upon language. Instead, it simply needs to be available for language to recruit
(Steels, 2007). Similarly, we do acknowledge that multi-generational mechanisms like iterated learning (Smith, Kirby, & Brighton, 2003) can be shaping forces of language. However, these only act as second order effects on top of the first order dynamics governed by usability considerations. To support these claims, a number of computational language game experiments were carried out (Steels, 2002) using the framework of Fluid Construction Grammar (De Beule & Steels, 2005). Such an experiment consists of repeatedly picking a random speaker and hearer from a population of agents (simulated language users) and letting them communicate about scenes. After each interaction both agents update their language inventories to improve their communicative skills. Most of the details of the simulations and the results will be discussed in the rest of the paper; for more information the reader is referred to (De Beule, 2007).

2. Experimental Setup
2.1. Scenes and Topics
The scenes about which the agents need to communicate would in English be described by sentences like "Tall blond John kicks beautiful Mary": they always involve two participants, each fulfilling either the agent or the patient role in an event. Both participants may also be further specified by features (like 'tall', 'blond' and 'beautiful' in the example). Scenes are presented to the agents in the form of logical conjunctions of predicates; e.g. the example scene would be presented as:
tall(x) ∧ blond(x) ∧ John(x) ∧ kick(x, y) ∧ beautiful(y) ∧ Mary(y)

The number of different event-, participant- and feature-type predicates was set to (three times) five. However, an arbitrary number of feature predicates may be present in a scene description^a, according to a binomial distribution with average and standard deviation set to one feature predicate per participant. The speaker agent does not necessarily describe the entire scene to the hearer: possible topics also include both event participants together with zero or more of the features assigned to them in the scene. On average a topic description contains 2.75 predicates. For example, the above scene specifies 14 topic descriptions, including 'John(x)', 'tall(x) ∧ John(x)', etc. Note that the latter description specifies that the arguments to the tall(·) and John(·) predicates are equal. Such co-reference relations also need to be expressed. This can be done using a holistic word (i.e. one word covering both predicates at once, including the equality of their arguments) or else with several words plus a number of grammatical constructions specifying an ordering between them, see e.g. Steels (2005).

^a Some of them may be the same, as in 'tall tall John'.
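The scene and topic machinery just described can be sketched in Python (a reconstruction from the text, not the authors' code; all names and the capitalisation convention are illustrative assumptions). Predicates are tuples, and topics are enumerated exactly as described: each participant with any subset of its features, plus whole-scene topics combining the event with both participants:

```python
from itertools import chain, combinations

# Predicates as tuples: ("tall", "x") is tall(x), ("kick", "x", "y") is kick(x, y).
scene = [("tall", "x"), ("blond", "x"), ("John", "x"),
         ("kick", "x", "y"), ("beautiful", "y"), ("Mary", "y")]

def subsets(items):
    """All subsets of a list, as tuples (including the empty one)."""
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def topics(scene):
    """Enumerate topic descriptions: each participant plus zero or more of
    its features, and whole-scene topics combining the event with both
    participants (again with feature subsets)."""
    event = next(p for p in scene if len(p) == 3)
    names, feats = {}, {}
    for pred in scene:
        if len(pred) == 2:
            # crude convention for this sketch: participant names are capitalised
            target = names if pred[0][0].isupper() else feats
            target.setdefault(pred[1], []).append(pred)
    per_var = {v: [(n,) + fs for n in names[v] for fs in subsets(feats.get(v, []))]
               for v in names}
    participant_topics = [t for v in per_var for t in per_var[v]]
    scene_topics = [(event,) + tx + ty
                    for tx in per_var[event[1]] for ty in per_var[event[2]]]
    return participant_topics + scene_topics
```

For the example scene this yields the 14 topic descriptions mentioned in the text, from 'John(x)' alone up to the full scene.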
In every interaction, a random scene and an associated topic are generated and presented to the speaker. The hearer is only presented with the scene, not the topic. Evidently, he does get to see the utterance generated by the speaker for describing the topic, but only after an efficient communication system has been established will the hearer be able to successfully parse it and hence know what the topic was.
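The interaction loop can be sketched as follows (a minimal illustration, not the authors' implementation; the naive adoption strategy here stands in for the scoring machinery described in the next section, and all names are assumptions):

```python
import random

class Agent:
    """Toy agent with a meaning -> word lexicon."""
    def __init__(self):
        self.lexicon = {}

    def word_for(self, meaning):
        # a speaker invents at most one new word for an uncovered meaning
        if meaning not in self.lexicon:
            self.lexicon[meaning] = "w%06d" % random.randrange(10**6)
        return self.lexicon[meaning]

def interact(speaker, hearer, topic):
    """One game: the speaker verbalises the topic; the hearer succeeds if
    he would have used the same word, and otherwise adopts it."""
    utterance = speaker.word_for(topic)
    if hearer.lexicon.get(topic) == utterance:
        return True
    hearer.lexicon[topic] = utterance
    return False

# Repeatedly pick a random speaker and hearer and let them play one game.
population = [Agent() for _ in range(5)]
meanings = [frozenset([m]) for m in ["tall", "blond", "John", "kick", "Mary"]]
for _ in range(2000):
    speaker, hearer = random.sample(population, 2)
    interact(speaker, hearer, random.choice(meanings))
```

A failed game teaches the hearer the speaker's word, so a later game between the same pair on the same topic succeeds.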
2.2. Language Model

An agent's lexicon consists of a number of bi-directional word-meaning mappings. The meaning of a word may be any combination of predicates. All agents start off with empty lexicons. Whenever a speaker agent needs to verbalize a topic description, he introduces at most one new word, covering at once all predicates for which no word is known yet. Different speaker agents may propose different words for the same meaning. Therefore, every word has an associated synonymy score, which is updated according to the well-known lateral-inhibition scheme (Steels, 2002). An utterance is presented to the hearer as a single string, i.e. without word boundaries. He decomposes it into words again according to the entries in his lexicon. He only proceeds when (presumably) at most one word is unknown; otherwise the interaction fails and the speaker decreases the scores of the words used. Hearer agents do not know the topic, so they cannot infer the intended meaning of a word from one interaction only. Therefore, every word-meaning mapping also has an associated probability score representing its estimated correctness. These are updated according to the cross-situational learning algorithm described in (De Beule, De Vylder, & Belpaeme, 2006). In short, this algorithm makes it possible to combine the information about the meaning of words gained in different situations, while coping with inconsistencies caused by changes in word meanings. Agents prefer those word-meaning mappings with maximum associated synonymy times probability scores. The score of a multiple word analysis is determined as the product of the scores of all words involved. Hence, if one holistic word with high score covers the entire topic description, then it might be preferred. If, however, several more atomic words that only together cover the entire topic description have a higher combined score, then these will be preferred. Hybrid combinations are also possible.
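A sketch of the two scores and their combination (class name, initial scores and the update constant are assumptions; the actual update rules follow the lateral-inhibition scheme of Steels (2002) and the cross-situational algorithm of De Beule et al. (2006)):

```python
class Lexicon:
    """Word-meaning mappings, each with a synonymy and a probability score."""
    def __init__(self):
        self.entries = {}  # (word, meaning) -> {"syn": float, "prob": float}

    def add(self, word, meaning, prob=0.5):
        self.entries[(word, meaning)] = {"syn": 0.5, "prob": prob}

    def score(self, word, meaning):
        e = self.entries[(word, meaning)]
        return e["syn"] * e["prob"]  # preference = synonymy times probability

    def reinforce(self, word, meaning, delta=0.1):
        """Lateral inhibition: boost the used mapping, punish its synonyms."""
        for (w, m), e in self.entries.items():
            if m == meaning:
                if w == word:
                    e["syn"] = min(1.0, e["syn"] + delta)
                else:
                    e["syn"] = max(0.0, e["syn"] - delta)

    def best_analysis(self, analyses):
        """Pick the word sequence whose product of word scores is maximal."""
        def product(words_meanings):
            p = 1.0
            for w, m in words_meanings:
                p *= self.score(w, m)
            return p
        return max(analyses, key=product)
```

Because a multi-word score is a product, a single holistic word with score 0.30 beats two atomic words of score 0.45 each (0.45 × 0.45 ≈ 0.20), matching the competition between holistic and compositional analyses described above.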
After lexicon lookup, all predicates in the topic description are covered by a word (speaker side) or all words in the utterance contribute a number of predicates (hearer side). The orderings among the words in an utterance express co-reference relations. The way in which a particular word ordering corresponds to argument equalities in the meaning is determined by the grammar and is something the agents need to agree upon. As was the case for words, speaker agents may introduce new rules of grammar as they need them, and hearers will try to adopt them if possible. And just as agents may use and propose both holistic and atomic words, they may also use and propose different types of grammar rules. Below are shown schematically a number of example rules for combining words covering predicates of the type specified on the right hand side of the rules (P stands for Participant, F for Feature, E for Event and S for Scene):

P(X)          ← F(X) P(X)               (1)
P(X)          ← F(X) P(X) F(X)          (2)
S(X,Y)        ← P(X) E(X,Y) P(Y)        (3)
S(X,Y)        ← E(X,Y) P(X) F(Y) P(Y)   (4)
Type-142(X,Y) ← F(X) E(X,Y)             (5)
Type-36(X)    ← F(X) F(X)               (6)
Type-726(X)   ← Type-36(X) P(X)         (7)
Type-76(X,Y)  ← F(X) P(Y)               (8)
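To illustrate how rules of this kind combine typed phrases into hierarchical structure, here is a minimal bottom-up sketch (illustrative Python, not Fluid Construction Grammar; only two rules of the kinds just listed are encoded, and argument unification is simplified away, constituents being assumed co-referent as the rule requires):

```python
# Each rule: (result type, list of constituent types).
RULES = [
    ("P", ["F", "P"]),        # feature phrase + participant phrase -> participant
    ("S", ["P", "E", "P"]),   # participant + event + participant -> scene
]

def parse(phrases):
    """Greedily reduce adjacent phrases until no rule applies; the result
    is a (possibly nested) tree showing the hierarchy the rules induce."""
    changed = True
    while changed:
        changed = False
        for result, rhs in RULES:
            n = len(rhs)
            for i in range(len(phrases) - n + 1):
                window = phrases[i:i + n]
                if [p[0] for p in window] == rhs:
                    phrases = phrases[:i] + [(result, window)] + phrases[i + n:]
                    changed = True
                    break
            if changed:
                break
    return phrases

# "tall blond John kicks Mary" as typed words:
words = [("F", "tall"), ("F", "blond"), ("P", "John"),
         ("E", "kicks"), ("P", "Mary")]
tree = parse(words)
```

Because the first rule has type P on both sides, it applies recursively, nesting one participant phrase inside another; the whole utterance then reduces to a single Scene phrase.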
For example, the first rule specifies that if a word or phrase covering a meaning of type Feature is directly followed by another word or phrase covering a meaning of type Participant, then their arguments should be made equal and the result is a phrase of type Participant. Hence, each rule introduces hierarchical structure, allowing the subsequent application of rules until all co-reference relations (argument equalities) are expressed and all words are fully ordered. The combination of a number of feature-type phrases with a phrase of type Participant again results in a Participant-type phrase if they all have identical arguments (rules 1 and 2, but not rule 8). Only a limited number of type combinations result in simple result types like this (see rules 1-4). Most combinations result in the creation of new types (e.g. rules 5-7), which can themselves also be used in other rules (rule 7). Every agent maintains a private grammar and type system. Both rules (1) and (2) are recursive, but only the first one can express an arbitrary number of feature predicates in combination with a participant predicate. Note that agents not only need to agree upon what constituents to take together (what elements should be on the right hand side of the rules), but also upon their order. For example, some agents may propose the SVO-like rule (3), while others may initially prefer the VSO-like rule (4).^b Probability (correctness) and preference (synonymy) scores are used both for reaching a consensus and for determining what analysis to prefer, similar to what happens while learning the meaning of words and during lexicon lookup.

^b However, note that rule (4) requires that a feature-type phrase precedes the object participant-type phrase.

3. Results

Figure 1 shows the evolution of the communicative success for different population sizes, measured as a running average. From this graph it can be concluded that
Figure 1. Evolution of the communicative success for different population sizes (all graphs averaged over 10 independent runs, with error bars 1 standard deviation wide). Time was rescaled such that at time t an agent has had on average n·t interactions, with n the population size. The inset shows a detailed portion of the larger graph.
the agents in any case do succeed in evolving a successful communication system. Figure 2 shows the evolution of the number of predicates in a topic description divided by the number of words in the utterance, measured as a running average. After about 100 interactions per agent, only words are used that have exactly one predicate in their meaning. Put differently: the agents prefer to use compositional language. In contrast to what is the case for communicative success, population size has no influence. This shows that the decision to go compositional can be made independently from the one about what specific words to use, and hence already after a fixed number of interactions per agent rather than after a number proportional to the population size. Recall that compositional language requires grammar. As it turns out, after about 800 interactions per agent, only rules with result type Participant or Scene are used (like example rules (1) to (4) but not the others).^c Moreover, and again after about 800 interactions per agent, the surviving rules only contain 2 and 3 constituents respectively (i.e. like example rules (1) and (3) but not (2) and (4)). This means that the agents do not simply prefer to use compositional and hence grammatical language but, more specifically, they prefer recursive grammar rules that introduce the maximum amount of hierarchy.

^c Because of space limitations we could not include the relevant graphs here; the interested reader is referred to (De Beule, 2007).
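The compositionality measure plotted in Figure 2 can be sketched as follows (the window size of the running average is an assumption; the paper does not specify it):

```python
from collections import deque

class CompositionalityMeter:
    """Running average of predicates-per-word over recent interactions."""
    def __init__(self, window=50):
        self.ratios = deque(maxlen=window)

    def record(self, n_predicates, n_words):
        # one sample per interaction: predicates in the topic description
        # divided by words in the utterance
        self.ratios.append(n_predicates / n_words)

    @property
    def value(self):
        return sum(self.ratios) / len(self.ratios)

meter = CompositionalityMeter()
meter.record(2, 1)  # holistic: one word covering two predicates
meter.record(3, 3)  # fully compositional: one predicate per word
```

A fully holistic language gives the average topic size (2.75 in these experiments), a fully compositional one gives 1, the value all runs converge to.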
Figure 2. The number of predicates in the topic description divided by the number of words in the utterance. For a completely holistic language this would be 2.75 (the average number of predicates in a topic description). A fully compositional language would give 1, which is the value to which all graphs converge.
4. Discussion and Conclusion
The simulation results confirm that a language can become compositional, hierarchical and recursive simply because language users want to be understood. There is no need to resort to a language faculty dictating these features upon language, or to a multi-generational mechanism like iterated learning. One thing that might appear to be in contradiction with these findings is that natural languages remain partially holistic. Natural meanings are clearly correlated and hierarchically organized. In contrast, the world model considered in the experiments is not. Hence, one cannot expect holistic words to survive, because such words are simply of little use. If, however, certain combinations of predicates were to appear more frequently in scene descriptions than others, then it would be useful to have specific, holistic words for them. This was indeed confirmed in another series of experiments in which the same setup was used as described in this paper, except that certain correlations between meaning predicates were introduced. As a result, the emerging languages remained partially holistic (see Figure 3). In a third series of experiments the effect of a population turnover was investigated. It should be clear that such a turnover is not required to explain the emergence of compositionality, hierarchy or recursion. However, since language evolution is a stochastic process, and since iterated learning was shown by others to be a shaping force of language, there are indeed measurable effects. But these are only of second order compared to the first order effects described in this paper,
Figure 3. Evolution of the number of predicates covered per used word for different values of a correlation parameter 'p5'. If p5 equals zero, then the experimental setup is identical to the one described in this paper. Increasing values of 'p5' correspond to increasing amounts of correlations between otherwise uncorrelated predicates across scene descriptions. For example, if p5 > 0, then certain participant-type predicates will always be accompanied by specific feature-type predicates (and possibly others). In topic descriptions they can still occur separately. One can clearly see that larger values of p5 result in on average more predicates per used word, meaning that the agents prefer to use holistic words for frequently occurring combinations of predicates.
meaning that they are much smaller and only act on a much larger time scale (see Figure 4). To conclude, we have shown that the (near) universality of productive features of language like compositionality, hierarchy and recursion can be explained as an emergent property of the complex dynamics governing the establishment and evolution of a language in a population of generally intelligent interlocutors trying to increase their communicative skills. This happens mainly on an intra-generational time scale. These findings nullify explanations that see natural or cross-generational selection as the main shaping forces of language.

References

Croft, W. (2007). Language structure in its human context: new directions for the language sciences in the twenty-first century. In P. Hogan (Ed.), Cambridge Encyclopedia of the Language Sciences. Cambridge: Cambridge University Press.
De Beule, J. (2007). Compositionality, hierarchy and recursion in language: a case study in fluid construction grammar. Unpublished doctoral dissertation, VUB Artificial Intelligence Lab.
De Beule, J., De Vylder, B., & Belpaeme, T. (2006). A cross-situational learning
[Figure 4 plot: average lexicon compositionality vs. time (number of interactions per agent); curves for populations with and without agent turnover. Image not reproduced.]
Figure 4. Evolution of the average lexicon compositionality in a fixed population of 5 agents (curves labeled 'no turnover') and in a population in which after every 600 interactions (i.e. after on average 120 interactions per agent) the oldest agent was replaced by a new one (curves labeled 'with turnover'), for two different experimental settings (see caption of Figure 3 and De Beule (2007) for details). All curves are based on averaging the inverse of the number of predicates covered by words known by agents, including the words not used anymore but still remembered from earlier phases in the experiment. One can clearly see that, in the case of a turnover, compositionality increases with each epoch, resulting in a larger final degree of compositionality for the curves labeled 'with turnover' compared to the ones labeled 'no turnover'.
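The compositionality measure described in the caption of Figure 4 can be sketched in a few lines. This is a minimal illustration, not the authors' code; it assumes a lexicon represented as a mapping from words to the set of predicates each covers, and the word forms are invented.

```python
# A sketch (not the authors' code) of the measure described in the
# caption of Figure 4: average, over the words an agent knows, of the
# inverse of the number of predicates each word covers.
def lexicon_compositionality(lexicon):
    """lexicon: dict mapping each word to the set of predicates it covers."""
    if not lexicon:
        return 0.0
    return sum(1.0 / len(preds) for preds in lexicon.values()) / len(lexicon)

# One predicate per word gives the maximal score of 1.0; a holistic
# word covering three predicates contributes only 1/3.
compositional = {"wa": {"ball"}, "bu": {"red"}, "zo": {"big"}}
holistic = {"wabu": {"ball", "red", "big"}}
assert lexicon_compositionality(compositional) == 1.0
assert abs(lexicon_compositionality(holistic) - 1 / 3) < 1e-9
```

A fully compositional lexicon thus scores 1.0, and holistic words pull the average down toward 1/n.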
algorithm for damping homonymy in the guessing game. In L. M. R. et al. (Eds.), Artificial Life X (pp. 466-472). MIT Press.
De Beule, J., & Steels, L. (2005). Hierarchy in fluid construction grammar. In U. Furbach (Ed.), Proceedings of KI-2005 (Vol. 3698, pp. 1-15). Berlin: Springer-Verlag.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569-1579.
Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: a framework for the emergence of language. Artificial Life, 9(4), 371-386.
Steels, L. (2002). Grounding symbols through evolutionary language games. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language (pp. 211-226). New York, NY, USA: Springer-Verlag New York, Inc.
Steels, L. (2005). What triggers the emergence of grammar? In AISB'05: Proceedings of the Second International Symposium on the Emergence and Evolution of Linguistic Communication (EELC'05) (pp. 143-150).
Steels, L. (2007). The recruitment theory of language origins. In C. Lyon, C. L. Nehaniv, & A. Cangelosi (Eds.), Emergence of Communication and Language (pp. 129-150). Berlin: Springer.
CAUSAL CORRELATIONS BETWEEN GENES AND LINGUISTIC FEATURES - THE MECHANISM OF GRADUAL LANGUAGE EVOLUTION

DAN DEDIU
Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK

The causal correlations between human genetic variants and linguistic (typological) features could represent the mechanism required for gradual, accretionary models of language evolution. The causal link is mediated by the process of cultural transmission of language across generations in a population of genetically biased individuals. The particular case of tone, ASPM and Microcephalin is discussed as an illustration. It is proposed that this type of genetically-influenced linguistic bias, coupled with a fundamental role for genetic and linguistic diversities, provides a better explanation for the evolution of language and linguistic universals.
1. Language evolution as a gradual, accretionary process
There are many controversies concerning the evolution of language from a primitive, language-less state shared with the rest of the animal kingdom, to a derived one characterized by modern language, specific to our own species, Homo sapiens.[a] Simplifying, the main divides across the field seem to concern: i. the nature of the transition: catastrophic (Crow, 2002a, b) versus gradual/accretionary (Pinker & Jackendoff, 2005; Smith, 2006; or Hurford, 2003); ii. the timing: recent (Crow, 2002a, b) versus ancient (Hurford, 2003); iii. the protolanguage: holistic (Kirby, 2000 or Wray, 2000, 2002) versus synthetic (Tallerman, 2006; Bickerton, 2000). Few theories, however, address all these aspects at the same level of detail.
[a] See, for example, the multitude of opinions expressed during the EvoLang conferences: Hurford, Studdert-Kennedy & Knight, 1998; Knight, Studdert-Kennedy & Hurford, 2000; Cangelosi, Smith & Smith, 2006; or Christiansen & Kirby, 2003, to cite just a few relevant works.

As argued extensively elsewhere (Dediu, 2006; Dediu, 2007), the model of human evolution considered, whether explicitly or implicitly, strongly
constrains the class of language evolution models envisageable. For example, a strong adherence to the extreme Out-of-Africa with Replacement model[b] (Stringer & Andrews, 1988) will favor a recent, catastrophic view of the evolution of language, in the vein of Crow (2002a, b). Of course, there is also a much weaker reciprocal influence, informing the human evolutionary models with data, theories and speculations originating from the considered model of language evolution. If we suspend the assumption of a punctual speciation event for modern humans, the consequences for language evolution models are overwhelming (Dediu, 2006, 2007). This requirement of a recent speciation event has placed strong constraints on both the temporal span and the amount of diversity (genetic and cultural) available for evolving language(s), which has led, in turn, to a very limited set of compatible proposals, namely either a hopeful monster (FOXP2, protocadherinXY) or a purely cultural process (compositionality as a result of transmission bottlenecks). However, even with a large degree of phenotypic plasticity (especially at the neural level), it is hard to accept that a single lucky mutation could have created modern language out of a radically different precursor, irrespective of the proposed mechanisms (Mithen's (1996) "cognitive fluidity", Crow's (2002b) "lateralization", etc.).
The difficulty stems from the current data and theories concerning the biological limits of phenotypes generated by catastrophic mutations (West-Eberhard, 2003; Dawkins, 1997; Skelton, 1993; Gerhart & Kirschner, 1997), from the behavioral genetics of language, which argues for an important genetic component (accounted for by many genes with small effects, comprising both generalists and specialists, most of them involved in more than one aspect of language or, more generally, cognition; Stromswold, 2001; Bishop, 2003; Fisher, Lai & Monaco, 2003; Plomin & Kovas, 2005), and from the fact that the indissoluble link between modern humans and modern language, based on a specifically modern "package", does not seem to hold (Dediu, 2006, 2007). The theories arguing for a purely cultural process (Kirby, 2000) still seem to implicitly assume that biological evolution provided the cognitive processes (potentially non-language-specific) required for a proper cultural evolution of language. But, given the apparently very general requirements (Kirby, 2000; Brighton, 2003), one is left to wonder whether this really demands a modern brain at all (Dediu, 2007).

[b] Positing a recent, punctual origin of modern humans in Africa and subsequent spread across the world, with replacement of the pre-existing humans and without admixing with them.
Given the previous discussion and the more detailed criticisms in Dediu (2006, 2007), it is proposed that a gradual, accretionary model for language evolution, covering an extensive period of time,[c] offers a better alternative. This model is fundamentally based on genetically and culturally diverse populations, involved in a dynamic network of interactions (Dediu, 2007), whereby populations are continuously expanding, contracting, becoming extinct and being replaced, but in permanent contact with other such populations, and part of regional and global networks of genetic and cultural exchanges. Inter-individual and inter-population diversities are thus essential ingredients, and not just some form of noise which must be filtered out in order to gain access to the core, universal properties of interest. It can be said that it is through complex interactions between diverse components that universals arise in the first place. However, the only missing fundamental ingredient for such gradual, accretionary models of language evolution is represented by small, Darwinian genetic changes.
2. Linguistic and genetic causal correlations - the case of linguistic tone, ASPM and Microcephalin
More explicitly, what do these gradual, accretionary steps of language evolution look like? They must certainly involve coordinated small changes, both genetic and linguistic, but so far there seems to be no explicit model of their nature and dynamics. Dediu & Ladd (2007) present a potential case of such a relationship, concerning two genes related to brain growth and development, ASPM and Microcephalin, and linguistic tone.[d] ASPM and Microcephalin are two human genes whose deleterious mutations cause primary recessive "high-functioning" microcephaly (Gilbert, Dobyns & Lahn, 2005; Cox et al., 2006). Moreover, the evolution of these two genes has been accelerated in the lineage leading to humans (~2 favorable changes/million years), with Microcephalin evolving preponderantly during the early, and ASPM during the late, stages of human evolution (Gilbert, Dobyns & Lahn, 2005). Thus, these two genes represent strong candidates for key players in the evolution of human-specific traits, and, even if their exact functions are not yet clear, they seem to be critical regulators of brain growth and development in humans. Two "derived" haplogroups, one for each of these two genes, have been recently identified,[e] showing signs of ongoing natural selection in humans (Mekel-Bobrov et al., 2005; Evans et al., 2005). They seem to have appeared recently (~5 and ~37 thousand years ago, respectively), and MCPH-D even seems to have introgressed into the modern human lineage from a different archaic form (Evans et al., 2006), thus representing one of the strongest arguments against Recent Out-of-Africa with Replacement (Dediu, 2007). To date, the naturally selected phenotypic effects of these haplogroups have not been found: they seem not to be connected with intelligence (Mekel-Bobrov et al., 2007), brain size (Woods et al., 2006), head circumference, general mental ability, social intelligence (Rushton et al., 2007), or the incidence of schizophrenia (Rivero et al., 2006).

[c] I totally disagree with placing convenient discontinuous boundaries (Dediu, 2006, 2007), but if such a punctual event is required, probably the emergence of Homo could be taken as the onset for this scenario (Dediu, 2007).
[d] This account is based on Dediu & Ladd (2007) and Dediu (2007).
Linguistic tone is a typological linguistic feature (Haspelmath et al., 2005) which, in broad terms, reflects the usage of pitch to convey differences in meaning at the level of the word (Yip, 2002:1; Dediu & Ladd, 2007). Tone is a very complex topic in linguistics and its typology is still debated, especially the case of so-called "pitch-accent languages" (Yip, 2002 or Dediu, 2007:291-293). Geographically, tone languages tend to be clustered in sub-Saharan Africa, East and South-East Asia and Central America/Caribbean/Amazonia (Maddieson, 2005; Dediu & Ladd, 2007) and, historically, tone can be acquired and lost through ordinary processes, like the effects of voicing contrasts in obstruents (Yip, 2002:35-38; Hyman, 1978). The proposal of Dediu & Ladd (2007) is that ASPM-D and MCPH-D might determine a very small bias at the individual level in the acquisition or processing of linguistic tone, a bias which can be amplified in a population through the cultural transmission of language across generations, and manifested in differences between the languages spoken by such populations. They support this hypothesis by the fact that the population frequencies of ASPM-D and MCPH-D correlate negatively with the use of linguistic tone by that population (see Fig. 1), even after geography and shared linguistic history have been controlled for. This correlation is highly significant and important when compared with a sample of 983 genetic variants covering the nuclear genome and 26 linguistic features representing various aspects of phonology, morphology and syntax (Dediu & Ladd, 2007; Dediu, 2007). These facts suggest that the correlation between tone, ASPM-D and MCPH-D is not satisfactorily explained by the "usual suspects", namely contact (genetic and linguistic), migrations or descent from a common ancestor (or a combination thereof). However, exactly these factors represent the explanation for most of the language-gene correlations detected to date.

[e] Denoted in the following as ASPM-D and MCPH-D, respectively.
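The basic quantity behind such a population-level comparison is a correlation between a continuous haplogroup frequency and a binary linguistic feature (a point-biserial correlation). The sketch below uses invented toy numbers, not the data of Dediu & Ladd (2007), and omits the published controls for geography and shared linguistic history.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation; with a binary y this is the
    point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented frequencies for six hypothetical populations: tone languages
# co-occur with low derived-haplogroup frequency in this toy example.
aspm_d_freq = [0.05, 0.10, 0.15, 0.45, 0.55, 0.60]
has_tone    = [1, 1, 1, 0, 0, 0]   # 1 = the population speaks a tone language

r = pearson(aspm_d_freq, has_tone)
assert r < -0.9  # strongly negative association in this toy data
```

In the real analysis the raw correlation is only the starting point; the key step is showing it survives controls for contact, migration and common descent.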
Figure 1: Tone (open squares) and non-tone languages (filled squares) versus the population frequency of ASPM-D (horizontal axis) and MCPH-D (vertical axis).
That such biases can work has been suggested by both computational (Smith, 2004; Nettle, 1999)[f] and mathematical (Kirby, Dowman & Griffiths, 2007) models but, if confirmed by further experimental studies,[g] this would represent the first case of a genetically-influenced linguistic bias manifest at the population level.

3. Linguistic and genetic causal correlations - the mechanism of gradual, accretionary language evolution?
This type of bias could represent the kind of mechanism required to underlie gradual, accretionary accounts of language evolution, whereby small genetic changes can appear, influence the capacity for language in diverse populations and possibly become part of the universal, species-wide linguistic capacity. Such small genetic changes which have a linguistic biasing effect, not necessarily as their primary phenotype, can become fixed across the entire Homo sapiens species due to either genetic drift or natural selection. In the second case, it is possible for the phenotypic trait under natural selection to be totally unrelated to language (as Dediu & Ladd (2007) suggest is the case with ASPM-D and MCPH-D), or it is also possible that it is exactly the gene's effects on language which determine its increase in frequency. Whatever the exact scenario, such a language-biasing genetic variant will induce a change in the linguistic landscape. Moreover, future genetic variants will act in this modified linguistic landscape and their fate will be influenced by the particular history of previous mutations. This complex accretionary process, involving interactions between many genetic variants and linguistic states across evolutionary time, represents a more plausible account for the evolution of language, being able to better accommodate the data and theories originating in evolutionary biology, genetics, behavior genetics and linguistics. Therefore, to return to the three main controversies presented at the beginning, the model proposed here argues for a gradual/accretionary transition, involving a long stretch of evolutionary time and, as argued by Smith (2006), such a gradual model can also help settle the dispute concerning the nature of the protolanguage (holistic vs. synthetic).

[f] A recently conducted computer simulation (Dediu, in preparation) seems to suggest that only certain types of genetically-influenced linguistic biases can become manifest through cultural transmission.
[g] Such a study is currently in preparation, focusing on adult speakers.

Acknowledgements
The author was funded by an ORS Award, a Studentship from the College of HSS, the University of Edinburgh, and an ESRC Postdoctoral Fellowship.

References
Bickerton, D. (2000). How protolanguage became language. In Knight, Studdert-Kennedy & Hurford (Eds.).
Bishop, D. V. M. (2003). Genetic and environmental risks for specific language impairment in children. Intern. J. Ped. Otorhinolaryng. 67S1, S143-S157.
Brighton, H. (2003). Simplicity as a driving force in linguistic evolution. PhD Thesis, The University of Edinburgh.
Cangelosi, A., Smith, A.D.M. & Smith, K. (Eds.) (2006). The Evolution of Language. London: World Scientific.
Christiansen, M.H. & Kirby, S. (Eds.) (2003). Language Evolution. Oxford University Press.
Cox, J., Jackson, A., Bond, J., Woods, C. (2006). What primary microcephaly can tell us about brain growth. Trend. Molec. Med. 12, 358-366.
Crow, T. (2002a). Introduction. In Crow, T. (Ed.), The Speciation of Modern Homo Sapiens (pp. 1-20). Oxford: Oxford University Press.
Crow, T. (2002b). Sexual selection, timing and an X-Y homologous gene: Did Homo sapiens speciate on the Y chromosome? In Crow, T. (Ed.), The Speciation of Modern Homo Sapiens (pp. 197-216). Oxford: Oxford University Press.
Dawkins, R. (1997). Climbing Mount Improbable. Penguin Books.
Dediu, D. (2006). Mostly out of Africa, but what did the others have to say? In Cangelosi, Smith & Smith (Eds.) (pp. 59-66).
Dediu, D. (2007). Non-Spurious Correlations between Genetic and Linguistic Diversities in the Context of Human Evolution. PhD Thesis, The University of Edinburgh.
Dediu, D. & Ladd, D.R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. PNAS 104, 10944-10949.
Evans, P. D., Gilbert, S., Mekel-Bobrov, N., Vallender, E., Anderson, J., Vaez-Azizi, L. et al. (2005). Microcephalin, a Gene Regulating Brain Size, Continues to Evolve Adaptively in Humans. Science 309, 1717-1720.
Evans, P.D., Mekel-Bobrov, N., Vallender, E.J., Hudson, R.R. & Lahn, B.T. (2006). Evidence that the adaptive allele of the brain size gene Microcephalin introgressed into Homo sapiens from an archaic Homo lineage. PNAS 103, 18178-18183.
Fisher, S., Lai, C.S. & Monaco, A.P. (2003). Deciphering the genetic basis of speech and language disorders. Annual Review of Neuroscience 26, 57-80.
Gerhart, J. & Kirschner, M. (1997). Cells, Embryos and Evolution: Toward a Cellular and Developmental Understanding of Phenotypic and Evolutionary Adaptability. Massachusetts: Blackwell Science.
Gilbert, S. L., Dobyns, W. B. & Lahn, B. T. (2005). Genetic links between brain development and brain evolution. Nat. Rev. Genetics 6, 581-590.
Haspelmath, M., Dryer, M. S., Gil, D. & Comrie, B. (2005). The World Atlas of Language Structures. Oxford University Press.
Hurford, J. (2003). The language mosaic and its evolution. In Christiansen & Kirby (Eds.) pp. 38-57.
Hurford, J. R., Studdert-Kennedy, M. & Knight, C. (Eds.) (1998). Approaches to the Evolution of Language: Social and Cognitive Bases. Cambridge University Press.
Hyman, L.M. (1978). Historical tonology. In V.A. Fromkin (Ed.), Tone: A Linguistic Survey (pp. 257-269). London: Academic Press.
Kirby, S. (2000). Syntax without Natural Selection: How compositionality emerges from vocabulary in a population of learners. In Knight, Studdert-Kennedy & Hurford (Eds.) pp. 303-323.
Kirby, S., Dowman, M. & Griffiths, T. (2007). Innateness and culture in the evolution of language. PNAS 104, 5241-5245.
Knight, C., Studdert-Kennedy, M. & Hurford, J.R. (Eds.) (2000). The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form. Cambridge University Press.
Maddieson, I. (2005). Tone. In Haspelmath, Dryer, Gil & Comrie (Eds.).
Mekel-Bobrov, N., Gilbert, S. L., Evans, P. D., Vallender, E. J., Anderson, J. R., Hudson, R. R. et al. (2005). Ongoing Adaptive Evolution of ASPM, a Brain Size Determinant in Homo sapiens. Science 309, 1720-1722.
Mekel-Bobrov, N., Posthuma, D., Gilbert, S.L., Lind, P., Gosso, M.F., Luciano, M., et al. (2007). The ongoing adaptive evolution of ASPM and Microcephalin is not explained by increased intelligence. Human Molecular Genetics 16, 600-608.
Mithen, S. (1996). The Prehistory of the Mind: A Search for the Origins of Art, Science and Religion. London: Thames & Hudson.
Nettle, D. (1999). Using Social Impact Theory to simulate language change. Lingua 108, 95-117.
Pinker, S. & Jackendoff, R. (2005). The faculty of language: what's special about it? Cognition 95, 201-236.
Plomin, R. & Kovas, Y. (2005). Generalist genes and learning disabilities. Psychological Bulletin 131, 592-617.
Rivero, O., Sanjuán, J., Moltó, M.-D., Aguilar, E.-J., González, J.-C., de Frutos, R., Nájera, C. (2006). The microcephaly ASPM gene and schizophrenia: A preliminary study. Schizophrenia Research 84, 427-429.
Rushton, J., Vernon, P., Bons, T. (2007). No evidence that polymorphisms of brain regulator genes Microcephalin and ASPM are associated with general mental ability, head circumference or altruism. Biol. Letters 3, 157-160.
Skelton, P. (1993). Evolution: A Biological and Palaeontological Approach. The Open University.
Smith, K. (2004). The evolution of vocabulary. J. Theor. Biol. 228, 127-142.
Smith, K. (2006). The protolanguage debate: bridging the gap? In Cangelosi, Smith & Smith (Eds.) pp. 315-322.
Stringer, C. B. & Andrews, P. (1988). Genetic and fossil evidence for the origin of modern humans. Science 239, 1263-1268.
Stromswold, K. (2001). The heritability of language: a review and meta-analysis of twin, adoption, and linkage studies. Language 77, 647-723.
Tallerman, M. (2006). A holistic language cannot be stored, cannot be retrieved. In Cangelosi, Smith & Smith (Eds.) pp. 447-448.
West-Eberhard, M. J. (2003). Developmental Plasticity and Evolution. Oxford: Oxford University Press.
Woods, R., Freimer, N., De Young, J., Fears, S., Sicotte, N., Service, S. et al. (2006). Normal variants of Microcephalin and ASPM do not account for brain size variability. Human Molecular Genetics 15, 2025-2029.
Wray, A. (2000). Holistic utterances in protolanguage: the link from primates to humans. In Knight, Studdert-Kennedy & Hurford (Eds.) pp. 285-302.
Wray, A. (2002). Dual processing in protolanguage: performance without competence. In Wray, A. (Ed.), The Transition to Language (pp. 113-137). Oxford: Oxford University Press.
Yip, M. (2002). Tone. Cambridge: Cambridge University Press.
SPONTANEOUS NARRATIVE BEHAVIOUR IN HOMO SAPIENS: HOW DOES IT BENEFIT SPEAKERS?

JEAN-LOUIS DESSALLES
ENST, ParisTech, 46 rue Barrault, Paris, F-75013, France

The fact that human beings universally put much energy and conviction into reporting events in daily conversations demands an explanation. After observing that the selection of reportable events is based on unexpectedness and emotion, we make a few suggestions to show how the existence of narrative behaviour can be consistent with the socio-political theory of the origin of language.
1. Spontaneous narratives: a fundamental component of language

Dozens of times every day, human individuals feel the urge to signal current events or to report past events to their conspecifics. In doing so, they respond to specific stimuli such as departure from the norm, coincidences or emotion. This event-reporting behaviour seems to be unique in nature (Boyd 2001), as remote analogues such as bee dances and alarm calls do not compare with it. During conversational narratives, the speaker may hold the floor for several minutes, with no other interruption than minimal approval signals emitted by interlocutors. Within a Darwinian framework, the existence of such behaviour requires an explanation. How does this time-consuming activity, which deals most often with futile anecdotes that are unlikely to be of any direct interest for survival, benefit not only listeners, but also speakers? We first show the importance of narratives by providing quantitative estimates. Then we outline a cognitive characterization showing that individuals respond to definite stimuli when selecting reportable events. Lastly, we look for plausible Darwinian explanations for the existence of narrative behaviour, in line with our socio-political account of the origins of language.

2. Narratives in daily speech
Human beings make extensive use of their language ability. Individuals have been observed to speak 15 000 words a day on average (Mehl et al. 2007). This
behaviour covers several activities which can be distinguished by the cognitive mechanisms involved. The two main components of spontaneous language use are discussion (argumentation) and narration (event reporting) (Dessalles 2007a). The proportion of event reporting may vary significantly from one corpus to the next. Suzanne Eggins and Diana Slade (1997) observed the distribution indicated in Table 1 in the three hours of casual conversation data they collected during coffee breaks in three different workplaces. Storytelling takes up the major share in this distribution.

Table 1: Distribution of conversational topic types in Eggins and Slade's corpus (1997 p. 265)

Conversation type     | %
Storytelling          | 43.4
Observation/Comment   | 19.75
Opinion               | 16.8
Gossip                | 13.8
Joke-telling          | 6.3
We made similar measurements on our main corpus, composed of 17 hours of conversations, recorded during meals at family gatherings between 1978 and 1988 among educated individuals belonging to a French middle-class family. The distribution of conversation types was explored through a sampling method (Figure 1). The corpus was digitized, and 150 excerpts of 120 s were automatically extracted at random positions. For each excerpt, the central utterance (occurring at time 60 s) was assigned a category (Table 2).

Figure 1. Distribution of utterance types in our own corpus, assessed through a sampling method. [Pie chart not reproduced.]
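The sampling procedure just described can be sketched as follows. The corpus length and the uniform draw are assumed details of my own illustration, not the authors' tooling.

```python
import random

# Assumed corpus length for illustration: roughly 17 hours of audio.
CORPUS_SECONDS = 17 * 3600
EXCERPT = 120        # each excerpt lasts 120 s
N_SAMPLES = 150      # number of excerpts drawn at random positions

def sample_excerpt_centres(rng):
    """Return the centre time (s) of each randomly placed excerpt;
    the utterance at the centre (start + 60 s) is the one categorized."""
    centres = []
    for _ in range(N_SAMPLES):
        start = rng.uniform(0, CORPUS_SECONDS - EXCERPT)
        centres.append(start + EXCERPT / 2)
    return centres

centres = sample_excerpt_centres(random.Random(0))
assert len(centres) == N_SAMPLES
assert all(EXCERPT / 2 <= c <= CORPUS_SECONDS - EXCERPT / 2 for c in centres)
```

Sampling the centre of a fixed-length window, rather than an arbitrary instant, guarantees that each categorized utterance comes with a full minute of context on either side.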
The small relative size of the 'empty' category reveals that in this family meal context, language is used 89% of the time. Conversation proper, which excludes utilitarian (more or less ritualized) speech, occupies nearly 70% of the time (as most inaudible utterances are likely to be conversational). Narratives and signalling together amount to 26% of conversation time. Though this is less than in Eggins and Slade's corpus, event reporting still represents a significant share of spontaneous language use. Quantitative data about conversational narratives are unfortunately lacking, especially for intercultural comparisons. We can only conjecture that no culture lacks spontaneous narrative behaviour, as such a people would have been well-known for its conversational specificity.

Table 2: Definition of utterance types in the family conversation corpus

Utterance type     | Definition                                           | Example
Narratives         | Reports about past situated events                   | Dishwasher breakdown which turned out to be a mere leak from a plastic bottle
Signalling         | Drawing attention to current facts                   | Showing a new electronic personal organizer
Discussion         | Dealing with problems                                | Who's going to be the next Prime Minister
Utilitarian speech | Offers, requests, ... (mostly in relation with food) | Offering more foie gras
Other              | Songs, child screaming                               | Singing to bring a child to sing along
Inaudible          | Superimposed noise, simultaneous conversations       |
Empty              | No speech, no utterance in progress                  |
Narrative speech contrasts with the other main discourse category, discussion. Superficially, the most manifest difference is that narratives deal with situated events, whereas discussion deals with problems and their solutions. When facts are mentioned during a discussion, they are most often not situated (e.g. 'Fabius [a potential Prime Minister] makes a better impression on TV'), though there are cases in which situated facts or genuine stories are recounted in support of an argument (two occurrences in the sample). But narration and discussion can be opposed more fundamentally on cognitive grounds. Discussion consists of settling an issue or solving an epistemic puzzle. It proceeds through a characteristic problem-abduction-revision procedure (Dessalles 2007a). By contrast, event reporting relies on unexpectedness and emotion elicitation (see below). Though narrative and argumentative moves may
be intertwined in conversation, they often remain separate: arguments spark off arguments in a typical problem-solution alternation, whereas narratives spark off new narratives, generating 'story rounds' (Tannen 1984 p. 100). Events reported in conversations are most often not fictional (only one example in the sample can be considered fictitious, as it consists of describing the content of a cartoon). Though fiction seems to obey definite patterns (Hogan 2003), conversational narratives seem to be ruled by even more constraining imperatives. Any narrative must have an interesting point (Labov 1997), as otherwise speakers may be regarded as socially inept (Polanyi 1979 p. 211). In what follows, we propose a cognitive characterization of competent storytelling, before considering its possible biological role.

3. The selection of reportable events
The dozens of episodes that human beings tell each day through conversational narratives represent only a tiny fraction of their actual experience. The selection of reportable events (Labov 1997) obeys specific patterns. In what follows, we highlight two fundamental requirements that are deeply rooted in our cognition: reportable events must be unexpected and/or arouse emotion.

3.1. Unexpectedness

The property of unexpectedness covers various aspects of newsworthiness, including deviance, atypicality, rarity, proximity, remarkable structures, and coincidences. In former accounts, we equated unexpectedness with improbability (Dessalles 2002). However, that model left important cases unexplained. Individuals consider situations that depart from the norm for some qualitative reason (e.g. jogging nuns) highly unexpected, which probability theory fails to explain. Our new model (Dessalles 2007b), based on formal complexity (Li & Vitanyi 1993), makes correct predictions for all situations perceived as unexpected. We found that the relevant parameter is the contrast between the expected and the actual complexity of the situation. While a standard encounter with nuns is expected to be complex to individuate, encountering jogging nuns is maximally simple, thanks to their unique characteristics. Similarly, a narratable fortuitous encounter is all the more unexpected when the person one bumped into is simple to describe (a close friend or a celebrity) and the place is remote enough to be complex. Complexity contrasts turn out to have a systematic influence on reportability (Dessalles 2007b). Assessing variations of individuation complexity is certainly not a trivial cognitive operation, but every human being seems to be equipped to perform it.
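The complexity contrast can be sketched numerically. The bit values below are invented for illustration; this is not Dessalles' implementation, only a toy rendering of the idea that reportability tracks the gap between expected and observed description complexity.

```python
# Toy rendering of the complexity-contrast idea (invented bit values).
def unexpectedness(expected_bits, observed_bits):
    """Complexity contrast: how much simpler the situation turned out
    to be than expected. Larger values mean a more reportable event."""
    return expected_bits - observed_bits

# Individuating one particular nun among many ordinarily needs a
# complex description (say 20 bits); "the jogging nuns" is maximally
# simple (say 4 bits), hence highly unexpected and highly reportable.
ordinary_encounter = unexpectedness(20, 20)  # no contrast: not news
jogging_nuns = unexpectedness(20, 4)         # large contrast: news
assert jogging_nuns > ordinary_encounter
```

The same contrast covers the fortuitous-encounter case: a close friend or a celebrity is cheap to describe, so meeting them in a place that is itself complex to specify yields a large contrast.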
The universal sensitivity to unexpectedness is a fundamental component of narrative competence and must have been selected for definite reasons.

3.2. Emotion

Emotion is the other parameter that stimulates event reporting. Emotional situations are systematically shared (Rimé 2005). In the following conversation from Neal Norrick's corpus (2000 p. 64), a young man recalls an accident story his aunt told him (transcription details omitted).

Mark: you know what happened to my one of my aunt's friends out in Iowa? Like when- when she was younger, she had a headgear from braces, and these two girls were wrestling around just playing around, wrestling. And one girl pulled her headgear off her mouth and let it snap back. And it slid up her face and stuck in her eyes and blinded her.
Jacob: wow.
Mark: isn't that horrid? That's horrid. Blinded her for life. Isn't that horrid. That's just- I mean just from goofing around.

Unexpectedness and emotion are often combined to enhance reportability. Unexpectedness here lies in the contrast between standard play situations, which are complex to individuate because they are all alike, and the actual situation, which is unique by its consequences. Mark's final remark points to this contrast. Studies show that emotion is reactivated during recounts, and this can be seen as a paradox (Rimé 2005 p. 109). Quite surprisingly, bringing back memories about negative emotions and sharing them with listeners is experienced as enjoyable by both parties. People like to talk not only about positive events, but also about events that generated fear, sadness, anger, guilt, embarrassment, contempt and even shame. They also like to listen to others' corresponding experiences, despite the fact that the evocation of such events produces negative feelings similar to the original ones. It seems that the pleasure of sharing these feelings compensates for experiencing them again.

4. Why are conversational stories told?
The pervasive presence of stories in conversations is an embarrassment for most accounts of the evolutionary origin of language. If language has been selected because of its effect on the welfare of the group (Victorri 2002; Castro et al. 2004, p. 734; Ritt 2004, pp. 1-2) or as a fair exchange of information based on strict reciprocity (Pinker 2003, p. 28; Nowak 2006, p. 1561), then the effort that speakers devote to telling stories for all to hear, most often with much emphasis to
highlight interest, is incomprehensible. We would expect speakers to whisper minimal factual information into specific ears and then demand that listeners reciprocate. What may be true for crucial advice (such as which shares to buy on the stock exchange) does not apply to conversational stories. Other accounts emphasize the educational value of language (Fitch 2004; Castro et al. 2004, p. 725). But stories are found to be spontaneously told, not only from adults to children, but also from adults to adults and even from children to adults. By as early as nine months of age, children spontaneously point to unexpected stimuli (Carpenter et al. 1998). More generally, theories of language function that emphasize the practical value of information are at odds with the fact that most stories are about futile matters. Unexpected events are, by essence, unlikely to occur again. Any practical processing of information would concentrate on vital information (danger, food, mating opportunities) and would neglect the myriad of anecdotal facts that fill daily chatter. Animals do not care about situations just because they are unexpected: they show no interest in four-leaf clovers, they would regard a unicorn as a mere horse, and they do not care about coincidences. Human communication, on the other hand, is universally replete with details about inconsequential episodes, just because of their unexpectedness, and this requires an evolutionary explanation. The key difference seems to lie in human sociality: we crave the attention of others, and narratives are a major way of obtaining it (Boyd 2001). This makes sense within our political theory of the origin of language (Dessalles 2007a), which states that individuals use language to demonstrate qualities that are in demand in the establishment of solidarity bonds. To fit in with the model, however, the qualities shown when telling stories must have political significance.
In what way does the ability to produce unexpectedness and elicit emotions correlate with being a valuable coalition partner? Though these issues have not been properly investigated yet, we may offer some suggestions. First, from an early age on, individuals compete to demonstrate that they knew first, as in the next exchange, observed between two children aged eight and ten:
M: Did you see there are more [hot-air] balloons up there this morning?
Q: Yes, I know.
M: You, be quiet! I'm not talking to you, I'm talking to the others. [To his father] Did you see there are balloons up there this morning?

Being the first to have noticed the presence of the balloons is important for M, as his second utterance shows. The first-to-know phenomenon is the most obvious case in which unexpectedness is produced. Reclaiming authorship for the news
is understandable if, as we suggest, language is display (Dessalles 2007a). The informational competence that individuals display by offering scoop stories to conspecifics proves crucial in the specific political context of our species. Hominins are 'apes with spears'.a The introduction of deadly weapons into an ape species dramatically changes the political game: physical strength becomes much less relevant. The best strategy for survival is to share one's fate with other individuals (here, we radically depart from Paul Bingham's (2001) theory, in which the main effect of weapons is to enforce cooperation through retaliation within coalitions). In a context in which every individual must choose allies, the problem is to determine the best ones. Those who are able to spot unusual goings-on are a good choice, as they are the first to warn of complex dangers. Ever since, individuals have craved to display their ability to notice unexpectedness, as they are descended from ancestors for whom it was a good way to be accepted as coalition partners. Still now, individuals who are able to bring unexpected news are perceived as 'interesting' and are preferentially chosen as friends. They gain social success, whereas boring individuals are avoided. Of course, the political game is more complex in our species than in the first hominin species. However, the ability to spot unexpected situations or activities remains a crucial asset in a species in which being taken by surprise may be fatal. The claim is thus that there is continuity in the signalling and narrative behaviour along the hominin lineage down to sapiens, as it remains a way for individuals to show off their value as coalition partners. Why do individuals tell emotional narratives? Emotions displayed in stories are associated with political values, such as solidarity or courage.
By showing that they are able to experience compassion, pity, concern, and indignation at cowardice, cheating or unfairness, and that they admire selfless love and feats, individuals try to appear as ideal friends. If we accept that successfully communicated emotions are hard to fake, then expressing them through stories is a reliable indication that one really values the corresponding qualities. Human beings are information-oriented animals, who exchange stories on a daily basis. We have tried to indicate how this fact, which is hard to explain in traditional accounts of language origins, can be a natural outcome of the particular socio-political organization of our species, in which individuals must compete by demonstrating their informational qualities in order to attract friends.
a Spears may be thought to have been the first efficient weapons used by hominins: they are easy to make and to carry (the use of spears may have contributed to making bipedal walking advantageous).
References
Bingham, P. M. (2001). Human evolution and human history: A complete theory. Evolutionary Anthropology, 9(6), 248-257.
Boyd, B. (2001). The origin of stories: Horton hears a Who. Philosophy and Literature, 25(2), 197-214.
Carpenter, M., Nagell, K. & Tomasello, M. (1998). Social cognition, joint attention, and communicative competence from 9 to 15 months of age. Monographs of the Society for Research in Child Development, 63(4, Serial No. 255), 1-143.
Castro, L., Medina, A. & Toro, M. A. (2004). Hominid cultural transmission and the evolution of language. Biology and Philosophy, 19, 721-737.
Dessalles, J-L. (2002). La fonction shannonienne du langage : un indice de son évolution. Langages, 146, 101-111.
Dessalles, J-L. (2007a). Why we talk - The evolutionary origins of language. Oxford: Oxford University Press. http://www.enst.fr/~jld/WWT/
Dessalles, J-L. (2007b). Complexité cognitive appliquée à la modélisation de l'intérêt narratif. Intellectica, 45.
Eggins, S. & Slade, D. (1997). Analysing casual conversation. London: Equinox.
Fitch, W. T. (2004). Evolving honest communication systems: Kin selection and 'mother tongues'. In D. K. Oller & U. Griebel (Eds.), The evolution of communication systems, 275-296. Cambridge, MA: MIT Press.
Hogan, P. C. (2003). The mind and its stories - Narrative universals and human emotion. Cambridge, UK: Cambridge University Press.
Li, M. & Vitanyi, P. (1993). An introduction to Kolmogorov complexity and its applications. New York: Springer Verlag (2nd ed. 1997).
Mehl, M. R., Vazire, S., Ramirez-Esparza, N., Slatcher, R. B. & Pennebaker, J. W. (2007). Are women really more talkative than men? Science, 317, 82.
Norrick, N. R. (2000). Conversational narrative: Storytelling in everyday talk. Amsterdam: John Benjamins Publishing Company.
Nowak, M. A. (2006). Five rules for the evolution of cooperation. Science, 314, 1560-1563.
Pinker, S. (2003). Language as an adaptation to the cognitive niche. In M. H. Christiansen & S.
Kirby (Eds.), Language evolution, 16-37. Oxford: Oxford University Press.
Polanyi, L. (1979). So what's the point? Semiotica, 25(3), 207-241.
Rimé, B. (2005). Le partage social des émotions. Paris: PUF.
Ritt, N. (2004). Selfish sounds and linguistic evolution - A Darwinian approach to language change. Cambridge: Cambridge University Press.
Tannen, D. (1984). Conversational style - Analyzing talk among friends. Norwood: Ablex Publishing Corporation.
Victorri, B. (2002). Homo narrans : le rôle de la narration dans l'émergence du langage. Langages, 146, 112-125.
WHAT DO MODERN BEHAVIORS IN HOMO SAPIENS IMPLY FOR THE EVOLUTION OF LANGUAGE?

BENOÎT DUBREUIL
FNRS, Université libre de Bruxelles, 90 rue Rose-de-Lima, Montréal, H4C 2K9, CANADA

The emergence of modern cultural behaviors in Homo sapiens 150,000-50,000 years ago is often explained by a change in the faculty of language, such as the development of recursive syntax or autonomous speech. In this paper, I argue that the link between modern sapiens behaviors and language evolution has never been made convincingly and that a change in the faculty of language can hardly account for the whole range of new behaviors that appear with Homo sapiens. I propose that the domain-general cognitive ability that psychologists call level-2 perspective-taking - an ability closely related to higher theory of mind and metalinguistic awareness - is more parsimonious in explaining modern sapiens behaviors.
1. Introduction

In the archeological record, the emergence of "symbolic artifacts" - such as the abstract engravings and marine shell beads found in the Blombos Cave in South Africa - is universally taken as evidence of modern cognition and language in the human lineage (McBrearty & Brooks 2000; Klein & Edgar 2002; Henshilwood & Marean 2003; d'Errico et al. 2003). In the literature on the evolution of language, influential accounts have connected the emergence of symbolic artifacts with a change in the faculty of language (Bickerton 2003; Corballis 2004). The most important argument in favor of this thesis was the discovery that a mutation of the FOXP2 gene occurred during the last 200,000 years, which is consistent with the emergence of anatomically and behaviorally modern humans (Enard et al. 2002). Dysfunction of the FOXP2 gene in modern humans is associated with underactivation of Broca's area, where the mirror neurons are located, and with a deficit in motor control during speech. From this point of view, modern sapiens behavior was caused by a mutation of the FOXP2 gene that allowed the development of autonomous speech (Corballis 2004) or that facilitated the iterative productivity of language (Lieberman 2005).
However interesting, this hypothesis has recently been undermined by genetic studies revealing that the alleged FOXP2 mutation was also present in Neanderthals and, most probably, shared with their common ancestor with modern humans 400,000 to 300,000 years ago (Krause et al. 2007). The idea that a deficit in speech production explains the behavioral gap between Neanderthals and modern Homo sapiens has now lost its main support. The debate on the foundation of modern behaviors in Homo sapiens is thus reopened. On the one hand, the change was so sudden and fundamental that it leaves little room for hypotheses that do not include any biological or cognitive factor (Klein & Edgar 2002). On the other hand, no anatomical or genetic change supports the link between the faculty of language and modern sapiens behaviors. In this paper, I challenge the idea that a change in the faculty of language caused the emergence of modern sapiens behaviors. I make three points. First, at the conceptual level, I argue that the link between the use of symbols in the archeological record and the evolution of the faculty of language has never been made convincingly. Second, I point out that modern sapiens behaviors include many traits that are not clearly related to language or communication. Third, I propose a domain-general cognitive framework that is realistic at the neuropsychological level and that connects specific behavioral traits more precisely with cognitive abilities.

2. Modern behaviors and language evolution: the link is still missing
The conceptual link between the presence of symbols in material culture and the faculty of language is intuitive in many respects. Is not language firstly about manipulating symbols? Under closer examination, however, the link is not so obvious. First, symbolic reference itself - the very capacity to use arbitrary signs to refer to things in the world - is not so difficult. Trained apes are able to manipulate symbols, while young children begin to refer symbolically to things at around 12 months and understand that every object has a name (the "naming insight") at about 18 months of age. There is no question, however, that neither is able to produce symbolic artifacts. It is precisely the ease with which apes and toddlers master symbolic reference that led many scholars to posit an early emergence of protolanguage in the human lineage - in Homo erectus sensu lato - and to adopt a view of language evolution centered either on syntax (Bickerton 2003) or on autonomous speech production (Corballis 2004). Nevertheless, the link between recursive syntax, autonomous speech and the manipulation of symbolic artifacts has always been taken for granted, and there
is no obvious reason as to why the use of such symbols in material culture would depend on recursive syntax or autonomous speech (Bouchard p. c.). The link between symbolic artifacts and the faculty of language is weak for a second reason, related to the nature of the archeological evidence. The most uncontroversial evidence of symbolic behaviors today is the abstract engravings and marine shell beads found in the Blombos Cave in South Africa (ca. 77-73 ka). The difficulty is that there is no way to prove that the beads and engravings were actually used "symbolically", as they did not necessarily stand for something else (Wynn & Coolidge 2007). There is no a priori reason to exclude the possibility that beads and engravings were primarily decorative rather than symbolic.

3. Modern sapiens behaviors are not all symbolic
Another argument that weakens the connection between modern sapiens behavior and the evolution of the faculty of language is that symbolic artifacts coincide in the archaeological record with many other original behavioral traits that bear no direct symbolic or communicational component. The archaeology of the Middle Stone Age in Africa is still poorly known (especially compared with the Middle Paleolithic in Europe), but it is clearly during this period that we find the first evidence of long-distance exchange, structured living spaces, formal and standardized bone and stone tools, as well as regional styles in material culture (McBrearty & Brooks 2000). The increasing pace of cultural innovation has been ascribed to the "cumulative aspect" of modern human culture (Tomasello 1999; Richerson & Boyd 2005). The originality of modern behaviors is particularly salient by 85,000 BP in South Africa, where the Still Bay and the Howiesons Poort industries can be described as an "African Upper Paleolithic" (Henshilwood in press). Most modern behaviors, however, are not noticeably symbolic or communicative (Henshilwood & Marean 2003; Chase 2006). In Blombos, for instance, engraved ochres and marine shell beads coincide with formal bone tools, finely made bifacial points, and evidence of structured living spaces. How autonomous speech or recursive syntax could account for such innovations remains elusive.

4. The level-2 perspective-taking hypothesis
Another possibility is that the emergence of modern sapiens behavior did not result from a change in language, but from a domain-general cognitive change that could explain both symbolic and non-symbolic innovations. From this point of view, autonomous speech or recursive syntax could have been in place much
before behaviorally modern humans. For the argument to be convincing, one has to identify precisely what cognitive mechanism underlies symbolic and other modern behaviors (Wynn & Coolidge 2007). One influential hypothesis is that our general social intelligence lies behind modern sapiens behavior. The social brain hypothesis links the evolution of culture with the development of Theory of Mind (TOM) and brain growth (Dunbar 2004). The social brain hypothesis faces two problems. First, the emergence of modern sapiens behavior is rather rapid and is not clearly correlated with any significant brain growth. Second, there are different levels of TOM, and each one is based on many cognitive abilities, some of which are not properly social. For instance, the higher form of TOM - the understanding of false beliefs - relies on many domain-general cognitive abilities like working memory and inhibitory control (Carlson et al. 2002). These abilities are used to regulate all aspects of our life and do not belong specifically to social intelligence. I propose that the cognitive mechanism behind modern sapiens behavior is one of the general mechanisms underlying the higher form of TOM: the ability to hold in mind a stable representation of conflicting perspectives on objects (Henshilwood & Dubreuil in press). This general ability underlies many related cognitive tasks: the explicit ascription of false beliefs to others (higher-level TOM), the distinction between appearance and reality, as well as what psychologists call "level-2 perspective-taking", that is, the capacity to understand not only what others see (level-1 perspective-taking, present in apes), but also to reconstruct in one's mind how they see it (Flavell 1992; Perner et al. 2002; Moll & Tomasello 2006). As these tasks are all about complex perspective-taking, I will label my argument the "level-2 perspective-taking hypothesis".
Level-2 perspective-taking is absent in apes and develops between 4 and 5 years of age in human children, depending on the cognitive load of the task. It appears at roughly the same time as the capacity to understand abstract symbols such as written numbers and the capacity to understand that written words have a stable meaning. At the neuropsychological level, complex tasks such as understanding false beliefs or level-2 perspective-taking activate numerous regions of the brain, although one region, the temporoparietal junction, has been specifically associated with representing conflicting perspectives (Aichhorn et al. 2006). This is consistent with the neuroanatomical data according to which the emergence of behaviorally modern humans did not coincide with any major reorganization of the brain. The modern pattern of activation of the temporoparietal junction could have resulted from a slight
increase in the interconnectivity of this region of the brain and not from the general encephalization process (contrary to Dunbar's (2004) social intelligence hypothesis). There is no question that the analogy between ontogeny and phylogeny should be handled carefully, as the latter certainly does not recapitulate the former. But developmental psychology is here supported by cognitive psychology and neuropsychology in showing that one general mechanism allows us to form a stable representation of conflicting perspectives. The main interest of the level-2 perspective-taking hypothesis is that it explains what scenarios invoking a change in the faculty of language do not: the diversity of modern sapiens behavior. First, it accounts for the emergence of symbolic artefacts. Once one is able to distinguish appearance from reality, or to represent conflicting perspectives, one can take objects to symbolize something else. Beads can be transformed into personal ornaments to symbolize social status, because people become able to see the personal ornament simultaneously as beads and as an indicator of status (Dubreuil in press). This ability can be contrasted with the ability that apes have to categorize dominance relations, mother-child relations or affiliation to kin groups, which is not accompanied by an ability to use arbitrary symbols to indicate these statuses. Level-2 perspective-taking suffices to explain this, because using a symbol to organize social relationships depends on the capacity to understand how an object looks from someone else's point of view, and not only to understand that she sees the object (which apes can do). The level-2 perspective-taking hypothesis, however, does not imply that archaeological artefacts like engraved ochres or marine shell beads actually worked as symbols and stood for something else. They could have had a purely decorative or aesthetic function.
From an archaeological perspective, there is no real way to tell aesthetic and symbolic functions apart. In the framework presented here, it does not really matter. At the cognitive level, using an object for a symbolic or aesthetic reason implies the same capacity to represent it from different viewpoints (Henshilwood & Dubreuil in press). In the symbolic case, one has to understand that the object refers to something (e.g. a social status) from someone else's viewpoint. In the aesthetic case, one has to understand that wearing an object (e.g. marine shell beads) makes one look good from someone else's viewpoint. The level-2 perspective-taking hypothesis also explains other features of modern sapiens behaviors that have no obvious symbolic function. Formal bone and stone tools, for instance, are properly modern, but cannot be unambiguously taken to symbolize anything (Chase 2006). Moreover, attempts to link regional
styles and the formalization of tools with the evolution of language remain unconvincing, since the transmission of knapping techniques relies more on observation than on complex communication (Wynn 1991). On the other hand, the impact of level-2 perspective-taking on the transmission of knapping techniques would be straightforward: hominins would gain the capacity to represent how others see the objects while knapping, facilitating the transmission of complex techniques and the emergence of what has been called "cumulative culture" (Tomasello 1999; Richerson & Boyd 2005). The same argument can be made concerning the emergence among modern Homo sapiens of structured living spaces including windbreaks, fixed hearths, storage pits, etc. The construction of such structures does not involve the use of complex communication, but can be explained parsimoniously by an enhanced understanding of spatial perspective, which would make possible the collective ascription of functions to specific parts of the living space.
5. Implications for the evolution of language

The level-2 perspective-taking hypothesis implies that modern sapiens behaviors were not caused by a change in the faculty of language, but by a change in domain-general cognition. Nevertheless, I should make clear that this hypothesis does not imply that anatomically modern humans were not using modern language 150,000 to 50,000 years ago. In fact, many independent arguments support the idea that the faculty of language was fully (or almost fully) in place at that time. On the one hand, the linguistic data indicate that all living humans can learn any language and thus share the same faculty of language. On the other hand, genetic data show that all living humans share a common ancestor in Africa between 150,000 and 50,000 years ago, and so there is good reason to believe that a modern faculty of language was already in place. I should also mention that the level-2 perspective-taking hypothesis does not rule out the possibility that many features of modern language, including recursive syntax, could have appeared much before behaviorally modern humans. In fact, many adaptations essential to the production of rapid spoken language were already in place in the common ancestor of Homo sapiens and Neanderthals. These include a modern version of the FOXP2 gene, but also enhanced breathing control and increased brain size and plasticity (Hublin 2005). To conclude, the argument presented here does not imply that level-2 perspective-taking has no impact on the way humans understand language. We know from child development that higher TOM and perspective-taking are closely related to metalinguistic awareness (Doherty & Perner 1998). This
should not be surprising, since the capacity to reflect on and consciously manipulate linguistic expressions depends on the ability to understand that things can look different under diverse - and even conflicting - viewpoints. Finally, the hypothesis does not imply that earlier developments in the faculty of language in the human lineage were not instrumental in the emergence of level-2 perspective-taking. Once again, children's development shows how the ability to understand complex syntax and, more precisely, embedded sentences predicts fairly well the development of higher theory of mind.

Acknowledgements
This research has been supported by the Belgian Fonds de la Recherche Scientifique (FRS-FNRS). The author thanks Denis Bouchard, Christopher Henshilwood, Jean-Marie Hombert and Michael J. Walker for helpful comments and discussion, as well as Chad Horne for editing the paper.

References
Aichhorn, M., Perner, J., Kronbichler, M., Staffen, W., & Ladurner, G. (2006). Do visual perspective tasks need theory of mind? NeuroImage, 30, 1059-1068.
Bickerton, D. (2003). Symbol and structure: A comprehensive framework for language evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 77-93). Oxford: Oxford University Press.
Carlson, S. M., Moses, L. J., & Breton, C. (2002). How specific is the relation between executive function and theory of mind? Contributions of inhibitory control and working memory. Infant and Child Development, 11, 73-92.
Chase, P. G. (2006). The emergence of culture: The evolution of a uniquely human way of life. New York, NY: Springer.
Corballis, M. C. (2004). The origins of modernity: Was autonomous speech the critical factor? Psychological Review, 111, 543-552.
d'Errico, F., Henshilwood, C., Lawson, G., Vanhaeren, M., Soressi, M., Bresson, F., Tillier, A. M., Maureille, B., Nowell, A., Backwell, L., Lakarra, J. A., & Julien, M. (2003). The search for the origin of symbolism, music and language: A multidisciplinary endeavour. Journal of World Prehistory, 17(1), 1-70.
Doherty, J., & Perner, J. (1998). Metalinguistic awareness and theory of mind: Just two words for the same thing? Cognitive Development, 13, 279-305.
Dubreuil, B. (In press). The cognitive foundations of institutions. In B. Hardy-Vallée & N. Payette (Eds.), Beyond the brain: Embodied, situated and distributed cognition. Newcastle: Cambridge Scholars Publishing.
Dunbar, R. (2004). The human story. London: Faber.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S. L., Wiebe, V., Kitano, T., Monaco, A. P., & Pääbo, S. (2002). Molecular evolution of FOXP2, a gene involved in speech and language. Nature, 418, 869-872.
Flavell, J. H. (1992). Perspectives on perspective-taking. In H. Beilin & P. Pufall (Eds.), Piaget's theory: Prospects and possibilities (pp. 107-139). Hillsdale, NJ: Erlbaum.
Henshilwood, C. S. (In press). The Upper Palaeolithic of southern Africa: The Still Bay and Howiesons Poort, c. 85,000-55,000 years. In S. Reynolds & C. Menter (Eds.), African genesis: Volume in honour of Philip Tobias. Johannesburg: Wits University Press.
Henshilwood, C. S., & Dubreuil, B. (In press). Reading the artifacts: Gleaning language skills from the Middle Stone Age in southern Africa. In R. Botha & C. Knight (Eds.), The cradle of language, Volume 2: African perspectives. Oxford: Oxford University Press.
Hublin, J.-J. (2005). Origine du langage. In O. Dutour, J.-J. Hublin & B. Vandermeersch (Eds.), Origine et évolution des populations humaines (pp. 377-394). Paris: Comité des Travaux Historiques et Scientifiques.
Klein, R. & Edgar, B. (2002). The dawn of modern culture. New York, NY: J. Wiley.
Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R. E., Burbano, H. A., Hublin, J.-J., Hänni, C., Fortea, J., de la Rasilla, M., Bertranpetit, J., Rosas, A. & Pääbo, S. (2007). The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology, 17, 1-5.
Lieberman, P. (2005). The pied piper of Cambridge. The Linguistic Review, 22, 289-301.
McBrearty, S. and Brooks, A. S. (2000). The revolution that wasn't: A new interpretation of the origin of modern human behavior. Journal of Human Evolution, 39, 453-563.
Perner, J., Stummer, S., Sprung, M., & Doherty, M. (2002). Theory of mind finds its Piagetian perspective: Why alternative naming comes with understanding belief. Cognitive Development, 17, 1451-1472.
Richerson, P. J., & Boyd, R. (2005). Not by genes alone.
Chicago, IL: University of Chicago Press.
Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press.
Wynn, T. (1991). Tools, grammar and the archaeology of cognition. Cambridge Archaeological Journal, 1(2), 191-206.
Wynn, T., & Coolidge, F. L. (2007). Did a small but significant enhancement in working memory capacity power the evolution of modern thinking? In P. Mellars, K. Boyle, O. Bar-Yosef, & C. Stringer (Eds.), Rethinking the human revolution (pp. 79-90). Cambridge: McDonald Institute Monographs.
THE ORIGINS OF PREFERRED ARGUMENT STRUCTURE

CALEB EVERETT
Anthropology Department, University of Miami, Coral Gables, Florida, 33124

The author presents a reexamination of the origins of Preferred Argument Structure (Du Bois 1987, 2002). Conversation data from English are provided which contradict the putative cognitive motivations of PAS suggested in the literature. The distributional tendencies in the data examined suggest that PAS is epiphenomenal and due to extralinguistic factors that could be construed to predate language. Since links between PAS and the evolution of case have been noted in the literature (e.g. Jäger 2007), it is suggested that the motivations for PAS merit further investigation. Finally, it is noted that the data presented reflect another plausible motivation, along with those already found in the literature, for the typological dominance of nominative-accusative patterns, vis-à-vis absolutive-ergative ones.
1. An Introduction to Preferred Argument Structure

In Du Bois (1987), a Preferred Argument Structure (PAS) is suggested for the world's languages. This structure reflects the tendency languages exhibit for placing lexical arguments (i.e. non-pronominal noun phrases) and correlated new arguments in predictable roles within clauses. The positions in which lexical and new arguments tend to occur are the argument slots of intransitive clauses, referred to henceforth as S's, and the argument slots of transitive clauses typically associated with less agentive properties, referred to henceforth as O's. Conversely, languages exhibit a strong aversion towards introducing new arguments in the A role, typically reserved for more agentive arguments in transitive clauses.a Du Bois (1987) based his initial findings on transcribed narratives in Sacapultec, an ergative Mayan language of Guatemala. His data suggest that Sacapultec demonstrates a strong inclination to "Avoid more than one lexical argument per clause." (819) He terms this avoidance a "One Lexical Argument Constraint." Similarly, his data suggest that Sacapultec seeks to "Avoid lexical
a A, S, and O are preferred to terms such as subject and object since the status of the latter grammatically-oriented categories varies across languages. For instance, the presence of a "subject" category in some absolutive-ergative languages is not easily established.
A’s.” (823) He refers to this finding as the “Non-lexical A Constraint.” He employs these two constraints in his initial formulation of “Preferred Argument Structure”: “The conjunction of the One Lexical Argument Constraint and the Non-lexical A Constraint, as it governs the surface syntactic distribution of lexical arguments in discourse, constitutes what I call P[referred] A[rgument] S[tructure]. (This is the grammatical dimension of PAS; a pragmatic dimension is addressed in the next section.)” (823)
The pragmatic dimension referred to, which parallels the constraints regarding lexical arguments, can be encapsulated by the following two constraints: "Avoid more than one new argument per clause" (826) and "Avoid new A's" (827). Together with the correlated findings on lexical-argument placement, these constraints serve as the basis of Preferred Argument Structure, referred to henceforth as PAS. Subsequent to Du Bois' original work, a fair amount of related research has been undertaken by a number of investigators, establishing the existence of PAS in languages such as Mam, English, Hebrew, Chamorro, Portuguese, Quechua, Rama, and Papago. For instance, in one Spanish text, 94% of 591 new referents were coded as S's (36%) or O's (58%), while only 6% of the new referents were coded as A's (Du Bois 2002:12). Data defined along similar parameters are also evident in research not specifically addressing PAS. Consider the following data on referents in the A and O roles in English, based on the CHRISTINE corpus.

Table 1. Percentage of 1st and 2nd person referents by role in an English corpus.

                               A            O
  Local/1st and 2nd person     803 (80%)    47 (5%)
  3rd person                   202 (20%)    958 (95%)
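The percentages in Table 1 can be recovered directly from the raw counts. The following is an illustrative sketch only (the counts are those cited above; the tabulation code is my own):

```python
# Raw counts behind Table 1 (CHRISTINE corpus figures cited above).
counts = {
    "A": {"local": 803, "third_person": 202},
    "O": {"local": 47, "third_person": 958},
}

for role, c in counts.items():
    total = c["local"] + c["third_person"]
    local_pct = round(100 * c["local"] / total)
    third_pct = round(100 * c["third_person"] / total)
    print(f"{role}: local {local_pct}%, 3rd person {third_pct}%  (n = {total})")
```

Both columns total 1005 tokens, and the rounded percentages match those in the table: A's are overwhelmingly local (80%), O's overwhelmingly 3rd person (95%).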
In this case, the data are presented according to a local vs. non-local distinction, rather than the new vs. given dichotomy. Nevertheless, given the well-known correlation between local referents/speech act participants and given-ness/topicality, these results are consistent with findings on PAS. Given the existence of PAS in the world's languages, two questions follow naturally. First, we might wonder what the origins of PAS are. Second, we might wonder how PAS relates to grammatical relations and case. In his initial presentation of PAS, Du Bois presented answers to both of these questions. With respect to the latter issue, related to GR's and case, he suggested simply that the S and O roles are naturally grouped by their ability to host new referents, and that this grouping is fundamental to the existence of absolutive/ergative systems. With respect to the former issue, addressing the
motivations for PAS, he suggested that humans generally restrict new referents to the S and O roles, and generally only refer to one new referent at most in a clause, since the introduction of new referents is a cognitively more onerous task than the mention of topical referents: "I propose that the absolutive syntactic position constitutes a sort of grammatically defined 'staging area', reserved for accommodating the process, apparently relatively demanding, of activating a previously inactive entity concept." (1987:834)
According to this perspective, then, S's and O's are the default "staging areas" for new referents, exploited in order to ease the cognitive burden of information transfer. This functional motivation leads to a natural grouping of S's and O's, which creates a natural absolutive category that is exploited by many case and GR systems. There are strong reasons to be suspicious of this account of the cognitive motivations for PAS, a point I will return to in the following section. Nevertheless, the motivating relationship between the distribution characterized by PAS and absolutivity is important to our understanding of case. Jäger (2007) presents a detailed examination of how Evolutionary Game Theory can be applied to linguistic data in order to provide an account of why certain typologically-widespread case systems (such as split-ergative, ergative, and accusative) are so prevalent in the world's languages, and why so many alternative systems are unexploited. For the sake of space, I will not discuss Jäger's ideas in detail. One point that is worth stressing, however, is that Jäger's account is based on a distribution of referents that is consistent with the correlation between new-ness and the O role first noted by Du Bois. For instance, Jäger's (2007:80) data, based on the CHRISTINE corpus of English, suggest that 79% (791/1005) of O's are full NP's rather than pronominal NP's. Conversely, the data suggest that 91% (914/1005) of A's are pronominal NP's rather than full NP's. Given the natural correlation between pronouns and topicality, it follows that the A's in Jäger's data tend to be topical. Given the natural correlation between full NP's and new referents, it follows that the O's in his data tend to be new.
While Jäger's account does not involve an analysis of S referents, his data are generally consistent with PAS, and his game-theoretic account therefore makes more explicit a ligature between the distribution of referents characterizing PAS and the development of case systems. Whether one accepts Jäger's ideas or related suggestions such as those in Section 3 below, it seems fair to say that the distribution of referents characterizing PAS has worked in some manner(s) to help motivate the case patterns which have become prevalent
during the evolution of language. Therefore, I would claim, the origins of PAS merit further scrutiny.
2. Another Look at the Numbers

The existence of PAS is widely supported by the literature. This corroboration of PAS can incorrectly be perceived to be a corroboration of the putative cognitive motivations for PAS, i.e. that the absolutive grouping of S and O is fundamentally the result of these roles being employed as a "cognitive staging area" for new referents. While the distributional data are generally consistent with PAS, however, they do not support the suggested cognitive motivations. Some data from English will help to illustrate this point. The data in this case are taken from a corpus of 541 clauses, comprising three conversations of approximately five minutes, taken from late-night talk shows. This genre of conversation was chosen for its high "information pressure quotient" (Du Bois 1987:818), i.e. the rate of introduction of new referents during the telling of anecdotes by talk show guests was expected to be high. The size of the corpus is about the size of that employed by Du Bois (1987).c In the data, there are 681 S's, A's, or O's. Of these, 401 are S's, 140 are A's, and 140 are O's. Of all of these arguments, 141 are lexical mentions. Only 99 of the arguments are actually new mentions. Significantly for our purposes, 93% of both the lexical and new referents are found in either an S or O role. This is the sort of finding that has been made in many languages, and can be construed as buttressing the idea that S's and O's are used as a "staging area" for new referents. However, several observations suggest otherwise. The first observation to be made is that, while there are many new and lexical S's in these data, this is due in large part to the greater incidence of intransitive clauses. There are 35 lexical S's, for example, and only 10 lexical A's. However, this disparity basically corresponds to the overall rates of S's to A's, 401 to 140. While superficially trivial, this observation is generally not made in studies on PAS.
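A quick per-role tabulation makes the contrast between raw counts and rates explicit. This is an illustrative sketch only; the counts (including the 96 lexical O's cited below) are those reported in this section:

```python
# Totals and lexical mentions per role, as reported in the text.
roles = {
    "S": {"total": 401, "lexical": 35},
    "A": {"total": 140, "lexical": 10},
    "O": {"total": 140, "lexical": 96},
}

for role, c in roles.items():
    rate = c["lexical"] / c["total"]
    print(f"{role}: {c['lexical']}/{c['total']} lexical = {rate:.0%}")

# Raw counts alone make S look hospitable to lexical arguments (35 vs. 10),
# but the per-role rates run the other way: lexical S's are rare (9%),
# while lexical O's are the norm (69%).
```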
Disregarding the general preponderance of S's of course colors interpretations of overall rates of new and lexical S's. Next, consider that there are only 35 lexical S's, out of 401 total. There are 96 lexical O's, however, out of only 140 total O's. The rate of lexical O's, 69%, and the rate of lexical S's, 9%, are clearly not of the same order. Already we can
c Large digital linguistic corpora could in principle be tested for the correlations discussed below. However, such corpora would need to be tagged for the relevant categories. Therefore I employ a more modest corpus, amenable to clause-by-clause tabulations.
see that grouping S and O together on the basis of incidence of new referents, while superficially appealing for these data, is somewhat specious. We know that intransitive clauses generally outnumber transitive clauses in discourse (cf. Dryer 1997). For instance, a random sampling of two hundred clauses from the Santa Barbara Corpus of Spoken American English suggests a 3:1 ratio of intransitive to transitive clauses. The ratio in these data is also 3:1. The greater overall number of S's, with respect to A's, explains in large part the greater number of new and lexical S's with respect to new and lexical A's. One could make the claim that S's are more frequent in discourse, at least in part, precisely because they serve as a "staging area" for new referents in discourse. If this were the case, however, we would expect there to be examples in these data in which a new referent is introduced in the S role, and subsequently referred to in the A role. In fact, there is not one such case in the data being considered. Similarly, there is not one case in which a new referent is introduced in the O role, only to be subsequently referred to in the A role. This sort of data would seem crucial to buttressing the claim that S's and O's are used as staging areas for new referents, and it is notably absent in the literature. Careful examination of the sorts of referents that occur in the S, A, and O roles suggests in fact that PAS is not due to a particular way in which new referents can be more facilely processed, but is due to more basic correlations between referent types and the S, A, and O roles. Let us consider, for example, the correlation between human referents and these roles, since human referents are generally much more topical than non-human referents (cf. Givón 1983b). When we simply split the referents in these data into the categories of human and non-human, the distribution of new and lexical referents is elucidated.
We find that, in these data, 96% of A's are human, 68% of S's are human, and only 25% of O's are human. The fact that the vast majority of A's are human is perhaps not surprising, since humans are prototypical actors/agents. Conversely, the fact that the majority of O's are non-human is not surprising given that prototypical undergoers are generally inanimate (cf. Hopper and Thompson 1980). Such data suggest that the rate of new A's and new O's can be explained without appealing to the putative burden of processing more than one new referent per clause. Instead, it appears that the low incidence of new A's and the high incidence of new O's can be explained by appealing to what A and O referents are, generally humans and non-humans respectively. If this point is valid, however, we would expect to see a correlation between new-ness (or lexicality) and non-human status among referents in the S role. In fact, when we consider the 401 S referents in these data, we find that the new S's are almost always non-human, as the following table shows.
112
Table 2. New and non-new S referents by humanness.

                 New    Non-New    Totals
  Human S          4        270       274
  Non-human S     14        119       133
  Totals          18        389       407
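The association between new-ness and non-human status in the table above can be quantified with a Pearson chi-square test computed by hand. This is my own illustrative sketch, not part of the original study; only the four cell counts come from the table:

```python
# 2x2 table: S referents cross-classified by humanness and new-ness.
table = [[4, 270],    # human S:     new, non-new
         [14, 119]]   # non-human S: new, non-new

row_totals = [sum(row) for row in table]           # [274, 133]
col_totals = [sum(col) for col in zip(*table)]     # [18, 389]
n = sum(row_totals)                                # 407

# Pearson chi-square, with expected count = row_total * col_total / n.
chi2 = sum((table[i][j] - row_totals[i] * col_totals[j] / n) ** 2
           / (row_totals[i] * col_totals[j] / n)
           for i in range(2) for j in range(2))

new_rate_human = table[0][0] / row_totals[0]       # ~1.5% of human S's are new
new_rate_nonhuman = table[1][0] / row_totals[1]    # ~10.5% of non-human S's are new
print(f"chi2 = {chi2:.1f}")
print(f"new-S rate: human {new_rate_human:.1%}, non-human {new_rate_nonhuman:.1%}")
```

The statistic comes out around 17.4, well beyond the 3.84 critical value for one degree of freedom at p = .05: new S's cluster in the non-human category far more than chance would predict.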
cognitive motivations for PAS, his conclusions dovetail neatly with those offered here. While future studies will hopefully shed more light on the subject, it seems plausible that basic non-linguistic factors motivate PAS. First, humans are natural actors (usually coded as A's and S's), while the environment is full of natural undergoers (usually coded as O's and sometimes as S's). Second, humans talk most frequently about themselves and others in their immediate environment (e.g. interlocutors). Put simply, we make the best topics of conversation, and we are inherently non-new. The distribution of new and lexical arguments in speech reflects the fact that we most naturally pay attention to and discuss ourselves and our interlocutors. As pithy and aphoristic as these claims are, the data suggest that such basic extralinguistic factors motivate PAS, the topicality of S's and A's, and, therefore, the absolutive and nominative categories, respectively. In short, extralinguistic factors lead to the correlation of S, A, and O with categories such as "new" and "given". Over time, such frequent correlations were indexed in the structure of language, through e.g. case patterns. As Jäger (2007:74) suggests, "the case-marking patterns that are attested in the languages of the world are those that are evolutionarily stable for different relative weightings of speaker economy and hearer economy, given the statistical patterns of language use" such as those delineated here.

3. A Note on the Preponderance of Nominative Patterns
Since it has been widely observed that S's represent a mixed category (whether in terms of topicality or agency or some other dimension), offering natural groupings with A or O, one wonders why the S/A grouping (nominative/accusative) is more prominent typologically than S/O (absolutive/ergative). Various accounts are offered in the literature (see Givón 1976 and Jäger 2007:102, inter alia). I would like to add to these suggestions by simply noting that, while S's represent a mixed class, they clearly distribute more like A's than O's with respect to factors such as SAP status and humanness. Various sources of data, such as those above and in Dahl (2000), suggest that S's represent topical referents about 2/3 of the time, and so are more like A's in terms of this parameter than they are like O's in terms of new-ness. The simple fact that the referents of S's are generally human and SAP's, as A's almost always are, may help explain why the S/A grouping has become so prevalent in case systems during the course of linguistic evolution.
4. Conclusion
One of the defining characteristics of human language is the capacity to refer to displaced referents. What the data here and in the literature seem to suggest is that, while we quite often utilize this capacity, we most often topicalize non-displaced human referents in the environment of an utterance (e.g. SAP's). Since they are generally less agentive, non-human referents are referred to as O's and sometimes S's, and are more often transient in discourse. The factors cited are more fundamental than language in the sense that they reflect inherent qualities of interacting humans (generally given or established contextually) and their non-human environment (generally less agentive). Such extra-linguistic factors may have contributed to the distribution of referents characterizing PAS and may have influenced the evolution of case and GR patterns in the world's languages. From this perspective, the true origins of PAS may be understood to be more basic than, and predate, language.

References
Comrie, B. (1981). Language Universals and Linguistic Typology. Chicago: The University of Chicago Press.
Dahl, Ö. (2000). Egophoricity in discourse and syntax. Functions of Language, 7, 39-77. Amsterdam: John Benjamins.
Dryer, M. (1997). On the six-way word order typology. Studies in Language, 21, 69-103.
Du Bois, J. (1987). The Discourse Basis of Ergativity. Language, 63, 805-852.
Du Bois, J. (2002). Discourse and Grammar. In M. Tomasello (Ed.), The New Psychology of Language: Cognitive and Functional Approaches to Language Structure, Vol. 2. Erlbaum.
Givón, T. (1976). Topic, pronoun, and grammatical agreement. In Charles N. Li (Ed.), Subject and topic (pp. 149-188). New York: Academic Press.
Givón, T. (1983b). Topic continuity in discourse: the functional domain of switch reference. In T. Givón (Ed.), Topic Continuity in Discourse: a Quantitative Cross-Language Study (pp. 141-214). John Benjamins.
Hopper, P. J. and Thompson, S. A. (1980). Transitivity in Grammar and Discourse. Language, 56, 251-299.
Jäger, G. (2007). Evolutionary game theory and typology: A case study. Language, 83, 74-109.
Silverstein, M. (1976). Hierarchy of features and ergativity. In R. M. W. Dixon (Ed.), Grammatical categories in Australian languages (pp. 112-171). Canberra: Australian Institute of Aboriginal Studies.
LONG-DISTANCE DEPENDENCIES ARE NOT UNIQUELY HUMAN

RAMON FERRER I CANCHO
Dpt. de Llenguatges i Sistemes Informàtics, Universitat Politècnica de Catalunya, Campus Nord, Edifici Omega, Jordi Girona Salgado 1-3, 08034 Barcelona, Spain

VÍCTOR M. LONGA
Dpto. de Literatura Española, Tª da Literatura e Lingüística Xeral, Universidade de Santiago de Compostela, Pr. Isabel a Católica 2, 2ºE, 36204 Vigo, Spain

GUILLERMO LORENZO
Dpto. de Filología Española, Universidad de Oviedo, Teniente A. Martínez s/n, 33011 Oviedo, Spain

It is widely assumed that long-distance dependencies between elements are a unique feature of human language. Here we review recent evidence of long-distance correlations in sequences produced by non-human species and discuss two evolutionary scenarios for the evolution of human language in the light of these findings. Through applying their methodological framework, we conclude that some of Hauser, Chomsky and Fitch's central claims on language evolution are put into question, to a different degree within each of those scenarios.
1. Introduction
Hauser, Chomsky and Fitch's (2002) picture of the faculty of language (FL) and their distinction between a broad (FLB) and a narrow (FLN) sense of the faculty is basically a useful tool for guiding research projects on language evolution, even if not always understood as such (Pinker & Jackendoff 2005 and Jackendoff & Pinker 2005). Their main contention is that explaining the evolutionary origins of language requires, as a first step, telling apart the formal features of languages which can be thought of either as (1) inherited unchanged from a common ancestor, (2) subjected to minor modifications, or (3) qualitatively new (Hauser, Chomsky & Fitch 2002: 1570). Features reasonably put under categories (1) and (2) are said to be part of the faculty of language in the broad sense (FLB), while features suspected of pertaining to category (3) are
said to be part of the faculty of language in the narrow sense (FLN). But the import of the distinction is first and foremost methodological: it is a useful criterion for deciding which aspects of language can be explained in relation with the evolutionary history of other non-human species, and which aspects of language cannot be illuminated in this way and thus demand somehow special explanations (Fitch, Hauser & Chomsky 2005: 181). A second and in some way stronger contention of Hauser, Chomsky and Fitch has to do with the contents of FLN. They believe that the only serious candidate to be included within this category is the recursive procedure that the computational system of human language makes use of, with its open-ended generativity based on the structural embedding of hierarchically organized phrases (Hauser, Chomsky & Fitch 2002: 1573). But, again, it is important to note that Hauser, Chomsky and Fitch's proposal is mostly put forward as a guideline to direct testable hypotheses in order to refute or validate the claim that recursion is the only real novelty that language evolution has introduced into the natural world (Hauser, Chomsky & Fitch 2002: 1578). In this sense, it is worth remembering that they also advance an alternative hypothesis, according to which FLN is nothing but a rich set of interconnected mechanisms, all shared with other non-human species but only put together in the course of human evolution (Hauser, Chomsky & Fitch 2002: 1578 and Fitch, Hauser & Chomsky 2005: 181), a possibility also exposed to empirical refutation. This article is not directly aimed at recursion, but at another highly distinctive feature of human language, one also dubbed by some authors as language-specific: long-distance dependencies (henceforth, LDDs) among the items of a sequence. The article is organized as follows. Section 2 offers what we consider the linguists' consensus view on LDDs as a unique feature of human language.
Section 3 discusses the link between LDDs and the complexity of human language. Section 4 is devoted to reviewing some recent evidence of long-distance correlations in sequences produced by non-human organisms, which points to the conclusion that this feature of human language is to be classified as pertaining to the domain of FLB. Section 5 is devoted to discussing some consequences of this conclusion for the evolutionary understanding of the faculty of language. These consequences, we contend, are far-reaching, given that LDDs are usually connected with the existence of non-trivial recursion. This final section depicts two scenarios for the evolution of the faculty of language, both challenging, though to a different degree, in relation to Hauser, Chomsky and Fitch's distinctions and contentions.
2. LDDs and human language: the consensus view
LDDs are an especially pervasive feature of human language, which adopts many different faces: agreement (John_i often drinks_i wine), binding (John_i wonders which picture of himself_i has been stolen), control (Mary_i never promised PRO_i to kiss John), displacement (which students_i did the president say the police arrested t_i last week?), among others. What they all have in common is that they imply a relation between two items, one of them to be valuated by the other within a certain search space or domain which is defined not linearly but structurally (see Chomsky 2000, where the unifying notion of Agree is proposed). Chomsky contends that this property of language seems rather unexpected in that it is "never built into special-purpose systems and apparently without significant analogue elsewhere" (Chomsky 2000: 101). Chomsky's claim is by no means exceptional. Hauser, Chomsky and Fitch also defend that "natural languages go beyond purely local structure by including a capacity for recursive embedding of phrases within phrases, which can lead to statistical regularities that are separated by an arbitrary number of words or phrases. Such long-distance, hierarchical relationships are found in all natural languages for which, at a minimum, a phrase-structure grammar is necessary" (Hauser, Chomsky & Fitch 2002: 1577), thus establishing a causal connection between LDDs and the sort of complex syntax characteristic of human language. In the same vein, Anderson relates the existence of LDDs to that of grammars with recursive phrase structure (Anderson 2004: 203) and contends that this kind of grammar is language-specific (Anderson 2004: 217-218). Not very different are Berwick's opinions, who thinks that Merge is to be considered ultimately responsible for many of the distinguishing properties of human syntax, among others recursion and displacement, the latter one of the varieties of LDDs (Berwick 1998: 322).
In addition, considering LDDs as language-specific is one of the uncommon points of agreement between Hauser, Chomsky and Fitch, on the one hand, and Pinker and Jackendoff, on the other hand. In their own words: “a final important device is long-distance dependency, which can relate a question word or relative pronoun to a distant verb, as in Which theory did you expect Fred to think Melvin had disproven last week?, where which theory is understood as the object of disprove. Is all this specific to language? It seems likely, given that it is specialized machinery for regulating the relation of sound and meaning. What other human or non-human ability could it serve?” (Pinker & Jackendoff 2005: 216). All these pronouncements speak clearly of a consensus among linguists (yet rarely declared) concerning the language specificity of LDDs and its
inclusion (even if implicitly) within FLN, at least by those who accept the notion.

3. LDDs and the complexity of human language
Our understanding of the complexity of human language is largely influenced by Chomsky's hierarchy of grammatical complexity (Hopcroft & Ullman 1979). Chomsky's hierarchy consists of the following levels (in decreasing order of potential complexity): Type-0 grammars (unrestricted grammars) include all formal grammars. Type-1 grammars (context-sensitive grammars) generate the context-sensitive languages. Type-2 grammars (context-free grammars) generate the context-free languages. Type-3 grammars (regular grammars) generate the regular languages. A key difference between type-3 grammars and the other types is that type-3 grammars do not require memory of the past elements of the sequence in order to be produced or processed. Thus, although all four levels involve recursion, it is important to distinguish the recursion of grammars that are not of type-3 (hereafter non-trivial recursion) from that of type-3 grammars. Interestingly, the presence of LDDs indicates that memory of the past is needed and that this memory may need to be maintained for a long time. Can this be interpreted as LDDs implying that the complexity of the language is above type-3? Standard automata theory takes a very radical point of view on the issue (Hopcroft & Ullman 1979): any finite language is of type-3, regardless of the internal dependencies within the sequences that it can generate. Moreover, the fact that a language is infinite and shows LDDs does not imply that the language has a higher complexity than type-3 either. Theoretically, one could artificially design a finite language able to produce long but finite-length "sentences" in which LDDs between the "words" of a sentence are present. In sum, LDDs are a necessary but not a sufficient condition for non-trivial recursion, which means that the existence of LDDs dissociated from complex recursive syntax is a possibility not to be discarded.

4. LDDs are not uniquely human
The study of LDDs using information theory and statistical physics techniques has a long tradition. These techniques operate on sequences of units (e.g., words in texts or nucleotides in genomes). For instance, Ebeling and Pöschel (1994) used transinformation (a two-point correlation function) to determine if pairs of
letters at a certain distance within a literary text are related or not. Using this and more elaborate techniques, LDDs between letters (Ebeling & Pöschel 1994) or words (Montemurro & Pury 2002) have been reported. Applying these methods to other species is providing growing evidence that LDDs are not a unique feature of human language. This kind of dependency is found in the sequence of units of humpback whale songs (Suzuki et al. 2006) and the sequence of dolphin surface behavioural patterns (Ferrer i Cancho & Lusseau 2006). To illustrate this kind of research, we show a piece of a sequence of dolphin surface behavioural patterns borrowed from Ferrer i Cancho & Lusseau (2006): TSD, TO, AS, FB, TSD, TSD, AS, TSD, AS, LT. In this piece of ten patterns, there are five different behavioural patterns: TSD (tail-stock dive), TO (tail out), AS (active surfacing), FB (fart blow), LT (lob tail). Metaphorically speaking, these are the "words" of dolphin surface behavioural sequences. Using transinformation as Ebeling & Pöschel (1994) did, Ferrer i Cancho & Lusseau (2006) studied collections of dolphin behavioural sequences (metaphorically speaking, each sequence could be seen as a "sentence" and the whole collection of sequences could be seen as a "corpus") and found that the dependency of a dolphin surface behavioural pattern on previous patterns extends back at least to the 7th past behavioural pattern. Notice that the analysis based on transinformation cannot determine whether two concrete occurrences of a pattern in a specific sequence are dependent or not. It just provides global information about the span of dependencies in the whole collection of sequences. Besides, this kind of analysis is based on the concept of mutual dependency: if a pattern depends on a second pattern, then the second pattern depends on the former pattern.
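The transinformation statistic used in these studies is a two-point mutual information computed between symbols a fixed distance d apart. A minimal sketch (illustrative code only, not the published implementations), run here on the ten-pattern dolphin snippet quoted above:

```python
from collections import Counter
from math import log2

def transinformation(seq, d):
    """Plug-in mutual information (in bits) between symbols d positions apart."""
    pairs = list(zip(seq, seq[d:]))
    n = len(pairs)
    p_xy = Counter(pairs)                       # joint counts at lag d
    p_x = Counter(x for x, _ in pairs)          # marginal counts, first slot
    p_y = Counter(y for _, y in pairs)          # marginal counts, second slot
    return sum((c / n) * log2((c / n) / ((p_x[x] / n) * (p_y[y] / n)))
               for (x, y), c in p_xy.items())

# Toy run on the snippet from the text. Real analyses use long corpora and
# compare the curve of I(d) against values expected for shuffled sequences.
snippet = ["TSD", "TO", "AS", "FB", "TSD", "TSD", "AS", "TSD", "AS", "LT"]
for d in (1, 2, 3):
    print(f"I(d={d}) = {transinformation(snippet, d):.3f} bits")
```

Transinformation is zero when symbols at distance d are statistically independent and positive otherwise, which is why a value above the shuffled baseline at large d signals a long-distance dependency.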
In other terms, the possibility of subordination of one pattern to another, dependent pattern is not considered, as it is for human words in syntactic theory. It is important to notice that the LDDs that are found in human language do not have an exact correspondence in other species in all cases. In human language, the LDDs between syntactic constituents involve meaningful units. In contrast, the units on which LDDs are found in humpback whale song (Suzuki et al. 2006) are meaningless, whereas there is evidence of a rudimentary meaning in dolphin surface behavioural patterns (Ferrer i Cancho & Lusseau 2006). Besides, strong evidence of LDDs requires that the units be produced by a single individual. In Ferrer i Cancho and Lusseau (2006) the evidence of LDDs is limited to the sequence produced by a population of dolphins, not an individual speaker as in human language. However, whale and bird songs exhibit LDDs at an individual level (Suzuki et al. 2006), and it cannot be ruled out that a deeper study of dolphins distinguishing the identity of the performer of a
behavioural pattern could provide evidence of LDDs in the production of single individuals. In sum, an exact correspondence between LDDs in syntactic constituents and in the sequences produced by other species has not been clearly found at the level of meaning, and their maintenance within the behaviour of a single organism has not been studied in all cases. Anyhow, we think that the evidence so far reported suffices to start thinking about language LDDs as a modified version of this more rudimentary class of long-term correlations. It could be tempting to think that LDDs are a unique feature of complex organisms (i.e. organisms with a brain). However, this hypothesis does not stand. LDDs are found, for instance, in DNA sequences (Li & Kaneko 1992). In this case, the units are nucleotides and, as far as we know, single nucleotides do not have biological meaning. Even more challenging for proponents of long-distance dependencies as a uniquely human property is the evidence of LDDs in the atmosphere (Harrison 2004) or in earthquakes (Telesca et al. 2007). All this means that the presence of LDDs is pervasive, not just in the specific case of language, but in nature at large. The remainder of this article is aimed at showing the implications this finding raises for the evolution of language.

5. Some conclusions: LDDs and the evolution of human language
The data referred to in the previous section seem completely at odds with the consensus view that LDDs are uniquely associated with human language and rather point to the following conclusion, which we are going to assume:

LDDs-as-FLB Thesis: "The establishment and maintenance of LDDs falls into FLB."

The evolutionary consequences of this statement are far-reaching, given the causal link also customarily assumed among linguists between the existence of LDDs and the existence of complex syntax with non-trivial recursion. As a matter of fact, the statement above seems to open two alternative scenarios concerning the evolution of the faculty of language:

Scenario 1. LDDs can be dissociated from complex recursive syntax (or, for the sake of clarity, syntax with structural embedding of hierarchically organized units). Within this scenario, the contention could be made that LDDs have acted as an evolutionary driving force pointing to the advent of phrase structure grammars, possibly as a device for the resolution of LDDs within optimally restricted search spaces or domains.

Scenario 2. LDDs cannot be dissociated from complex recursive syntax. Actually, they serve as a diagnostic cue for the existence of this kind of syntax:
any system of non-human behavior exhibiting LDDs would automatically be deemed as being in possession of recursion. Within this scenario, the FLB vs. FLN distinction would cease to be operative once the existence of some other system of this type aside from language is demonstrated. As a matter of fact, FL would be nothing other than a collection of capacities repeatedly evolved, with their incorporation into a single system as the only distinguishing feature of the faculty as such (Lorenzo 2006: ch. 3). The evidence is still too sparse to decide which of the advanced scenarios is the most promising for the evolutionary understanding of language. Payne and McVay (1971) defend that the songs of humpback whales are hierarchically structured, and Suzuki, Buck and Tyack (2006) contend that they also exhibit long-term correlations of a sort not capable of being represented by a Markovian model. These findings strongly point to the second scenario, but more research is still needed before adhering to the idea. Ferrer i Cancho and Lusseau's (2006) detection of long-term correlations in the behaviors of dolphins points at a minimum to the first scenario, but it could still be the case that those behaviors are hierarchically structured, a detail not considered in that paper. Birds, whose patterns of singing behaviour have also been defended as hierarchically organized (Todt & Hultsch 1998 and Gentner et al. 2006), are also promising organisms for extending the search for LDDs. It seems not too risky to assert that a deeper understanding of the behavior of organisms like those mentioned above will shortly serve to narrow the evolutionary gulf between language and the rest of nature, a gulf perhaps not so wide as customarily thought by most linguists.
Acknowledgements
This work was funded by a Juan de la Cierva contract from the Spanish Ministry of Education and Science under the project BFM2003-08258-C02-02 (RFC) and by the Spanish Ministry of Education and Science and FEDER under the project HUM2007-60427/FILO (VML and GL).
References
Anderson, S. R. (2004). Dr. Dolittle’s delusion: animal communication, linguistics, and the uniqueness of human language. New Haven: Yale University Press.
Berwick, R. C. (1998). Language evolution and the Minimalist Program: the origins of syntax. In J. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases (pp. 320-340). Cambridge: Cambridge University Press.
Chomsky, N. (2000). Minimalist inquiries: the framework. In R. Martin, D. Michaels & J. Uriagereka (Eds.), Step by step: Essays on minimalist syntax in honor of Howard Lasnik (pp. 89-155). Cambridge (MA): The MIT Press.
Ebeling, W. & Pöschel, T. (1994). Entropy and long-range correlations in literary English. Europhysics Letters, 26(4), 241-246.
Ferrer i Cancho, R. & Lusseau, D. (2006). Long-term correlations in the surface behavior of dolphins. Europhysics Letters, 74, 1095-1101.
Fitch, W. T., Hauser, M. D., & Chomsky, N. (2005). The evolution of language: clarifications and implications. Cognition, 97, 179-210.
Gentner, T. Q., Fenn, K. M., Margoliash, D. & Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature, 440, 1204-1207.
Harrison, R. G. (2004). Long-range correlations in measurements of the global atmospheric electric circuit. Journal of Atmospheric and Solar-Terrestrial Physics, 66(13-14), 1127-1133.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science, 298, 1569-1579.
Hopcroft, J. & Ullman, J. (1979). Introduction to automata theory, languages and computation. Massachusetts: Addison-Wesley.
Jackendoff, R. & Pinker, S. (2005). The nature of the language faculty and its implications for evolution of language (reply to Fitch, Hauser, and Chomsky). Cognition, 95, 211-225.
Li, W. & Kaneko, K. (1992). Long-range correlation and partial 1/f spectrum in a non-coding DNA sequence. Europhysics Letters, 17, 655-660.
Lorenzo, G. (2006). El vacío sexual, la tautología natural y la promesa minimalista: Ensayos de biolingüística. Madrid: A. Machado Libros.
Montemurro, M. A. & Puri, P. A. (2002). Long-range fractal correlations in literary corpora. Fractals, 10(4), 451-461.
Payne, R. S., & McVay, S. (1971). Songs of humpback whales. Science, 173, 587-597.
Pinker, S., & Jackendoff, R. (2005). The faculty of language: what's special about it?
Cognition, 95, 201-236.
Suzuki, R., Buck, J., & Tyack, P. (2006). Information entropy of humpback whale songs. Journal of the Acoustical Society of America, 119, 1849-1866.
Telesca, L., Lovallo, M., Lapenna, V. & Macchiato, M. (2007). Long-range correlations in two-dimensional spatio-temporal seismic fluctuations. Physica A, 377(1), 279-284.
Todt, D. & Hultsch, H. (1998). How songbirds deal with large amounts of serial information: retrieval rules suggest a hierarchical song memory. Biological Cybernetics, 79, 487-500.
HOW MUCH GRAMMAR DOES IT TAKE TO SAIL A BOAT? (OR, WHAT CAN MATERIAL ARTEFACTS TELL US ABOUT THE EVOLUTION OF LANGUAGE?) DAVID GIL Department of Linguistics, Max Planck Institute for Evolutionary Anthropology, Leipzig, 04103, Germany This paper examines the relationship between grammatical complexity and complexity in culture, technology and civilization. Colloquial Malay/Indonesian, with its simple nearly Isolating-Monocategorial-Associational grammar, fulfils most functions of a complex society, thereby demonstrating that IMA grammar suffices to support most aspects of modern life. Thus, most of the additional complexity of grammar is not necessary for the maintenance of contemporary civilization, and archeological evidence will never be able to prove the existence of language beyond IMA complexity.
1. Complexity of Language, Complexity of Civilization
Human languages are of much greater complexity than the communicative systems of great apes, dolphins, bees and other animals. Similarly, human culture, technology and civilization are also immensely more complicated than anything observed in other species, such as, for example, the ways in which chimpanzees fashion tools to crack nuts or fish for termites. Clearly, these two facts are related: comparing humans to other species leads inexorably to the conclusion that linguistic complexity is correlated with complexity in other, non-linguistic domains. But what exactly is the nature of this correlation? A widespread assumption is that linguistic complexity is necessary in order to support complexity in other domains. This accords with a functional approach towards the evolution of language, whereby greater linguistic complexity enables humans to accomplish more tasks, and in doing so confers an evolutionary advantage. It also forms the basis for archaeological investigations into the evolution of language, in which the existence of material artifacts from a certain era is interpreted as evidence that humans at that time were also endowed with the cognitive and linguistic abilities necessary for the production of such artifacts (Lee and DeVore 1968, Clark 1977, Roebrooks 2001, de la Torre 2004 and many others). Similarly, the discovery of the
remains of an apparently new species of hominin on the Indonesian island of Flores raises the possibility that such hominins were capable of constructing and sailing boats, which in turn may suggest that Homo floresiensis was endowed with whatever linguistic abilities are necessary for the collective planning and execution of such a complex set of activities (see Morwood and Cogill-Koez 2007 for recent critical discussion). But what, exactly, are the necessary linguistic abilities? How much grammar does it really take to build a boat and sail it to a distant island? Or more generally: How much grammar does it take to do all of the things that, amongst all living species, only humans are capable of doing, such as, for example, worshiping god, waging wars, and working in offices; inventing, manufacturing and using sophisticated hi-tech tools; and engaging in multifarious commercial, scientific and artistic activities? This paper argues that the amount of grammar that is needed in order to support the vast majority of daily human activities is substantially less than is often supposed to be the case, in fact less than that exhibited by any contemporary human language, and far less than that exhibited by most such languages. In other words, much of the observable complexity of contemporary human grammar has no obvious function pertaining to the development and maintenance of modern human civilization. More specifically, it is argued that the level of grammatical complexity necessary for contemporary culture, technology and civilization is no greater than that of Isolating-Monocategorial-Associational (or IMA) Language.
2. Isolating-Monocategorial-Associational Language
Isolating-Monocategorial-Associational Language, introduced in Gil (2005a), is an idealized language prototype with the following three properties:

(1) (a) Morphologically Isolating: No word-internal morphological structure;
    (b) Syntactically Monocategorial: No distinct syntactic categories;
    (c) Semantically Associational: No distinct construction-specific rules of semantic interpretation (instead, compositional semantics relies exclusively on the Association Operator).
As argued in Gil (2005a, 2006), IMA Language characterizes an early stage in the phylogenetic evolution of human language, and also an early stage in the ontogenetic development of contemporary child language. In addition, it can be observed in artificial languages such as that of pictograms. However, no known contemporary human language instantiates IMA Language; all such languages are endowed with additional kinds of structures beyond those sanctioned by the definition in (1) above. The defining properties of IMA Language represent the limiting points of maximal simplicity within each of three distinct domains: morphology, syntax and semantics. Hence, for each domain, one may imagine languages approaching these end points along a scale of decreasing complexity. Accordingly, a language is increasingly isolating as it has less and less morphological structure, increasingly monocategorial as its syntactic categories decrease in number and importance, and increasingly associational as its construction-specific rules of semantic interpretation become fewer and less distinct. Alongside Pure IMA Language, as in (1) above, one may thus entertain the possibility of a range of Relative IMA Languages, approaching Pure IMA Language to various degrees within each of the above three domains.
3. Riau Indonesian as a Relative IMA Language
No naturally occurring contemporary human language completely satisfies the definition of IMA Language. However, whereas many languages, such as English, Hebrew, Dani and Pirahã, go way beyond the confines of IMA Language, exhibiting much greater levels of complexity, others approach the IMA prototype to various degrees, thereby warranting characterization as Relative IMA Languages. One example of a Relative IMA Language is Riau Indonesian, as described in Gil (2004, 2005a,b and elsewhere). In Riau Indonesian, basic sentence structure is in fact purely IMA. Consider the following simple sentence:
(2) Ayam makan
    chicken eat
    A(CHICKEN, EAT)
The above sentence consists entirely of two "content words", and is devoid of any additional grammatical markers. The isolating character of the language is instantiated by the fact that each of the two words is monomorphemic. The monocategorial nature of the language is reflected by the fact that the two
words, although referring to a thing and an activity respectively, exhibit identical grammatical behaviour; rather than belonging to distinct parts of speech, such as noun and verb, they are thus members of the same syntactic category, namely sentence, and therefore the sentence as a whole is a simple juxtaposition, or coordination, of two sentences. The associational character of the language can be seen in the wide range of available interpretations: the first word, ayam, is underspecified for number and definiteness; the second word, makan, is indeterminate with respect to tense and aspect; and the sentence as a whole is underspecified with regard to thematic roles and ontological categories. Sentence (2) is thus vague, with a single very general interpretation which may be represented, as above, with a formula making reference to the Association Operator, A(CHICKEN, EAT), to be read as 'entity associated with chicken and eating', or, more idiomatically, 'something to do with chicken and eating'. Sentence (2) is a typical sentence in Riau Indonesian; it is not "telegraphic" or otherwise stylistically marked in any way. Longer and more complex sentences can be constructed which, like (2), instantiate pure IMA structure. Nevertheless, Riau Indonesian contains a number of features which take it beyond the bounds of a pure IMA Language. That Riau Indonesian is not purely isolating is evidenced by the presence of a handful of bona fide affixes, plus various other morphological processes, such as compounding, reduplication and truncation. That Riau Indonesian is not purely monocategorial is due to the fact that, in addition to the single open syntactic category sentence, it also contains a single closed syntactic category containing a few dozen semantically heterogeneous words whose grammatical behaviour sets them apart from words belonging to the category of sentence.
Finally, that Riau Indonesian is not purely associational is clear from the presence of additional rules of compositional semantics that make reference to specific lexical items or to particular syntactic configurations, such as, for example, word order. Thus, Riau Indonesian is most appropriately characterized as a Relative IMA Language.
4. IMA Language is all that's Needed to Sail a Boat
Riau Indonesian is but one of a wide range of colloquial varieties of Malay/Indonesian, spoken throughout Indonesia and neighboring Malaysia, Brunei and Singapore by a total population of over two hundred million people. Although differing from each other to the point of mutual unintelligibility, a majority of these colloquial varieties resemble Riau Indonesian in their basic
grammatical structures, and accordingly share the characterization as Relative IMA Languages. As Relative IMA Languages, Riau Indonesian and other colloquial varieties of Malay/Indonesian make it possible to address the question: How much grammar does it take to sail a boat? By peeling off the extra layers of non-IMA complexity and homing in on the IMA core, one may examine the expressive power of pure IMA Language, and see exactly what levels of culture, technology and civilization it can support. In order to do this, one may examine fragments of pure IMA Language culled from naturalistic corpora in colloquial Malay/Indonesian. In such corpora, most utterances contain at least some additional structure beyond what is purely IMA: an affix, a word belonging to the closed class of grammatical function words, or a construction-specific rule of semantic interpretation. Nevertheless, it is also possible to find stretches of text in which, by probabilistically-governed accident, no such additional structure is present, and which, therefore, exhibit pure IMA structure. A selection of such pure IMA stretches of text is presented and discussed in Gil (to appear). The examples are from four sociolinguistically diverse varieties of colloquial Malay/Indonesian: Jakarta Indonesian, Papuan Malay, Siak Malay and Minangkabau (usually considered to be a "different language", but very closely related to Malay/Indonesian, and not much more different from many varieties of Malay/Indonesian than such varieties are from each other). The pure IMA examples provide an indication of the expressive power of pure IMA Language, showing how it matches up to other, non-IMA Languages, by capturing notions that in other languages make recourse to specialized grammatical constructions.
Comparing the pure IMA fragments to the Relative IMA Language varieties from which they are taken suggests that getting rid of the non-IMA accoutrements has no systematic effect on expressive power in any semantic domain. The affixes and grammatical markers that take these language varieties beyond the bounds of pure IMA Language form a semantically heterogeneous set, a functional hodge-podge sprinkled like confetti over their fundamentally IMA architecture. In principle, anything that can be said in such languages can be paraphrased within the confines of pure IMA Language. This being the case, the pure IMA fragments of colloquial Malay/Indonesian paint a reasonably accurate picture of the functionality of IMA Language, and the amount of culture, technology and civilization that IMA Language can support. In doing so, they demonstrate that IMA Language is enough to run a country of some two hundred million people, and, by extension, most contemporary human activity throughout the world.
Thus, colloquial Malay/Indonesian shows that IMA Language is all that it takes to sail a boat. This means that, if indeed Homo floresiensis sailed across a major body of water to reach the island of Flores, the most that we can infer from this with regard to his linguistic abilities is that he had IMA Language. More generally, what this suggests is that no amount of non-linguistic archeological evidence will ever be able to prove the presence, at some earlier stage of human evolution, of grammatical entities of greater-than-IMA complexity: prefixes and suffixes, nouns and verbs, not to mention complementizers, relative clauses, government, movement, functional categories, antecedents binding empty positions, and all the other things that so delight the souls of linguists.
5. Why is Grammar so Complex?
If indeed IMA Language is all it takes to sail a boat and to run a large country, why is it then that no languages are pure IMA Languages, and most languages are not even Relative IMA Languages, instead exhibiting substantially greater amounts of grammatical complexity? One cannot but wonder what all this complexity is for. This paper has only been able to provide a negative answer, by identifying one, albeit enormous, thing that this complexity is not for, namely the maintenance of contemporary human civilization: IMA Language is enough for all that. As noted at the outset, comparing humans to other species suggests that grammatical complexity is in fact positively correlated with complexity in other non-linguistic domains. However, more fine-toothed observations within the human species reveal a more ambivalent picture. Admittedly, within certain specific contexts, it is possible to identify what appear to be significant correlations between grammatical complexity and complexity in other domains, as for example in colloquial Malay/Indonesian, where the development of a non-IMA and hence more complex coordinator may be shown to be related to the introduction of mobile telephony and text messaging (Gil 2004). However, in other contexts, the correlation seems to go in the opposite direction, as in the well-known case of language simplification being associated with the greater sociopolitical complexity of contact situations (McWhorter 2005 and others). Thus, across the diversity of contemporary human languages and cultures, grammatical complexity just does not seem to correlate systematically with complexity in other, non-linguistic domains. In the words of Sapir (1921: 219): "Both simple and complex types of language of an indefinite number of varieties
may be spoken at any desired level of cultural advance. When it comes to linguistic form, Plato walks with the Macedonian swine-herd, Confucius with the head-hunting savage of Assam." These facts cast doubt on a central tenet of most functionalist approaches to language, in accordance with which grammatical complexity is there to enable us to communicate the messages we need to get across. In spite of overwhelming evidence showing that diachronic change can be functionally motivated, the fact remains that language is hugely dysfunctional. Just think of all the things that it would be wonderful to be able to say but for which no language comes remotely near to providing the necessary expressive tools. For example, it would be very useful to be able to describe the face of a strange person in such a way that the hearer would be able to pick out that person in a crowd or a police line-up. But language is completely helpless for this task, as evidenced by the various stratagems police have developed, involving skilled artists or graphic computer programmes, to elicit identifying facial information from witnesses - in this case a picture actually being worth much more than the proverbial thousand words. Yet paradoxically, alongside all the things we'd like to say but can't, language also continually forces us to say things that we don't want to say; this happens whenever an obligatorily-marked grammatical category leads us to specify something we would rather leave unspecified, as when, in English, we are forced to choose between he or she, or else make use of other awkward circumlocutions. And such examples can be multiplied at will. What these facts suggest, then, is that grammatical structure with its concomitant complexity is not a straightforward tool for the communication of pre-existing messages, but rather, to a large degree, our grammars actually define the messages that we end up communicating to one another. 
Instead of wondering what grammatical complexity is for, one should ask how and why grammars have evolved to their current levels of complexity. Clearly, many factors are involved, some common to all languages, underlying the development of grammatical complexity in general, others specific to individual languages, resulting in the observed cross-linguistic variation with respect to grammatical complexity. Among the many different factors involved, a particularly important role is played by diachrony. Contemporary grammatical complexity is the result of thousands of years of historical change, with its familiar processes of grammaticalization, lexicalization, syntacticization and the like. Rather than having evolved in order to enable us to survive, sail boats, and do all the other things that modern humans do, most contemporary grammatical complexity is more appropriately viewed as the outcome of natural processes of self-organization whose motivation is largely or entirely system-internal. In this respect, grammatical complexity may be no different from complexity in other domains, such as anthropology, economics, biology, chemistry and cosmology, which have been suggested to be governed by general laws of nature pertaining to the evolution of complex systems.
References
Clark, G. (1977). World Prehistory in New Perspective. Cambridge: Cambridge University Press.
Gil, D. (2004). Learning About Language from Your Handphone: dan, and and & in SMSs from the Siak River Basin. In K. E. Sukatmo (Ed.), Kolita 2, Konferensi Linguistik Tahunan Atma Jaya (pp. 57-61). Jakarta: Pusat Kajian Bahasa dan Budaya, Unika Atma Jaya.
Gil, D. (2005a). Isolating-Monocategorial-Associational Language. In H. Cohen and C. Lefebvre (Eds.), Categorization in Cognitive Science (pp. 347-379). Oxford: Elsevier.
Gil, D. (2005b). Word Order Without Syntactic Categories: How Riau Indonesian Does It. In A. Carnie, H. Harley and S. A. Dooley (Eds.), Verb First: On the Syntax of Verb-Initial Languages (pp. 243-263). Amsterdam: John Benjamins.
Gil, D. (2006). Early Human Language Was Isolating-Monocategorial-Associational. In A. Cangelosi, A. D. M. Smith and K. Smith (Eds.), The Evolution of Language, Proceedings of the 6th International Conference (EVOLANG6) (pp. 91-98). Singapore: World Scientific.
Gil, D. (to appear). How Much Grammar Does It Take to Sail a Boat? In G. Sampson, D. Gil and P. Trudgill (Eds.), Language Complexity as an Evolving Variable. Oxford: Oxford University Press.
Lee, R. B. & DeVore, I. (1968). Man the Hunter. Chicago: Aldine.
McWhorter, J. (2005). Defining Creole. Oxford and New York: Oxford University Press.
Morwood, M. J. & Cogill-Koez, D. (2007). Homo on Flores: Some Early Implications for the Evolution of Language and Cognition. In A. C. Schalley and D. Khlentzos (Eds.), Mental States, Volume 1: Evolution, Function, Nature. Amsterdam: John Benjamins.
Roebrooks, W. (2001). Hominid Behavior and the Earliest Occupation of Europe: An Exploration. Journal of Human Evolution, 41, 437-461.
Sapir, E. (1921). Language. London: Harcourt, Brace and World.
Torre, I. de la (2004). Omo Revisited: Evaluating the Technological Skills of Pliocene Hominids. Current Anthropology, 45, 439-465.
THE ROLE OF CULTURAL TRANSMISSION IN INTENTION SHARING TAO GONG, JAMES W. MINETT, & WILLIAM S-Y. WANG Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong This paper presents a simulation study exploring the role of cultural transmission in intention sharing (the ability to establish shared intentions in communications). This ability has been argued to be human-unique, and the lack of it has deprived animals of the possibility of developing human language. Our simulation results show that the level of this ability adequate to trigger a communal language is not very high, and that cultural transmission can indirectly optimize the average level of this ability in the population. This work extends the current discussion on the human-uniqueness of some language-related abilities, and provides a better understanding of the role of cultural transmission in language evolution.
1. Introduction
Language evolution is a fascinating topic in the interdisciplinary scientific community. Many empirical and theoretical studies (e.g., Oller & Griebel, 2000) have revealed a “mosaic” fashion of language evolution (Wang 1982), with a number of competences (e.g., social cognition, vocal tract control, imitation, etc.) all taking part in this process. There is an ongoing discussion on whether language results from abrupt changes of these abilities through macro-mutations (Pinker & Bloom, 1990), or is caused by a quantitative evolution of “prototypes” of these abilities (Elman, 2005; Ke et al., 2006). Among these various abilities, intention sharing is crucial for developing a communication system. An intention is a plan of action that an organism chooses and commits itself to in pursuit of a goal, and sharing intentions can be viewed as intentional (selective) comprehension during interactions (Tomasello et al., 2005). Comparative studies between chimpanzees and human infants have shown that the latter are good at establishing shared intentions during interactions with peers or adults, while the former are poor at it (ibid.). Based on these findings, Tomasello and his colleagues (ibid.) argued that sharing intentionality must be human-unique, and that the lack of it in animals prevents them
from developing language. However, a significant difference between modern humans and chimpanzees in this ability is insufficient to indicate its uniqueness to humans, since it may result from a gradual evolution along with the development of the human communication system. Apart from comparative studies, we therefore need other methodologies to investigate its development in humans. Computational simulation is efficient in this respect, and has been widely adopted to tackle problems concerning the evolution of language and other cognitive activities (e.g., Cangelosi et al., 2006). This paper presents a simulation study to explore intention sharing and some possible forces adjusting its level, which is quantified as the probability of establishing a shared intention during communications. We argue first that the level of this ability adequate to trigger a communal language need not be very high, and that a small quantitative change in it can greatly affect the understandability of the emergent language. Second, cultural transmission can help to optimize the level of this ability in the population to “assist” language emergence. A low level of it can be increased, while a high level can be slightly reduced. Third, the emergence of displacement (human language can efficiently describe events not occurring in the immediate environment of the conversation; Hockett, 1960) in human language could be a side effect of the optimization role of cultural transmission in intention sharing. The rest of the paper is organized as follows: Section 2 briefly reviews the adopted language emergence model; Section 3 introduces the framework to explore the role of cultural transmission in intention sharing; Section 4 discusses the simulation results; and finally, Section 5 provides the conclusions.
2. A Brief Review of the Language Emergence Model
The adopted model in this paper was originally designed to study the coevolution of compositionality (in the form of lexical items) and regularity (in the form of word order) during language origin (Gong et al., 2005; Gong, 2008). Its conceptual framework is shown in Fig. 1, in which utterances encoding simple integrated expressions such as “run” (meaning “a fox is running”) or “chase<wolf, sheep>” (meaning “a wolf is chasing a sheep”) are exchanged among agents during communicative acts. Through the pattern extraction ability, individuals may acquire some recurrent patterns in the exchanged utterances as lexical items (see the LEXICON rectangle in Fig. 1). By sequential learning, individuals may acquire local orders recording order relations among two lexical items in the exchanged utterances. In addition, when individuals observe that some lexical items with the same semantic role are similarly used (display
the same local order with respect to other lexical items) in some exchanged utterances, they can assign these lexical items to the same category; for simplicity, we labeled them with the syntactic roles met in simple declarative sentences in English (‘S’, Subject; ‘V’, Verb; and ‘O’, Object). Through reiterating local orders among categories, individuals gradually acquire emergent global order(s) to regulate strings of lexical items from categories and form utterances to encode integrated meanings. For instance, if an individual’s linguistic knowledge includes some S, V, and O categories locally ordered “S before V” and “O after V” that lead to an emergent global order SVO, then he/she can express the integrated meaning “chase<wolf, sheep>” as /WOLF CHASE SHEEP/, where letters within “/ /” are utterance syllables chosen from a signaling space and not necessarily identical to English words. The initial stage of the model could be either no language at all or a small holistic signaling system in which all individuals share a small number of holistic rules to encode some integrated meanings. Through iterated communications, a compositional language having a set of lexical items and global order(s) gradually emerges. This model gives us an appropriate level of complexity to observe the effect of intention sharing on language evolution and the optimization role of cultural transmission in adjusting the level of this ability.
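The derivation of an emergent global order from pairwise local orders can be sketched as a topological sort over category-order constraints. This is an illustrative reconstruction, not the authors' code; the constraint list, lexicon, and category labels are assumptions chosen to match the "S before V", "O after V" example above.

```python
# Hedged sketch of global word-order emergence from local orders,
# as in the Gong et al. model description; names are illustrative.
from graphlib import TopologicalSorter

# Local orders acquired from utterances: "S before V", "V before O"
# (i.e. "O after V"), each pair meaning (earlier, later).
local_orders = [("S", "V"), ("V", "O")]

ts = TopologicalSorter()
for earlier, later in local_orders:
    ts.add(later, earlier)  # 'later' must follow 'earlier'
global_order = list(ts.static_order())  # yields ['S', 'V', 'O']

# Encoding "chase<wolf, sheep>" with one lexical item per category;
# syllable strings are placeholders for items from a signaling space.
lexicon = {"S": "WOLF", "V": "CHASE", "O": "SHEEP"}
utterance = "/" + " ".join(lexicon[c] for c in global_order) + "/"
print(utterance)  # /WOLF CHASE SHEEP/
```

In the actual model the local orders are statistical and acquired incrementally; the sort here only shows how consistent pairwise orders determine a single SVO global order.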
Figure 1. The conceptual framework of the language emergence model: the SEMANTICS rectangle stands for the predefined semantic space; the ovals represent the three aspects of linguistic knowledge acquired by agents based on different domain-general abilities: pattern extraction, sequential learning, and categorization; the EMERGENT GLOBAL ORDERS rectangle encompasses the emergent syntactic patterns triggered by this linguistic knowledge.
Intention sharing in this model boils down to the availability of the topic from the environment during communicative acts, and it is simulated as an individual parameter, Reliability of Cue (RC), which indicates the probability (from 0.0 to 1.0) for the listener in a communication to accurately acquire the speaker’s intended integrated meaning in the heard utterance from an environmental cue (an ongoing event represented by an integrated meaning in
their environment). In a communication without intention sharing, a wrong cue containing an event different from the speaker’s intended meaning is given to the listener; in a communication with intention sharing, the speaker’s intended meaning is directly given to the listener through a cue. From the speaker’s perspective, RC indicates the probability of choosing an ongoing event in the immediate environment as the topic of the communicative act. From the listener’s perspective, it indicates the probability of referring to the ongoing event to assist comprehension. If RC is 1.0, intention sharing is established in all communications; if it equals 0.0, the listener only gets wrong cues, and no intention sharing is established in any communication. In this paper, the relations among RC, language emergence, and cultural transmission are discussed by evaluating two indices: Understanding Rate (UR, the average percentage of accurately understood integrated meanings in communications of all pairs of agents in the population, based on their linguistic knowledge only and without referring to cues) and Convergence Time (CT, the number of generations of communications needed to reach a high UR, say 0.8).
3. The Cultural Transmission Framework
Cultural transmission is defined as the communications among individuals from the same (intra-generational) or different (inter-generational) generations. As the medium of language exchange, it plays important roles in language evolution. In this paper, we assume that there is an ongoing optimization process based on linguistic understandability during cultural transmission; individuals who can better understand others in communications may obtain more resources and produce more offspring, and these offspring may maintain some of their parents’ language-related abilities. A cultural transmission framework is simulated under this assumption to test whether this optimization process plays a role in adjusting RC.
In the framework, after a number of intra-generational transmissions, some individuals with higher linguistic understandability become “parents” and produce “offspring”. The offspring replicate their parents’ RC values with occasional, small changes: a GA-like mechanism such as mutation (a tiny increase/decrease in an RC value) is applied during reproduction. After “birth”, the offspring learn from their parents through inter-generational transmissions, and then replace them and the other individuals of the previous generation. After that, a new cycle begins. For the sake of comparison, another type of simulation without optimization is implemented, in which agents are randomly chosen as parents to
produce offspring regardless of their communicative success in each generation. During the reproduction process, mutation is also applied. In all simulations of this paper, the population has 10 agents. In the first generation, all individuals’ RC values are randomly drawn from a Gaussian distribution whose standard deviation is 0.01 and whose mean ranges from 0.0 to 1.0 in different conditions. In each generation, there are 200 rounds of random pairwise intra-generational transmissions and 200 rounds of inter-generational transmissions from parents to offspring. A round of transmissions includes 10 communications among different pairs of agents. After intra-generational transmissions, 5 agents are chosen as parents, each producing 2 offspring. During the reproduction process, a small (0.1) increase/decrease of the RC value occurs with a probability of 0.02 (the mutation rate). The total number of generations is 200. In the simulations with optimization, parents are chosen according to their linguistic understandability, i.e., the average percentage of integrated meanings that they, without referring to cues, can accurately understand when others speak to them. In the simulations without optimization, parents are randomly chosen. In each condition of the simulations, the results of 20 runs are collected for statistical analysis.

4. The Simulation Results
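The reproduction step of this framework can be sketched as below, using the parameter values stated above (10 agents, 5 parents, 2 offspring each, mutation of ±0.1 with probability 0.02). The data representation, function name, and the clamping of RC to [0.0, 1.0] are our assumptions.

```python
import random

def reproduce(population, optimize=True):
    """One reproduction step: 5 of the 10 agents become parents, each
    producing 2 offspring that copy the parent's RC value, with a
    GA-like mutation (a 0.1 increase/decrease with probability 0.02).

    `population` is a list of dicts like
    {"rc": 0.5, "understandability": 0.3}. With optimization, parents
    are the agents with the highest linguistic understandability;
    without it, parents are chosen at random.
    """
    if optimize:
        parents = sorted(population, key=lambda a: a["understandability"],
                         reverse=True)[:5]
    else:
        parents = random.sample(population, 5)
    offspring = []
    for p in parents:
        for _ in range(2):
            rc = p["rc"]
            if random.random() < 0.02:              # mutation rate
                rc += random.choice((-0.1, 0.1))    # tiny RC change
            rc = min(1.0, max(0.0, rc))             # clamp (our assumption)
            offspring.append({"rc": rc, "understandability": 0.0})
    return offspring  # replaces the previous generation
```

The comparison condition in the paper corresponds to calling `reproduce(population, optimize=False)`.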
Fig. 2 (a) records the average and standard deviation values of the highest UR throughout the simulations in the 20 runs with different initial RC values. UR reflects the average linguistic understandability of the whole population. Fig. 2 (b) illustrates the average CT under different initial RC values.
Figure 2. The simulation results with and without optimization: (a) average highest UR vs. RC; (b) average CT vs. RC. The dashed lines trace the results with optimization during cultural transmission, and the solid lines trace the results without optimization.
In the simulations without optimization, when RC is low (below 0.3), UR is rather low (around 0.125, the UR of the initially shared holistic rules), and a
communal language with a high UR does not emerge in the population; when RC lies in the interval [0.4, 0.7], a communal language emerges, and the increase in RC accelerates language origin, which is indicated by the decrease in CT; when RC is rather high (over 0.8), an increase in RC does not further accelerate language origin. These results suggest that without optimization, a relatively low RC (around 0.5) is sufficient to trigger a language with a high UR (around 0.8), and a small increase in RC from 0.4 to 0.5 causes a qualitative change from no language to a communal one. In other words, a small phenotypic change can result in a communication means of a totally different nature (Elman, 2005). In the simulations with optimization, the level of RC adequate to trigger a communal language is further reduced; a much smaller initial RC (0.2) can trigger a communal language with a high UR (over 0.6). In addition, language origin in these simulations is more efficient than in the simulations without optimization. However, if the initial RC is high (over 0.7), language origin does not differ much between these two types of simulations. The evolution of RC values in the simulations with optimization is shown in Fig. 3, in which Fig. 3 (a) traces the average and standard deviation values of the initial, maximum and last RC throughout 200 generations, and Fig. 3 (b) traces the RC values in some particular runs. If the number of generations is extended somewhat, say to 300, a similar trend is maintained, though the further direction of RC (increasing or decreasing) is unclear.
Figure 3. The evolution of RC in the simulations with optimization: (a) statistical results of RC; each line summarizes the initial, maximum and last RC values in the simulations with a particular range of initial RC (from 0.1 to 1.0 with a step of 0.1); (b) specific RC values in different runs; each line records the RC values at different generations in one simulation.
Two roles of cultural transmission with respect to RC are shown in Fig. 3. The optimization during cultural transmission is based on individual linguistic understandability. Since a high RC contributes to the acquisition of correct
linguistic rules that help an individual to accurately understand others’ idiolects, it can be indirectly selected by cultural transmission and gradually spread in the population. Then, the average level of RC in the population gradually increases in response. This increasing effect is well illustrated in Fig. 3, especially when the initial RC is low (below 0.8). However, if the initial RC is already high (around 0.7), cultural transmission does not greatly change it, but maintains it throughout the simulations. For a rather high RC in the [0.9, 1.0] interval, cultural transmission may even lead to a slight reduction: the last value becomes slightly smaller than the initial one, as shown in Fig. 3 (a). Slightly reducing a rather high RC is a side effect of optimization. Since these initial RC values are high enough to trigger a communal language, an individual with a slightly lower value can still have a high understandability, be chosen as a parent to produce offspring, and spread this RC in the population. Then, the average level of RC in the population may drop slightly without greatly affecting the UR of the emergent language. In this situation, there are a number of communications with no intention sharing during cultural transmission, which provides the opportunity for agents to develop robust linguistic knowledge that needs no assistance from cues, or even resists the distraction of wrong ones. Such a reliable language can efficiently describe events not occurring in the immediate environment, gradually liberate itself from the restriction of nonlinguistic information, and come to be used efficiently in communications with no cues or other nonlinguistic assistance. Compared with the increasing effect on RC, this reducing effect is less apparent in the short run, but it is crucial for language evolution in the long run.
5. Conclusions

The simulations in this paper demonstrate the roles of cultural transmission in intention sharing. Cultural transmission can adjust the level of this ability needed to trigger a communal language. Meanwhile, it can also prevent this ability from rising too high, so that displacement can become established in the emergent language. Apart from shaping linguistic features such as compositionality and regularity, our study shows that cultural transmission can help to optimize some language-related abilities, leading them to optima that are not necessarily the highest possible values. In addition, the framework in this paper can be adopted to study the role of cultural transmission in other language-related abilities, such as the ability to detect recurrent patterns or manipulate local orders. This approach will provide a clearer picture of the “mosaic” fashion of language evolution, and may help to
verify the claim of Connectionism (Elman, 2005) that small phenotypic changes in our species may yield language as the outcome. Finally, the level of RC is modified via some GA-like mechanisms during inter-generational transmissions based on individual linguistic understandability. The adopted GA-like mechanism does not imply that the ability of intention sharing has to be updated through genetic transmission; other optimization mechanisms may play a similar role.

Acknowledgements

We thank Dr. Christophe Coupé from Laboratoire Dynamique du Langage, CNRS - Université Lumière Lyon 2 for the valuable discussions.

References

Cangelosi, A., Smith, A. D. M., & Smith, K. (Eds.) (2006). The evolution of language: Proceedings of the 6th international conference. London: World Scientific Publishing Co.
Elman, J. L. (2005). Connectionist models of cognitive development: Where next? Trends in Cognitive Sciences, 9, 111-117.
Gong, T., Ke, J-Y., Minett, J. W., Holland, J. H., & Wang, W. S-Y. (2005). A computational model of the coevolution of lexicon and syntax. Complexity, 10, 50-62.
Gong, T. (2008). Computational simulation in evolutionary linguistics: A study on language emergence. Taipei: Institute of Linguistics, Academia Sinica.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88-96.
Ke, J-Y., Coupé, C., & Gong, T. (2006). A little bit more, a lot better: Language emergence from quantitative to qualitative change. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of the 6th international conference (pp. 419-420). London: World Scientific Publishing Co.
Kirby, S. (1999). Function, selection and innateness: The emergence of language universals. New York: Oxford University Press.
Oller, K. & Griebel, U. (Eds.) (2000). Evolution of communication systems: A comparative approach. Cambridge, MA: MIT Press.
Pinker, S. & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.
Tomasello, M., Carpenter, M., Call, J., Behne, T., & Moll, H. (2005). Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences, 28, 675-691.
Wang, W. S-Y. (1982). Explorations in language evolution. Osmania Papers in Linguistics, 8. Reprinted in W. S-Y. Wang (Ed.), Explorations in language (pp. 105-131). Taipei, Taiwan; Seattle, WA: Pyramid Press.
THE ROLE OF THE NAMING GAME IN SOCIAL STRUCTURE

TAO GONG, WILLIAM S-Y. WANG
Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong

This paper presents a simulation study exploring the role of the naming game in social structure, a topic nearly neglected by contemporary studies from statistical physics, which mainly discuss the dynamics of language games in predefined mean-field or complex networks. Our foci include the dynamics of the naming game under a simple distance restriction, and the origin and evolution of primitive social clusters, as well as their languages, under this restriction. This study extends the current work on the role of social structure in language games, and provides a better understanding of the self-organizing process of lexical conventionalization during cultural transmission.
1. Introduction

The origin and evolution of language or general communication systems is a fascinating topic in the interdisciplinary scientific community. A number of approaches from biology, statistical physics, and computer science have been proposed to address specific aspects of this topic (Oller & Griebel, 2000), among which the self-organizing emergence of a shared lexicon during cultural transmission has been extensively studied based on various forms of language game (Steels, 2001) models in the past few years. The naming game is one form of language game that simulates the emergence of a collective agreement on a shared mapping between words and meanings in a population of agents with pairwise local interactions (Steels, 2001). A minimal version of it (Baronchelli & Loreto, 2006) studies the main features of semiotic dynamics. In this version, N homogeneous agents describe a single object by inventing words during pairwise interactions. Each agent has an inventory (memory) that is initially empty and can store an unlimited number of words. In a pairwise interaction, two agents are randomly chosen, one as “speaker” and the other as “hearer”. The speaker utters a word to the hearer. If its inventory is empty, the speaker randomly invents a word; otherwise, it randomly utters one of the available words in its inventory. If the hearer has the uttered word in its inventory, the game is successful, and
both agents delete all their words but the uttered one. If the hearer does not have the uttered word, the game is a failure, and the hearer adds (learns) the uttered word to its inventory. In a mean-field system, the dynamics of the naming game can be traced by Nw(t), the total number of words in the population; Nd(t), the number of different words; and S(t), the average success rate of interactions among all pairs of agents. Statistical physicists (e.g., Baronchelli & Loreto, 2006; Dall’Asta et al., 2006) have further explored the dynamics of the naming game in structures such as 1D/2D lattices, small-world and scale-free networks. Although these studies extensively discussed the role of social structures in the convergence of a shared lexicon, most of them neglected the reverse role of the naming game in social structure; in these studies, a successful or failed naming game only affects an individual’s linguistic knowledge, but has nothing to do with the predefined social structures. In a cultural environment, successful or failed interactions among individuals can not only adjust their knowledge, attitudes or opinions, but also affect their social connections or political status in the community. Factors that operate on a local scale, such as interaction procedures and geographical or social distance restrictions, can adjust the possibilities of interactions among agents, thus affecting individual or group similarities on a global scale (Axelrod, 1997; Nettle, 1999). These simple factors may have taken place much earlier than the emergence of complex social structures, and cast their influence on the formation of primitive social clusters and their communal languages. For instance, during language origin, a successful naming game about a common object in the environment may form a social binding among the participants of this game, and establish a common lexicon among them. These factors may have a similar effect in modern societies during language change.
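The minimal naming game described above can be sketched as a single interaction routine. This is an illustrative reconstruction: the list-based inventories, the function name, and the word-invention placeholder are our assumptions.

```python
import random

def naming_game_round(speaker, hearer,
                      invent=lambda: f"w{random.random():.6f}"):
    """One interaction of the minimal naming game.

    `speaker` and `hearer` are lists (inventories) of words; both are
    modified in place. Returns True on success. Word invention is
    parameterized; the random-string default is our placeholder.
    """
    if not speaker:
        speaker.append(invent())        # empty inventory: invent a new word
    word = random.choice(speaker)       # utter a random word from inventory
    if word in hearer:
        speaker[:] = [word]             # success: both agents delete all
        hearer[:] = [word]              # their words but the uttered one
        return True
    hearer.append(word)                 # failure: hearer learns the word
    return False
```

Iterating this routine over randomly chosen pairs in a mean-field population reproduces the semiotic dynamics traced by Nw(t), Nd(t), and S(t).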
For instance, a successful or failed naming game about a salient concept may form a new binding or destroy an old one among the participants, and adjust their communal languages. Moreover, in order to establish a complex social network in a huge population, in which not every pair of individuals could ever directly interact, a certain degree of mutual understanding is necessary, and simple language games may play a role in achieving such mutual understanding through local interactions. Therefore, besides its dynamics in predefined complex networks, the dynamics of the naming game under simpler constraints, and its role in social structure, are worth exploring as well. In this paper, we present a preliminary study in this respect. Instead of detailed constraints determined by complex networks, we simulate a simple distance constraint, and discuss its influence on the formation of social clusters and their communal languages. The simulation traces the coevolution of language
and social structure based on the naming game, and the formation of mutual understanding in a population via local interactions among its members, both of which will help us better understand the self-organizing process of lexical conventionalization based on the naming game. The rest of the paper is organized as follows: Section 2 introduces the simple distance restriction; Section 3 discusses the simulation results of two experiments; and finally, Section 4 provides the conclusions and future work.
2. The Naming Game with a Distance Constraint

The interaction scenario of our naming game is identical to the minimal version described in Section 1. To introduce distance restrictions, we situate all agents in a 2D square torus (X², where X is the side length of the torus), and each agent can randomly move to one of its 8 unoccupied, nearby locations, as shown in Fig. 1. This torus represents either a physical world or an abstract world, such as a distribution of opinions or social status.
Figure 1. A 2D torus with moving agents. (Legend: agent; movement; possible locations; successful naming game; failed communication; distance restriction (2Dx+1)×(2Dy+1).)
The distance restriction, inspired by our previous study (Gong et al., 2005) and applied to agent selection during pairwise interactions, is defined as follows. Interactions only take place between agents whose coordinates are within a limited block distance (Dx and Dy), as shown in Eq. (1), where xi, yi are agent i's coordinates in X²; the second part of each condition handles the situation where agents are located at the boundaries but their block distance may still be within Dx and Dy, since they are on a torus:

(|xi - xj| <= Dx  or  X - |xi - xj| <= Dx)  and  (|yi - yj| <= Dy  or  X - |yi - yj| <= Dy)    (1)
This concept of distance can represent either geographical distance, such as city-county distance, or social distance, such as dissident opinions. Under this distance restriction, each agent in the torus can interact with at most (2Dx+1)×(2Dy+1)−1 (excluding itself) nearby agents. When Dx and Dy equal 1, each agent only interacts with those lying in its 8 nearby locations. This restriction provides a binding (bias) for the participants of the naming game: a successful naming game binds the speaker with the listener, and they tend to move together to maintain their block distance within Dx and Dy (in other words, either of them can move in such a way that after movement, their block distance is still within Dx and Dy); however, a failed naming game may break this binding (in other words, either of them can randomly move in any direction). This restriction is much simpler than those defined by complex networks. Based on it, some big social clusters, containing agents who share a common lexicon but may not necessarily interact directly with each other, may emerge and be maintained. These clusters and their shared words could be the prototypes of complex social structures and their communal languages. We design two experiments to evaluate the influence of this simple restriction on the formation of social clusters and the conventionalization of a shared lexicon. In Exp. 1, 100 agents are situated in a 10² torus (each location in the torus is occupied by an agent), and Dx and Dy range from 1 to 10. In Exp. 2, 100 agents are put into tori whose side length X ranges from 10 to 55, but Dx and Dy are fixed. In each time step, a random sequence of agents is set, and following it, each agent is chosen to interact with one of the others lying within its distance restriction (if any), and then it moves, based on the interaction result (successful or failed), to one of its unoccupied neighboring positions (if any).
The total number of time steps is 100, and the maximum number of possible interactions is 100×100 = 10,000. In each condition, the results of 20 simulations are collected for statistical analysis. After each time step, S (the average success rate of interactions among all pairs of agents) and Nd (the number of different words) are evaluated. If all agents gradually come to share a common lexicon, S will gradually increase to 1.0 and Nd will reduce to 1. In this situation, NT (the number of time steps required to reach the highest S) indicates how efficiently the distance restriction drives lexical conventionalization in the population. On the contrary, if the agents cannot all share a common lexicon, but form different clusters, S and Nd will not reach 1. In this situation, Nd indicates the number of isolated clusters, and NT the effect of the distance restriction on lexical conventionalization within clusters. The following sections discuss the simulation results of the two experiments.
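The torus block-distance condition of Section 2 can be checked directly as below. The function name and coordinate representation are our assumptions; the two-part test per axis mirrors the boundary handling described in the text.

```python
def within_restriction(ax, ay, bx, by, X, Dx, Dy):
    """Check the distance restriction on an X-by-X torus.

    Two agents may interact only if, along each axis, either the plain
    block distance or the wrap-around distance (X minus it) is within
    the limit Dx / Dy.
    """
    dx = abs(ax - bx)
    dy = abs(ay - by)
    ok_x = dx <= Dx or X - dx <= Dx    # second part handles torus boundaries
    ok_y = dy <= Dy or X - dy <= Dy
    return ok_x and ok_y
```

For example, in a 10×10 torus with Dx = Dy = 1, an agent at (0, 0) may interact with one at (9, 9), since the wrap-around distance along each axis is 1.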
2.1. Exp. 1: fixed torus size but various distance restrictions
In this experiment, all 100 agents lie in a 10² torus; Dx and Dy change from 1 to 10. In all simulations, after 100 time steps, a common lexicon is shared in the population; both S and Nd become 1 at the end of the simulations. Fig. 2 illustrates the average and standard deviation values of NT under different Dx and Dy.
Figure 2. The statistical results of NT in Exp. 1; each point is calculated based on 20 simulations after 100 time steps and a maximum of 10,000 possible naming games.
As shown in Fig. 2, with the increase in Dx and Dy, the process of lexical conventionalization follows two regimes: as Dx and Dy increase from 1 to 4, agents can interact with more nearby agents and adjust their words; the lexical convergence is thus accelerated and NT drops. When Dx and Dy are greater than 5, each agent can already interact with all the others in the population; the lexical convergence is not further accelerated and NT becomes stable. In addition, in a 10² torus, when Dx and Dy are small and each agent cannot directly interact with all others, lexical conventionalization is still accomplished after not many interactions via intermediate agents, and a cluster of agents who cannot directly interact with each other but share a lexicon is established.
2.2. Exp. 2: various torus sizes but fixed distance restriction

In this experiment, 100 agents are randomly situated in tori whose side lengths increase from 10 to 55 with a step of 5. Dx and Dy are fixed to 5. Fig. 3 illustrates the averages and standard deviations of S, NT, and Nd in Exp. 2. The process of lexical conventionalization in Exp. 2 also follows two regimes: when X is smaller than 30, all agents in the population form a huge cluster and share a common lexicon; however, after X reaches a certain level (say, 30), S begins to drop, and both NT and Nd begin to increase. In a relatively small torus (X smaller than 30), although agents may not find many others within their distance restrictions, through moving around they can encounter some agents and get their words to converge to a shared lexicon. However, in a
big torus (X bigger than 30), this 1-step movement is insufficient for agents to meet many others, and the big torus size greatly restricts the local interactions among agents; isolated, smaller clusters then gradually emerge, each of which shares a common lexicon. The drop in S and increase in Nd both indicate the emergence of small clusters. Within a cluster, S among the cluster members is high, but between clusters, S among members of different clusters is low, since they may share different words. In addition, once such clusters are formed, it is difficult for agents within clusters to interact with outsiders, since they tend to maintain their distance to each other and not move freely. In a sense, the bindings within clusters are relatively strong, and these clusters and their shared words are relatively stable, which is indicated by the stable values of S(t) and Nd(t) over long periods in specific simulations.
Figure 3. The statistical results of S (a), NT (b), and Nd (c) in Exp. 2; each point is calculated based on 20 simulations after 100 time steps and a maximum of 10,000 possible naming games.
A “local convergence, global polarization” phenomenon (Axelrod, 1997) appears in Exp. 2 in a big torus: agents within clusters clearly understand each other via a shared lexicon, but those in different clusters do not, since they may share different words. This phenomenon partially reflects the coexistence of many languages in the world, and it is mainly caused by the distance restriction and mutual understanding during local interactions. Besides, if we assume that agents are developing a basic vocabulary using the naming game, these simulations may actually trace the concurrent emergence of different vocabularies, and later on, different languages in the early stage of language development in the world. Second, combining Exp. 1 and Exp. 2, the boundary values of the distance restriction and torus size suggest a quantitative relation between the local view and the world size. Roughly speaking, the current results seem to show that, given a certain number of time steps (100), once the local view (2Dx+1)×(2Dy+1) is smaller than 1/10 of the torus size, the whole population will neither
efficiently form a cluster nor share a common lexicon. Further statistical analysis in simulations with bigger populations could confirm this prediction. Finally, people may intuitively think that under random or biased movements, sooner or later all agents will encounter all others, and since the naming game can easily make the participants’ vocabularies converge in one interaction, all agents will eventually form a big cluster. However, two arguments speak against such a prediction. First, in the case of random movements, this process may take an extremely long time. In our model, once agents form close clusters, those in the center may not easily move, since all their neighboring locations are occupied by others. Therefore, even given an extremely long time, the formation of a big cluster may not occur. Second, the convergence role of the naming game may also cause divergence of a cluster, since the convergence is achieved by deleting all the other words in the participants’ vocabularies. For instance, if Agent 1 with Word A interacts twice with Agent 2 in a cluster where all agents only use Word B, Agent 2 may diverge from this cluster and form a new one with Agent 1 using Word A, and then the agents in Agent 2’s original cluster have to interact at least twice with Agent 2 to drag it back to their cluster. This process introduces fluctuations that may delay lexical conventionalization. Therefore, even if agents, through random or biased movements, have chances to encounter all others, or all of them are within a certain distance restriction, their vocabularies might not quickly converge. This partially explains why, in the mean-field system, all agents still need many rounds of naming games to conventionalize their vocabularies.
Such fluctuations also show in our results and help to maintain the polarization state; the clusters are dynamically stable: their boundary agents may occasionally change, but their shared lexicons, sizes, and majority candidates remain roughly unchanged in the long run.

3. Conclusions
The simulations in this paper demonstrate the role of the naming game in social structure: the naming game under the simple distance restriction can adjust the social bindings among agents and form primitive clusters based on mutual understanding. This line of research is largely neglected in contemporary studies, which mostly focus on the impact of social structures on language games (e.g., Delgado, 2002; Dall’Asta et al., 2006). We present two experiments to illustrate the dynamics of the naming game under different distance restrictions and world sizes. First, a big cluster sharing a common lexicon can be formed among individuals whose local views (distance restrictions) might not allow them to see all members of the population. In addition, there is a
close relation between the local view and the world size: under a fixed world, the increase in the local view accelerates the conventionalization of individual knowledge; under a fixed local view, the increase in the world size triggers the emergence of different clusters and linguistic divergence, i.e., common knowledge (shared lexicon) is developed within clusters, but heterogeneity (different shared words) occurs between clusters. Furthermore, the enlarging local view may be reminiscent of the growing mass media and the “global village” phenomenon in recent centuries, while the fixed local view with increasing world sizes may represent the reality that people do have such a constraint of a relatively limited view. Considering these, our model may address a scenario with these two competing conditions, and other activities like opinion formation (Rosvall & Sneppen, 2007) may follow a similar scenario. Acknowledgements
We would like to thank Dr. Jinyun Ke from the University of Michigan and our colleague Dr. James Minett for their valuable suggestions and discussions.

References
Axelrod, R. (1997). The dissemination of culture: A model with local convergence and global polarization. The Journal of Conflict Resolution, 41, 203-226.
Baronchelli, A. & Loreto, V. (2006). Ring structures and mean-first passage time in networks. Physical Review E, 73, 026103.
Dall’Asta, L., Baronchelli, A., Barrat, A., & Loreto, V. (2006). Nonequilibrium dynamics of language games on complex networks. Physical Review E, 74, 036105.
Delgado, J. (2002). Emergence of social conventions in complex networks. Artificial Intelligence, 141, 171-185.
Gong, T., Minett, J. W., & Wang, W. S-Y. (2005). Computational exploration on language emergence and cultural dissemination. Proceedings of IEEE Congress on Evolutionary Computation, 2, 1629-1636.
Nettle, D. (1999). Using social impact theory to simulate language change. Lingua, 108, 95-117.
Oller, K. & Griebel, U. (Eds.) (2000). Evolution of communication systems: A comparative approach. Cambridge, MA: MIT Press.
Rosvall, M. & Sneppen, K. (2007). Dynamics of opinions and social structures. arXiv:0708.0368v1 [physics.soc-ph].
Steels, L. (2001). Grounding symbols through evolutionary language games. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 211-226). London: Springer-Verlag.
DO INDIVIDUALS’ PREFERENCES DETERMINE CASE MARKING SYSTEMS?
DAVID J. C. HAWKEY
Language Evolution and Computation Research Unit, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK
[email protected]

The typological distribution of case marking systems presents a puzzle which some linguists have tried to solve in terms of the preferences of individuals. In this paper I highlight some flaws in these approaches, and argue that the typological facts are best dealt with from a diachronic perspective. Processes which could plausibly have given rise to case systems need not have any relation to hypothesised individual preferences concerning case marking. This divergence between putative individual preferences and reasons for the development of linguistic structures undermines the notion that typological facts can generally be illuminated as optimal solutions to aggregated individual preferences.
1. Explaining case systems

Accusative case systems mark S NPs the same as A NPs and differently from O NPs.ᵃ In contrast, ergative case systems mark S and O NPs in the same way but A NPs differently. Some languages use different case systems for different kinds of NP. An interesting typological generalisation can be described using a hierarchy of NP types ordered by their likelihood of performing the A role (the “nominal hierarchy”, see Fig. 1): ergativity (when present) characterises all NP types from the right-hand end of Fig. 1 up to a (language-dependent) point. Similarly, accusativity (if present) affects all NP types from the left up to a certain point.
1.1. Discourse “motivations”
Du Bois (1987) sought to demonstrate the existence of a motivation in discourse for ergativity. Analysing a set of Sacapultec speech samples, he found that novel

ᵃ In this paper I use S to refer to the NP argument of an intransitive clause, and A and O to refer to the subject and object respectively of transitive clauses. Note that S can be defined purely syntactically, but A and O refer to syntactic categories which generally have certain semantic properties (A is on the whole the argument which could initiate or control the activity, and O is then simply the other argument). This paper is only concerned with morphemes attached to NPs to indicate their S/A/O roles. Other systems, such as ergative/accusative cross-referencing of NPs with verb affixes, are not dealt with here. (I use bold font for S, A and O to avoid confusion with capitalised indefinite articles.)
Nominative-accusative →
1st person pronouns → 2nd person pronouns → demonstratives, 3rd person pronouns → proper nouns → common nouns (human → animate → inanimate)
← Ergative-absolutive

Figure 1. The nominal hierarchy (from Dixon, 1994). Nominative-accusative marking extends from the left-hand end of the hierarchy; ergative-absolutive marking extends from the right-hand end.
discourse entities were almost always introduced in S and O roles. Du Bois argued that the pattern was likely to be common across languages, and interpreted this as being the "motivation" for ergativity: the absolutive case exists to "accommodate"b new information. He also suggested a competing motivation which would favour accusativity, namely that mentions in S and A roles are typically human, agentive and topical. Du Bois appealed to differences between NP types in the degree to which these competing motivations apply in order to explain typological patterns of split ergativity. NPs higher up the nominal hierarchy are rarely used to introduce new information, and so the motivation for an ergative/absolutive distinction is weaker than for NPs lower down the hierarchy. Thus, higher up the hierarchy the motivation to equate S and A is more likely to dominate, giving rise to accusative marking for (e.g.) pronouns, while further down the hierarchy the pressure to accommodate new mentions (in S and O) dominates, giving rise to ergative marking for (e.g.) common nouns. However, Du Bois didn't propose a mechanism by which these motivations could give rise to case systems. In Kirby's (1999) terms, he left the "problem of linkage" unsolved. If the motivations are interpreted as preferences of the speaker and/or hearer, they are rather unconvincing. Why should individuals prefer that new information be accommodated by an absolutive case? It isn't clear that highlighting the fact that an NP may not have been mentioned before (though the majority of absolutive mentions are not new, according to Du Bois) has any significant effect on individuals' linguistic interactions. Nor is it clear that, if it did, this fact would lead to the development of ergative case marking.
Similarly, it is not clear that using the same case (on the whole) for the topic of discourse in both transitive and intransitive utterances produces any significant benefit, nor that, were such a benefit to exist, it would lead to the development of accusative case marking.
1.2. Evolutionary Game Theory Approach

In a recent paper, Jäger (2007) attempts to account for the typological facts of case marking using Evolutionary Game Theory (EGT). His approach relies on stipulating a set of possible speaker strategies and hearer strategies, and comparing
b Du Bois couldn't say that the absolutive case "signalled" new information, as only about one quarter of absolutive mentions in his corpus were mentions of new discourse entities.
their "utility". Jäger divides NPs into two categories, prominent (p) and non-prominent (n), according to their position on the nominal hierarchy.c This gives four NP categories: A/p, A/n, O/p, and O/n. Speaker strategies determine which of these categories will be case marked. All S NPs are assumed to be unmarked (as is the case in most languages; Dixon, 1994, p. 63). Speaker strategies can be notated as a string identifying whether A/p, A/n, O/p and O/n NPs have zero (z), A-identifying (e) or O-identifying (a) marking. Thus the string ezaz represents the strategy that leaves non-prominent NPs unmarked and marks prominent NPs according to their role.d Hearer strategies all correctly interpret any utterance with case marking on at least one NP, and are differentiated by their response to utterances with two unmarked NPs: either word order is assumed to be consistent (though no speaker strategies employ consistent word order) or the arguments are interpreted as A and O on the basis of their prominence (when the NPs have the same prominence, the hearer guesses their roles). Strategies are assigned a "utility" value according to how often they lead to successful communication given a population of other strategies, and how many case marking morphemes are required per utterance on average (speakers are assumed to have a preference for strategies which lead to the production of fewer morphemes). For various different prominent/non-prominent split points, Jäger counted the numbers of A/p, A/n, O/p, and O/n NPs in speech corpora, and used these frequencies when calculating the utility functions of speaker and hearer strategies in various populations.e The utility function is interpretedf in a manner analogous to the fitness of a biological system: strategies with greater utility are more likely to be employed at a later stage. New strategies may enter the population by "random mutation".
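To make the utility calculation concrete, the following sketch scores speaker strategies against a prominence-based hearer. It is not Jäger's actual model: the clause-type frequencies and the per-morpheme cost are invented for illustration, and the identity of the marker is ignored (any overt marking is assumed to let the hearer recover the roles, as in the hearer strategies described above).

```python
from itertools import product

# Invented clause-type frequencies, keyed by (A-prominence, O-prominence),
# and an assumed utility cost per case-marking morpheme. Jager estimated
# such frequencies from corpora; these numbers are made up.
freq = {('p', 'p'): 0.1, ('p', 'n'): 0.6, ('n', 'p'): 0.1, ('n', 'n'): 0.2}
COST = 0.1

# Position of each NP category in the four-letter strategy string.
SLOT = {('A', 'p'): 0, ('A', 'n'): 1, ('O', 'p'): 2, ('O', 'n'): 3}

def utility(strategy):
    """Expected communicative success minus morpheme cost for a speaker
    strategy (a string over z/e/a for the A/p, A/n, O/p, O/n slots),
    against a hearer who falls back on NP prominence when both NPs of a
    transitive clause are unmarked."""
    total = 0.0
    for (ap, op), f in freq.items():
        a_mark = strategy[SLOT[('A', ap)]]
        o_mark = strategy[SLOT[('O', op)]]
        cost = COST * ((a_mark != 'z') + (o_mark != 'z'))
        if a_mark != 'z' or o_mark != 'z':
            success = 1.0                        # overt marking disambiguates
        elif ap == op:
            success = 0.5                        # same prominence: hearer guesses
        else:
            success = 1.0 if ap == 'p' else 0.0  # prominence heuristic
        total += f * (success - cost)
    return total

strategies = [''.join(s) for s in product('zea', repeat=4)]
ranked = sorted(strategies, key=utility, reverse=True)
```

Under these invented numbers the fully unmarked strategy zzzz scores 0.75, while a strategy that marks only where prominence fails to disambiguate, such as zzaz, scores 0.88; which strategies win depends entirely on the assumed frequencies and cost.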
Jäger identifies strategy sets which are evolutionarily stable (that is, persist given a low enough rate of mutation), some of which represent attested languages. However, the model excludes the pure accusative speaker strategy (zzaa, which characterises existing languages such as Hungarian), and includes typologically uncommon strategies (zzza and ezzz). There are a number of objections to Jäger's model.g
c Actually, Jäger employs a slightly different hierarchy. For the A role, the hierarchy is: pronoun → name → definite full NP → indefinite specific NP → nonspecific indefinite NP. The O role hierarchy is the reverse of this.
d An attempt to apply the terms ergative and accusative to the case markers in this strategy would be confusing. Rather, prominent NPs have a tripartite case system and non-prominent NPs have neutral (no) case marking.
e It is these frequencies which introduce the asymmetry between A and O roles.
f Jäger mentions several different interpretations of his model. The differences between these interpretations do not change the mathematics of his model.
g The split point on the nominal hierarchy is arbitrarily determined to be at the same place for both A and O roles, in spite of the fact that for some languages (e.g. Yidiny) A and O marking overlaps in the middle of the nominal hierarchy (producing tripartite marking for these NPs; see Dixon, 1994, p. 54).
However, the most significant arises from his attempt to explain why the evolutionarily stable ezzz and zzza speaker strategies are so rare. Jäger appeals to the notion of stochastic stability: some evolutionarily stable states are less resistant to invasion by mutant strategies (in the sense that it requires a smaller proportion of the population to mutate for the whole system to change state). To illustrate this concept, he compares the ezzz and zzaz speaker strategies (in the context of hearer strategies based on NP prominence rather than word order). A system dominated by ezzz will change to being dominated by zzaz if only 2.1% of the strategies happen to mutate from ezzz to zzaz (in contrast, the reverse state change would only occur if 97.9% of strategies mutated in the opposite direction). However, this raises the question: how could this kind of mutation occur? Certainly speakers could not simply adopt an alternative case system out of the blue, nor could they re-use the case markers they already have but switch their function. Jäger's model assumes that hearers always interpret case-marked NPs correctly, but in order to do this they presumably must have learned the functions of the case markers they are presented with. Invention of a new case marker, or the re-use of an old one for a new function, would simply produce confusion (which has a low utility value). Jäger notes that comparative methodology has suggested a number of pathways by which case systems can develop, but his model ignores these. His stochastic stability analysis is sensitive to the probabilities of each type of "mutation" occurring, but he adopts the "null hypothesis" that all probabilities are equal, citing the paucity of recorded evidence for the emergence of ergative systems. However, while we may lack evidence specifically for ergativity, we have abundant evidence concerning the development of other syntactic phenomena (McMahon, 1994).
That evidence presents a picture of diachronic development as gradual change, whereby patterns of usage change without discontinuity (or individuals' metalinguistic insight). If we apply that picture to ergativity, we may profitably ask what sorts of changes are likely to have given rise to ergative case marking. And, I suggest, in answering that question we also gain insight into the typological facts of differential NP ergative marking.

2. On the Origins of Split Ergative Systems

Garrett (1990) presents an account of the development of ergative marking in Hittite from an earlier ablative case marker which could also have an instrumental sense. He suggests that this development was possible because of the functional overlap between instruments and agents in clauses with transitive predicates. For example, English John extinguished the fire with water and water extinguished
g (cont.) Another problem is that consistent word order is not available as a speaker strategy, though it is available as a hearer strategy (redundantly, in both AO and OA orders). A combination of consistent-argument-order speaker and hearer strategies would be evolutionarily stable, making perfect understanding of the utterances in Jäger's model possible without any costly case morphology.
the fire. A Hittite translation of the first of these sentences would have 'John' in nominative case, 'the fire' in accusative, and instrumental/ablative marking on 'water'. If the A core argument ('John') were omitted (giving a Hittite equivalent of the second English example), 'water' could be (re)interpreted as fulfilling the A role, and the instrumental/ablative marking reanalysed as marking this role. A similar process for intransitives is unlikely, given that thematic instruments are rare (or absent) in the subject role of an intransitive clause.h This asymmetry means that it would only be for A (and not for S) that the ablative/instrumental would be reanalysed as a core role case marker. Thus the reanalysis of instrumental marking is a possible source of ergativity. Garrett explained the hierarchy for NP split ergativity by relating it to the likelihood that an NP would be an instrumental: instrumental pronouns are rare and, for pragmatic reasons, animate NPs in instrumental function are unusual. Those NPs which are less likely to fill an instrumental role are also less likely to develop an ergative case via Garrett's route. Garrett (1990) also argues that a similar development from an instrumental produced ergative case marking in the prehistory of the Gorokan language family. Thus, he presents evidence for this process happening twice. While Garrett's explanation is convincing for the languages he analyses, it isn't clear that it accounts for all NP-conditioned ergative splits: Dixon (1994, p. 104) notes that a number of languages employ ergative marking for only some NPs (on the lower end of the nominal hierarchy) and only in the past tense. It seems unlikely that in these languages ergative marking arose from an instrumental used only in the past tense. An alternative route for the development of an ergative case is via the reanalysis of a passive construction.
Passive constructions are intransitive clauses derived from active transitive clauses, in which the NP which would fill the O role in the active transitive syntactically takes the S role. The NP which would fill the A role in the active transitive may be omitted or expressed as an oblique. Languages may employ a passive construction for a number of reasons, including satisfaction of syntactic constraints in the coordination of multiple clauses, to avoid mentioning the entity performing the A role, and to place focus on the O NP. If the passive becomes the dominant construction (replacing the previous active transitive), the oblique case marking on the A NP would become an ergative case marker (see Fig. 2). This explanation is favoured by most linguists as an account of the origin of ergativity in modern Indo-Iranian languages (Trask, 1996). In Sanskrit, the passive construction was used mainly in the past tense, and this fact is appealed to in order to explain the use of ergative case only in the past tense (in, for example, Pashto). Would the development of an ergative case system from a passive construction apply equally to all types of NP? If some kinds of NP are rarely expressed in the oblique case of a passive construction, they would be less likely to be reanalysed
h E.g. English John walks with a cane but not *A cane walks (Garrett, 1990).
Accusative system:                       Ergative system:
  Intransitive:           S-x V            Intransitive:  S-x V
  Active transitive:      A-x O-y V   ⇒    Transitive:    O-x A-z V
  Passive (transitive):   O-x (A-z) V

Figure 2. Development of ergative case from a passive construction. -x, -y and -z represent case markers; S, A and O represent NPs. In the passive construction on the left, S, A and O represent the roles these entities would play in the corresponding active construction (also on the left).
with an ergative case, as the form A-OBLIQUE (Fig. 2) would be systematically missing prior to the reanalysis. Thus, if NPs from the left of the nominal hierarchy are rarely expressed as oblique arguments of a passive construction, reanalysis of a passive could be another route by which NP split ergative systems could develop (and may account for some of the combined splits mentioned above). To test whether certain kinds of NP are systematically excluded as obliques, I extracted the passive construction arguments in the Switchboard corpus (recorded telephone conversations) of the Penn Treebank.i In order to determine whether certain NP types are rarer in the oblique case of a passive construction than in the A role of an active construction, I compared arguments in active and passive clauses. Comparing all passive clauses with all active clauses could potentially bias the result if the passive construction is commonly used with an unrepresentative distribution of verbs, since different verbs vary with respect to the kinds of NPs that commonly act as their arguments. Therefore, passive and active constructions were matched for the main verb. Table 1 shows the results, comparing first, second and third person pronouns (singular and plural) with all other types of NP. The passive construction most frequently omitted the NP that would semantically take the A role, but this omission didn't apply evenly to all NP types. A chi-squared test performed on the numbers of expressed underlying-A-role arguments revealed a significant effect (χ²(3, N = 1075) = 171.1, p < 0.01), allowing rejection of the null hypothesis that the absence of pronouns in the passive was simply due to the rate at which underlying-A-function arguments were omitted. Thus, in English at least, the way the passive construction is used systematically excludes pronouns as oblique arguments to a greater degree than non-pronoun arguments.
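The reported statistic can be recomputed from Table 1 alone. The sketch below is a standard Pearson chi-squared contingency computation in plain Python (the paper does not say whether any continuity correction was applied; none is applied here, which is conventional for df > 1):

```python
# Re-running the chi-squared test on the expressed underlying-A
# arguments of matched active and passive clauses (Table 1).
table = {
    'active':  [313, 156, 328, 217],   # First, Second, Third, Other
    'passive': [1, 0, 1, 59],
}

row_totals = {k: sum(v) for k, v in table.items()}
col_totals = [sum(col) for col in zip(*table.values())]
n = sum(row_totals.values())
df = (len(table) - 1) * (len(col_totals) - 1)   # (2-1)*(4-1) = 3

# Pearson chi-squared: sum of (observed - expected)^2 / expected,
# with expected = row_total * col_total / n for each cell.
chi2 = sum(
    (obs - row_totals[k] * col_totals[j] / n) ** 2
    / (row_totals[k] * col_totals[j] / n)
    for k, row in table.items()
    for j, obs in enumerate(row)
)
print(df, n, round(chi2, 1))  # 3 1075 171.1
```

The result matches the value reported in the text, with the bulk of the statistic contributed by the 59 non-pronoun oblique arguments in the passive.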
Were a language with this pattern to develop an ergative case from a passive construction, the ergative pattern would likely apply to non-pronoun arguments and not to pronouns.j As Garrett (1990) notes, analogical extension of ergative case marking from
i http://www.cis.upenn.edu/~treebank/
j There are two minor wrinkles with this scenario, concerning clauses with one prominent and one non-prominent argument. We may assume that A/p-O/n clauses would simply lose the now redundant old accusative case marking on analogy with A/n-O/n clauses, and that the new absolutive case marking on infrequent (according to Jäger's counts) A/n-O/p clauses (derived from the passive) would be replaced with accusative case on analogy with the more frequent A/p-O/p clause type.
Table 1. A role arguments of active and passive clauses

           First   Second   Third   Other
  Active     313      156     328     217
  Passive      1        0       1      59
nominals to pronominals, while possible, is unlikely given that nominal and pronominal morphology are typically divergent. Thus a split ergative system with ergativity confined to nominals is likely to persist.

2.1. The relationship between A and S roles
The pathways outlined above began with an initially accusative language. Is it plausible to take accusativity as a starting point? As Dixon (1994, p. 14) notes, "there are some languages that appear to be fully accusative, in both morphological marking and syntactic constraints [...however] no language has thus far been reported that is fully ergative". One explanation of this fact appeals to diachronic processes that lead to common forms for A and S NPs, irrespective of how accusative/ergative the language already is. If semantic change from a transitive to an intransitive verb most often involves the omission of the O NP, the result will be an intransitive verb with S marked in the same way as the original A NP. (Similarly, addition of an O to an intransitive would produce an accusative transitive.) The mirrors of these processes (such as A addition/omission) would produce ergativity. Is there any evidence that accusative-producing changes are more common? Du Bois (1987) found a clear association between human mentions and A and S, and continuity between subsequent clauses in the identity of entities in S and A, but not S and O. Thus development of intransitive clauses from transitives by loss of O would be more likely, as the resultant S would typically be the kind of entity speakers use intransitives with. Dixon (1994) appeals to such changes as a possible source of split-S marking. In an ergative language, a transitive verb which lost its O argument would produce an intransitive with A-like marking of S (in contrast to other intransitives with O-like marking on S). Recurrence of this change could produce a situation in which the majority of S NPs were marked with A marking; from here all S NPs could become marked like A, either by generalisation of the dominant intransitive pattern, or by loss of the remaining intransitives with unmarked S. This series of changes would produce a "marked nominative" language (S and A treated in the same way but O unmarked).
Marked nominative languages are rare, but "marked absolutive" languages (i.e. {S, O} marked, A unmarked) are even rarer (Comrie, 2005), and this may be taken as indirect evidence that intransitives are more likely to develop from transitives by the loss of O than by the loss of A (a process which could produce a marked absolutive language from an accusative one).
Diachronic processes which are sensitive to the functional overlap of S and A may thus explain why accusativity is more common than ergativity. Again, these processes are independent of language users' cognitive case-system preferences.
3. Conclusions

The diachronic processes mentioned in this paper probably do not exhaust the possibilities for how ergative case systems can develop, nor is every typological detail explained by these processes. A common feature of the pathways outlined here is that they have little to do with a functional account of case. The omission of subjects in clauses containing instrumentals, or the change from the active to the passive as the default clause type (which Trask, 1996, suggests may have been related to politeness), are two processes that happen to produce split case systems that fit Du Bois' "motivations" without satisfaction of those motivations being the rationale for the development of case systems. Similarly, it seems unlikely that a passive construction would come to be the dominant form for A/n transitive clauses because of individuals' preferences for fewer morphemes per clause. The applicability of Jäger's model seems to be restricted to changes from one strategy to another which only involve the loss of a case marker. By drawing attention to the frequencies of prominent and non-prominent argument combinations, the model suggests that the loss of some case marking morphemes may be more likely than others (those likely to be lost being those whose omission would rarely lead to communicative breakdown and the need for repair). In contrast, when it comes to changes which introduce new morphemes, Jäger's null hypothesis of equal mutation probabilities should be improved by adopting a diachronic perspective. However, once plausible diachronic pathways are identified, it seems there is little the EGT analysis can add to our understanding, and much that it obscures by presenting changes from one case system to another as if they always result from a trade-off between disambiguation and production effort.

References
Comrie, B. (2005). Alignment of case marking. In M. Haspelmath, M. Dryer, D. Gil, & B. Comrie (Eds.), The world atlas of language structures. Oxford: Oxford University Press.
Dixon, R. M. W. (1994). Ergativity (Vol. 69). Cambridge: Cambridge University Press.
Du Bois, J. W. (1987). The discourse basis of ergativity. Language, 63(4), 805-855.
Garrett, A. (1990). The origin of NP split ergativity. Language, 66(2), 261-296.
Jäger, G. (2007). Evolutionary game theory and typology: a case study. Language, 83(1), 74-109.
Kirby, S. (1999). Function, selection and innateness. Oxford: Oxford University Press.
McMahon, A. M. S. (1994). Understanding language change. Cambridge: Cambridge University Press.
Trask, R. L. (1996). Historical linguistics. London: Arnold.
WHAT IMPACT DO LEARNING BIASES HAVE ON LINGUISTIC STRUCTURES?
DAVID J. C. HAWKEY
Language Evolution and Computation Research Unit, Edinburgh University, 40 George Square, Edinburgh, EH8 9LL, UK
[email protected]
Recent work modelling the development of communication systems has suggested that linguistic structure may reflect cognitive structures through the repeated effect of biased learning, the language adapting to conform to the learning preferences of its users. However, the notion that an individual's learning is biased can be cashed out in numerous ways. This paper argues that different ideas of what a "learning bias" is produce different population-level effects. Biased learning may result in the population's maintenance of structures disfavoured by the bias (cultural inertia), which we can think of as arising through a variety of diachronic processes. Without clear articulations of the relevant diachronic processes and an empirically sound notion of biased learning, the assumption that linguistic structure reflects individuals' language learning psychology is premature.
The use of computer simulations in the field of language evolution offers the potential to illuminate cultural processes impacting on the structures of communication systems developed (as opposed to designed) by groups of individuals. Such simulations commonly employ agents endowed with a learning mechanism (and, usually, a pre-defined communication channel). Simulations are generally designed such that an agent's learning mechanism responds to episodes of communication (in which the agent might be playing the role of speaker or hearer) in such a way as to increase the success of future communicative interactions. As agents repeatedly (iteratively) learn from each other, they settle on relatively stable shared functioning systems (at least, in simulations considered successful). The properties of such emergent systems commonly depend, among other things, on the properties of the learning mechanisms chosen for the agents: for example, Kirby, Dowman, and Griffiths (2007) present a model in which a Bayesian learning algorithm with a nominallya weak learning bias produces strong linguistic universals through the process of iterated learning.
a In the model of Kirby et al. (2007), the Bayesian posterior probability of a hypothesis given some input data (see their Eq. 1) does not play the role of the probability that the agent will choose that hypothesis. Rather, the agent chooses the hypothesis which has the maximum posterior probability. If the "bias" is thought of as a relationship between the learner's experience and their behaviour,
Interpreting these simulations as models of the evolution of human communication systems, it seems that properties of human languages may be explicable by appeal to the learning biases of individuals. In a recent paper, Dediu and Ladd (2007) present a correlation between the presence of linguistic tone and two genes (known to be related to brain size) which they argue is not a reflection of shared linguistic history or geographical proximity. They suggest that this correlation might be due to the genes producing a weak learning bias for or against tone. The bias is assumed to be weak because it seems that anyone can (eventually) learn tonal and non-tonal languages, whatever their genetic makeup. However, the notion of a learning bias is somewhat vague and can be operationalised in a number of different ways. This paper investigates some different ways of modelling a learning bias, and asks how such biases can influence the communication systems that emerge in simulations of language evolution. Given contemporary ignorance, confusion and dispute over the processes of language learning, it would be foolhardy to try to implement realistic learning models, as what would count as “realistic” is utterly opaque. Rather, these simulations are intended to illustrate the difficulties attached to interpreting linguistic features (or distributions over these) as reflections of individual psychology.
1. Three models of learning bias

Kirby (1999) presented a model in which learning was biased in favour of syntactic structures judged to be less complex by a certain theory of processing (viz. Hawkins' (1990) processing theory). The learning algorithm used by Kirby (1999) implemented a bias for lower complexity by always setting the probability that an agent would produce the less complex form to be greater than the relative frequency with which that form had been encountered by that agent. This kind of learning bias I will refer to as a transformational bias, to capture the notion that the average effect of biased learning is always to transform the relative frequencies of variants (from input to output) in favour of the biased form. A transformational bias, by definition, consistently biases the outcome of learning towards a particular state. However, this is not the only way the notion of a learning bias could be interpreted: a bias may be thought of as an effect on the process of learning, and as such produce more complex, less consistent relationships between input and the outcome of learning. For example, a learning bias could be taken as what an inexperienced individual is most likely to do (which we could refer to as a default strategy bias), or it could be taken as meaning that certain linguistic forms are more readily learned than others (an ease of learning bias).
a (cont.) arguably it should not be interpreted as the "prior" distribution over hypotheses used in the calculation of the posterior probability. Consider the trivial case of two hypotheses (h1 and h2) and only one possible data set (d): both hypotheses produce only d, so P(d|h1) = P(d|h2) = 1. If the prior is characterised as P(h1) = 0.50001 and P(h2) = 0.49999, a learner presented with data d will always adopt h1. Thus, in this case, while the prior is only marginally in favour of h1, the learner is maximally biased toward h1. The hypothesis/data combinations explored by Kirby et al. have numerous data sets for which different hypotheses have the same production probability (i.e. the same P(d|h)), and when presented with such data learners will always choose the hypothesis with greater prior probability, regardless of how small the margin is. Different learners with different priors may be statistically indistinguishable in their behaviour, raising the question of how to interpret these different priors, i.e. what the difference in the model is supposed to correspond to in reality. Interpretation in terms of a private language of expectation would sink into a philosophical quagmire whose articulation is far beyond the scope of this paper.
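The two-hypothesis case from footnote a can be checked numerically. This sketch uses the footnote's own numbers and a generic maximum-a-posteriori (MAP) choice rule:

```python
# A MAP learner turns an arbitrarily small prior difference into a
# categorical choice. Both hypotheses generate the single data set d
# with probability 1, so the posterior ordering is just the prior
# ordering -- and the learner adopts h1 every time.
prior = {'h1': 0.50001, 'h2': 0.49999}
likelihood = {'h1': 1.0, 'h2': 1.0}    # P(d | h) for the only data set d

posterior = {h: prior[h] * likelihood[h] for h in prior}
chosen = max(posterior, key=posterior.get)
print(chosen)  # h1
```

However small the prior margin, the learner's behaviour is identical, which is the footnote's point about interpreting such priors as "weak" biases.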
2. Learning Biases in an Iterated Learning Model

The effect of these three kinds of learning bias was investigated using a simple iterated learning model. In the model, agents can select one of two possible variants in each interaction. Agents select variants stochastically, and learning affects the probability with which a particular form is chosen.b Agents interact with each other in randomly chosen pairs. During an interaction, one agent (the speaker) stochastically produces a form which the other agent (the hearer) learns from. Simulations consist of a fixed number of agents (the population), and after a number of interaction events a number of the oldest agents are removed from the population (they die) and are replaced by new (born) agents. The selection of agents as speaker and hearer is independent of agents' degree of experience: agents learn from other agents throughout their lives, and agents with no experience have as much chance of being selected as a model from which to learn as more experienced agents. This unrealistic assumption (that agents can affect the language from birth) is made to increase the chance that agents' biases will come to be reflected in the language. The two forms agents can produce are abstractly referred to in the model as a and b. The implementations of the various learning rules are biased towards the production of form a. Populations are initialised with agents who on the whole do not produce more a forms than b (and, in many cases, produce more b forms than a), so these simulations can be seen as testing whether the various learning biases can change the population's system.
2.1. Learning rules

It is worth stressing again that the models here make no pretence of being realistic. The learning rules employed are designed to capture different intuitive notions of biased learning. The transformational learning rule is modelled on the rule used by Kirby (1999, p. 48). However, rather than being used to calculate the probability that an agent will end up with a given competence given a fixed set of observations,
b This contrasts with Kirby's (1999) model, which probabilistically assigned each agent a deterministic competence.
the current model uses the rule to continuously update the probability that the agent will produce form a the next time it is a speaker. The probability p that an agent will produce form a is given by Eq. (1), where n_a and n_b are the numbers of a and b forms the agent has experienced other agents producing during its lifetime, and α is the biasing parameter, which ranges between zero (strongest bias) and one (weakest bias).c

p = n_a / (n_a + α·n_b)    (1)

In contrast to the transformational learning rule, the default strategy and ease of learning biases are implemented without an explicit representation of the frequencies with which forms have been encountered. Instead, when an agent learns as the hearer in an interaction, its probability value is modified by an amount which depends on the form it is presented with. Eq. (2) expresses the relationship between the change in probability (Δp), a "learning rate" parameter (λ), and the identity of the presented form (σ, which is equal to one when the heard form is a and minus one when b is heard) for the default strategy bias. The terms in Eq. (2) involving p make Δp smallest when the agent has a particularly strong preference for producing one form rather than the other. Agents' default strategy bias (that is, their preference for producing form a prior to learning) is modelled by setting p to a high value for new agents entering the population.

Δp = λ·σ·p(1 − p)    (2)
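The two update rules above can be written down directly. This is a sketch, not the author's code: the function and variable names are mine, and the default parameter values follow the Fig. 1 caption.

```python
def transformational_p(n_a, n_b, alpha=0.4):
    """Eq. (1): production probability for form a given the counts of
    observed forms; alpha in (0, 1], with smaller alpha meaning a
    stronger bias toward a. Inexperienced agents (n_a = n_b = 0) are
    taken to produce a with probability 0.5."""
    if n_a == 0 and n_b == 0:
        return 0.5
    return n_a / (n_a + alpha * n_b)

def default_strategy_update(p, heard_a, lam=0.1):
    """Eq. (2): p + lam * sigma * p * (1 - p), with sigma = +1 when the
    heard form is a and -1 when it is b. The p(1 - p) factor shrinks
    updates for agents with strong existing preferences."""
    sigma = 1 if heard_a else -1
    return p + lam * sigma * p * (1 - p)

# Hearing a and b equally often still favours a under the
# transformational rule: 10 / (10 + 0.4 * 10) = 5/7, about 0.714.
p_after = transformational_p(10, 10)
```

Note that with alpha = 1 the transformational rule reduces to simple frequency matching, which is why alpha = 1 counts as "no bias".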
An ease of learning bias is modelled using a similar learning rule, but with a modification that favours changes in the value of p in one direction over the other. In Eq. (3), the parameter β can vary from zero (no bias) to one (strongest bias). Inexperienced ease of learning agents begin life with p = 0.5.
2.2. Results and Explanation

The results of some selected simulations are shown in Fig. 1, which plots the population mean probability that form a will be produced. During simulations with the transformational bias, this probability generally increases with time, until a dominates. In contrast, during simulations run with the default strategy and ease of learning biases, the direction of change of the population mean probability at a given moment in time depends on the values of the various parameters and the proportion of a forms being produced at that time. The system can evolve to be dominated by either the bias-favoured form or the bias-disfavoured form.
c Agents with no experience, i.e. with n_a = 0 and n_b = 0, produce form a with a 50% probability.
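The population dynamics described above can be sketched as a minimal simulation using the default-strategy rule of Eq. (2). This is a reconstruction for illustration, not the author's code: the newborn value p = 0.9 stands in for the unspecified "high value" of the default strategy bias, and the other parameters loosely follow the Fig. 1 caption.

```python
import random

# Minimal iterated-learning population. Each agent is represented only
# by its probability p of producing form a; hearers update p by
# Eq. (2): delta_p = lam * sigma * p * (1 - p), sigma = +1 for a heard
# a, -1 for a heard b. Agents die and are replaced round-robin.
def simulate(p_start, p_newborn=0.9, lam=0.1, n_agents=100,
             interactions=10_000, replace_every=400, seed=0):
    rng = random.Random(seed)
    agents = [p_start] * n_agents
    oldest = 0
    for t in range(1, interactions + 1):
        speaker, hearer = rng.sample(range(n_agents), 2)
        sigma = 1 if rng.random() < agents[speaker] else -1
        p = agents[hearer]
        agents[hearer] = p + lam * sigma * p * (1 - p)
        if t % replace_every == 0:          # death and birth
            agents[oldest] = p_newborn      # newborns carry the default
            oldest = (oldest + 1) % n_agents
    return sum(agents) / n_agents           # population mean p
```

Depending on the starting value and the random seed, the population can settle on either the bias-favoured or the bias-disfavoured form, which is precisely the sensitivity to initial conditions that distinguishes this bias from the transformational one.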
Figure 1. Qualitative results of using different learning biases. Simulations run for different numbers of iterations are scaled and plotted on the same axes to facilitate qualitative comparison. Numerical details: all simulations had 100 agents. Transformational bias: one agent replaced every 100 interactions, simulation run for 30,000 interactions, α = 0.4, simulation began with agents having n_a = 1 and n_b = 100. Default strategy bias: one agent replaced every 400 interactions, simulation run for 10,000 interactions, λ = 0.1, simulation began with agents having p = 0.4 (lower) or p = 0.5 (upper). Ease of learning bias: same as default strategy except simulations began with p = 0.3 (lower) and p = 0.4 (upper), and β = 1/3.
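The default strategy runs can be reproduced qualitatively with a simple agent-based loop. The following is a sketch under my own assumptions (random pairing of speaker and hearer, and a probability-update rule of the form λ·(α + β)·p(1 − p) with the high new-agent value p_new modelling the default bias); it is not the authors' code, and parameter names are mine.

```python
import random

def simulate(n_agents=100, n_interactions=10000, replace_every=400,
             lam=0.1, p_init=0.4, p_new=0.9, beta=0.0, seed=1):
    """Toy version of a default strategy simulation (beta=0); returns
    the final population mean probability of producing form a."""
    random.seed(seed)
    pop = [p_init] * n_agents
    for t in range(1, n_interactions + 1):
        s, h = random.sample(range(n_agents), 2)   # speaker, hearer
        heard_a = random.random() < pop[s]          # speaker produces a form
        alpha = 1.0 if heard_a else -1.0
        pop[h] += lam * (alpha + beta) * pop[h] * (1 - pop[h])
        if t % replace_every == 0:                  # population turnover
            pop[random.randrange(n_agents)] = p_new # naive biased newcomer
    return sum(pop) / n_agents
```

Varying p_init above and below the equilibrium reproduces the divergence between the upper and lower curves described in the caption.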
One virtue of using such simplistic learning rules is that it is easy to develop a perspicuous account of why populations behave in the ways they do.
Transformational bias Consider a new agent introduced to the population when the average probability of producing a among the rest of the population is π. After playing the role of hearer in N interactions, the expected values for the numbers of observed forms, ⟨n_a⟩ = πN and ⟨n_b⟩ = (1 − π)N, can be substituted into Eq. (1) to calculate the expected value of the agent's production probability, ⟨p⟩. As can be seen in Eq. (4), this expected value is always greater than π unless π is equal to one or zero (in which case only one form exists in the population) or α ≥ 1 (in which case the agent is not actually biased in favour of a). Thus a new agent introduced to a population will, on average, produce form a with greater probability than the rest of the population: therefore, the introduction of a new agent will, on average, increase the mean probability that the bias-favoured form will be produced (irrespective of what that probability actually is).d
dNew agents do not have to be added for the transformational bias to drive the system to domination by the bias-favoured form. Agents always produce form a with greater probability than the relative frequency with which they have observed it, and thus their contributions always serve to increase the relative frequency of a throughout the simulation. However, without the introduction of new agents, the speed with which the form a comes to dominate is much slower.
⟨p⟩ = π + π(1 − π)(1 − α) / (π + α(1 − π))    (4)
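The claim that the expected value in Eq. (4) always exceeds π can be checked numerically. The sketch below assumes that Eq. (1), which is not reproduced in this excerpt, has the form p = n_a / (n_a + α·n_b), which is the form consistent with Eq. (4) under the expected counts ⟨n_a⟩ = πN and ⟨n_b⟩ = (1 − π)N.

```python
def expected_p(pi, alpha):
    """Expected production probability of a new agent, per Eq. (4),
    after observing form a at rate pi with bias parameter alpha."""
    return pi + pi * (1 - pi) * (1 - alpha) / (pi + alpha * (1 - pi))

# <p> exceeds pi for every interior pi whenever the agent is biased (alpha < 1)
for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    assert expected_p(pi, 0.4) > pi

assert expected_p(0.5, 1.0) == 0.5  # alpha = 1: no bias, so <p> = pi
```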
Default strategy and ease of learning biases Consider an agent playing the role of hearer using the ease of learning rule in Eq. (3). If the probability that a randomly selected speaker from the rest of the population (i.e. not the hearer) will produce form a is again π, then the expected change in the hearer's production probability value will be given by Eq. (5).
⟨Δp⟩ = λ · (2π + β − 1) · p(1 − p)    (5)
The direction of ⟨Δp⟩ is dependent on the relationship between π and p. While the value of π varies with the identity of the hearer, we can estimate the direction of change of the population mean probability, p̄, by substituting this probability for π in Eq. (5). Thus, roughly speaking, the expected change in the population's production probability will be zero when p̄ = (1 − β)/2 (the unstable equilibrium), positive when p̄ is greater than this value and negative when p̄ is less than this value. The ease of learning bias always has β > 0, so the unstable equilibrium value of p̄ is below 0.5. Thus the introduction of new agents can change the dominant form produced by the population if the effect of their naïve language production manages to increase p̄ beyond a certain value which is generally lower than 0.5 (thus, for example, increasing the rate of population turnover would make the effect of the bias more likely to have an impact). In simulations run with the default strategy bias, the learning rule is effectively the same as the ease of learning strategy with β = 0, so the unstable equilibrium value is p̄ = 0.5. The introduction of an agent with a default strategy bias serves to move the population production probability towards the biased agent's initial production probability. If, following the introduction of the biased agent, p̄ is still less than the equilibrium value, the expected effect of each subsequent interaction will be to reduce p̄ (in part, by reducing the new agent's production probability). Whether the introduction of new agents with a default strategy bias manages to change the population from producing the bias-disfavoured form to the bias-favoured form depends on the rate at which the introduction of new agents increases p̄ compared to the rate at which interactions between agents decrease p̄. With these two models of learning bias the population can become effectively^e
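The equilibrium analysis can be checked directly from Eq. (5): the expected change vanishes where 2p̄ + β − 1 = 0, i.e. at p̄ = (1 − β)/2, and is positive above that value and negative below it. A quick numerical check (my sketch, using the β = 1/3 value from the ease of learning simulations):

```python
def expected_dp(pbar, lam=0.1, beta=1/3):
    """Expected change in the population mean probability, per Eq. (5),
    approximating the speaker probability pi by the mean pbar."""
    return lam * (2 * pbar + beta - 1) * pbar * (1 - pbar)

eq = (1 - 1/3) / 2   # unstable equilibrium (1 - beta)/2, here 1/3 < 0.5
assert abs(expected_dp(eq)) < 1e-12
assert expected_dp(eq + 0.05) > 0   # above equilibrium: drift toward a
assert expected_dp(eq - 0.05) < 0   # below equilibrium: drift toward b
```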
eBecause of the stochastic nature of this model, the state of a population is never indefinitely stable: there is always a non-zero probability that a population which produces a as the dominant form will switch to one dominated by b (for example, all agents could, by improbable chance, produce b forms in every interaction until p̄ is below the equilibrium value). However, the learning biases all make the probability of a transition from an a-dominated situation to a b-dominated situation less likely than the converse. Thus, in the limit of infinitely long runs, we would expect to see the population spend more time dominated by the bias-favoured form than the bias-disfavoured form. The relevance of this
trapped in a state dominated by the production of the bias-disfavoured form. We might say that in such situations, the population exhibits cultural inertia resulting from the effect of agents learning the ambient language outweighing the effect of naïve agents' biased behaviour.
3. Discussion
The models presented in this paper are not at all realistic, and no attempt was made to determine reasonable values for the parameters used: the ungroundedness of the overly simple equations used cannot be compensated for by, for example, realistic rates of population turnover. The purpose of presenting these models is not to find a counter-intuitive result of a complex process, but to show that not all intuitive notions of learning bias (afforded to us by our ignorance of the processes of language learning) will inexorably produce distributions of languages which reflect those biases. It would thus be premature to think that a link between learning biases and language typology has been demonstrated. The simulations presented here all (arbitrarily) begin with an initial state in which the bias-favoured form is not the majority form produced by the population. How might such a situation arise? Bybee and Newman (1995) performed experiments designed to detect learning biases (in students learning an artificial language) for plural marking with suffixes versus plural marking with stem changes. They found no bias in favour of either plural marking scheme (in terms of ease of acquisition and generalisation), in spite of the fact that stem changes are far less common in the languages of the world. Bybee and Newman argue that it is the differences in the diachronic processes which produce affixes and stem changes that are responsible for the dominance of affixes over stem changes. Briefly, affixes generally develop by the grammaticalization of free morphemes, whereas stem changes develop by the phonological conditioning of the stem by an affix followed by the deletion of the affix. Thus the process that produces stem changes depends on the presence of affixes, but not vice versa. The processes by which stem changes arise also generally take longer than the processes by which a free morpheme becomes an affix.
(Footnote e, continued:) property of the model is hard to see given the lack of a connection between these models and reality: the time-scale over which this effect could become noticeably manifest within a population could be far longer than the time-scales over which human languages have existed. The point of this paper is that cultural inertia may outweigh biased learning. The fact that another possibility (which we may call spontaneous population-level change) could theoretically outweigh both in an infinite limit is only relevant if we consider its occurrence likely to have a significant effect on linguistic structures.
Attention to these processes, rather than individuals' preferences for one system over another, accounts for the typological distribution presented by the world's languages. The clearest way linguistic structure can develop independently of learning biases is by some new structure being what is left when some other aspect of linguistic structure is omitted. For example, Garrett (1990) argues that ergative
case marking in Hittite developed from instrumental case marking and happened through the reanalysis of the instrumental in a null-subject transitive clause as the (ergative) subject. This process was driven by the omission of transitive clause subjects, a process orthogonal to any putative language learning mechanisms' biases for or against ergative or accusative case systems. We may draw an analogy between learning biases and Coriolis forces: when dealing with weather systems, the Coriolis force is a dominant effect in determining the direction of winds relative to the surface of the earth; however, when dealing with water draining from a bathtub, the Coriolis force (contrary to popular belief) is incredibly weak in comparison to other effects and does not determine the rotation of the draining water. Similarly, learning biases should be thought of as one among many kinds of effects which may shape the emergence and stability of language structures. Whether a particular learning bias substantially affects a particular language structure will depend on the balance of these effects. The view of language adopted by generativist linguistics (Chomsky, 1965) sees universals of language structure as reflections of the fact that an individual's "language acquisition device" can only select from a limited range of "possible human languages". Viewing language structures as the aggregate of biased learning is, in a sense, a generalisation of this view which weakens the effect of the acquisition device from identifying possible languages to identifying preferred structures. Both perspectives have the shortcoming that they reduce the explanation of language structures to (speculative) features of the individual psychology of language learning.
A broader perspective on the sources of structural similarities across languages would balance the effect of biased language learning against other effects that could impact on language structures and may be more or less similar across different human communities without necessarily being reducible to human psychology. For example, similarities found in the world's languages' colour term systems can be understood as a reflection of the similarities in the useful colour contrasts presented to humans by their environments (Hawkey, 2006).
References
Bybee, J. L., & Newman, J. E. (1995). Are stem changes as natural as affixes? Linguistics, 33, 633-654.
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: M.I.T. Press.
Dediu, D., & Ladd, D. R. (2007). Linguistic tone is related to the population frequency of the adaptive haplogroups of two brain size genes, ASPM and Microcephalin. Proceedings of the National Academy of Sciences, 104(26), 10944-10949.
Garrett, A. (1990). The origin of NP split ergativity. Language, 66(2), 261-296.
Hawkey, D. J. C. (2006). The interrelated evolutions of colour vision, colour and colour terms. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: proceedings of the 6th international conference (EVOLANG6) (pp. 417-418). Singapore: World Scientific.
Hawkins, J. (1990). A parsing theory of word order universals. Linguistic Inquiry, 21, 223-261.
Kirby, S. (1999). Function, selection and innateness. Oxford: Oxford University Press.
Kirby, S., Dowman, M., & Griffiths, T. L. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12), 5241-5245.
REANALYSIS VS. METAPHOR? WHAT GRAMMATICALISATION CAN TELL US ABOUT LANGUAGE EVOLUTION
STEFAN HOEFLER & ANDREW D. M. SMITH Language Evolution and Computation Research Unit, Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL [email protected]/ [email protected] We argue that studying grammaticalisation is useful to evolutionary linguists, if we abstract away from linguistic description to the underlying cognitive mechanisms. We set out a unified approach to grammaticalisation that allows us to identify these mechanisms, and argue that they could indeed be sufficient for the initial emergence of linguistic signal-meaning associations.
1. Introduction
Language evolution has a notorious data problem: its object of study is simply too remote in the pre-historic past for any direct observation to be possible. In such situations, Ockham's razor recommends the assumption of the uniformity of process: that the mechanisms operating in the past are the ones still operating in the present. This would lead to the assumption that we should be able to learn something about the evolution of language from the study of language change, and in particular of semantic change leading to grammaticalisation (Heine & Kuteva, 2002a; Hurford, 2003). Grammaticalisation denotes the (unidirectional) process by which a discourse strategy, syntactic construction, or word loses some of its independence of use and becomes more functional. It is usually accompanied by phonetic reduction and semantic bleaching and generalisation. There is disagreement over whether the study of grammaticalisation can give useful insights into language evolution. Newmeyer (2006), for instance, criticises the assumption that the unidirectionality of grammaticalisation provides sufficient evidence that early human language contained only nouns and verbs. We argue that grammaticalisation is indeed worthy of evolutionary linguists' study, if one abstracts away from linguistic descriptions of individual phenomena to underlying psychological mechanisms. We thus support calls for a more cognition-oriented study of grammaticalisation (Heine, 1997; Kuteva, 2001; Tomasello, 2003):
Exactly how grammaticalization and syntacticization happen in the concrete interactions of individual human beings and groups of human beings, and how these processes might relate to the other processes of sociogenesis by means of which human social interaction ratchets up the complexity of cultural artefacts, requires more psychologically based linguistic research into processes of linguistic communication and language change. (Tomasello, 2003, p. 103)
The remainder of this paper falls into three sections. We first provide a unified approach to grammaticalisation, allowing us to identify the underlying cognitive mechanisms. We project these to study the emergence of a non-linguistic code, before exploring the implications of our approach for evolutionary linguistics.
2. Metaphor vs. reanalysis
Two competing kinds of accounts of grammaticalisation phenomena can be identified in the literature: those which emphasise metaphorical use (Heine, 1997), and those which emphasise reanalysis (Hopper & Traugott, 2003). We propose a unified approach based on an ostensive-inferential model of communication (Sperber & Wilson, 1995). Such a model emphasises the fact that in a given situation, a speaker and hearer assume common ground (Clark, 1996). Common ground includes, among other shared knowledge, the awareness of shared linguistic conventions and the recognition of what is relevant in a given situation, which allows the hearer to infer what the speaker intends to communicate on the basis of an ostensive stimulus provided by the speaker. The grammaticalisation of the English construction be going to, which originally stood for SPATIAL MOTION, and then came to express INTENTION, as shown in Example 1, is one of the most cited examples in the grammaticalisation literature (Heine, Claudi, & Hünnemeyer, 1991; Kuteva, 2001; Hopper & Traugott, 2003), and is also a particular instance of grammaticalisation which is very common, both historically and cross-linguistically (Heine & Kuteva, 2002b).
(1)
a. We are going to Windsor to see the King. (MOTION)
b. We are going to get married in June. (INTENTION, not MOTION)
(examples from Bybee (2003, p. 147)). We illustrate our approach by presenting the underlying psychological mechanisms, for both speaker and hearer, of metaphor- and reanalysis-based accounts of this change. In the metaphor-based scenario, detailed in Example 2, a speaker intends to express INTENTION (2a). She uses the form for SPATIAL MOTION metaphorically,a assuming that the hearer will realise that (i) spatial motion is irrelevant in the current context, and (ii) spatial motion often implies intention, which in turn is relevant (2b-f). The hearer realises that the literal meaning of the
aThere are many reasons for ad-hoc metaphorical use; these could be sociolinguistic (e.g. for prestige), or the speaker could simply not have a convention for the intended meaning in her code.
signal is irrelevant in the current context, and falls back on INTENTION, which he associates (and knows the speaker associates) with SPATIAL MOTION (2g-m).
(2)
Detail of the metaphor-based scenario.
Speaker:
(a) I want to express INTENTION.
(b) I have a construction which expresses SPATIAL MOTION, and the hearer shares this convention.
(c) SPATIAL MOTION is associated with INTENTION.
(d) SPATIAL MOTION is not relevant in the given context.
(e) Because we share common ground, the hearer will be aware of (b)-(d), and realise that I am aware of it too.
(f) Because of (e), I can use the construction for SPATIAL MOTION metaphorically to convey INTENTION.
Hearer:
(g) The speaker has expressed SPATIAL MOTION.
(h) SPATIAL MOTION is not relevant in the given context.
(i) SPATIAL MOTION often implies INTENTION.
(j) INTENTION would be relevant in the given context.
(k) I must assume that the speaker is co-operative.
(l) I must also assume that the speaker is aware that I know (g)-(k), and that I know of his being aware of it.
(m) From (g)-(l), I conclude that the speaker intends to convey INTENTION.
Both speaker and hearer remember that be going to has been used successfully to express INTENTION; the more frequently be going to is used in this sense, the more deeply this new association will become entrenched (Langacker, 1987) in their knowledge. Such entrenchment eventually leads to the phenomenon of context-absorption, where a pragmatically inferred meaning becomes part of the lexical item's conventional, semantic meaning (Croft, 2000; Levinson, 2000; Kuteva, 2001; Traugott & Dasher, 2005). The entrenched meaning no longer needs to be inferred from its relevance in the given context, but can be retrieved instead from the shared conventions which make up part of language users' encyclopaedic knowledge. In the reanalysis-based scenario, detailed in Example 3, the speaker uses be going to in its conventional sense to express SPATIAL MOTION, the expression of which she deems relevant in the given context (3a-e). The hearer, however, perceives things differently; he does not think that SPATIAL MOTION is relevant in the present situation but does believe that information about INTENTION would be
(3f). From the hearer's perspective, this appears to be exactly the same scenario as the metaphor-based scenario in Example 2. This time, the interlocutors make different adjustments to their codes: the speaker will further entrench the convention that maps be going to onto SPATIAL MOTION, whereas the hearer establishes a new, additional association between be going to and INTENTION.
(3)
Detail of the reanalysis-based scenario.
Speaker:
(a) I want to express SPATIAL MOTION.
(b) I have a construction for the expression of SPATIAL MOTION in my linguistic code, and the hearer shares this convention.
(c) SPATIAL MOTION is relevant in the given context.
(d) Because we share common ground, the hearer will be aware of (b)-(c) and realise that I am aware of it too.
(e) Because of (d), I can use the construction to communicate SPATIAL MOTION.
Hearer:
(f) performs the same reasoning as in (2g)-(2m) above.
A special case of the reanalysis-based scenario is one where the hearer, in the role of a language learner, does not have any existing mapping for be going to in his linguistic code. However, because he can work out from the context that the speaker intends to express INTENTION, he will create an association between that meaning and be going to. In contrast to the previous two scenarios, layering (the co-existence of an old and a new mapping, which yields polysemy) does not arise in the hearer's linguistic code in this case. Two important conclusions can be drawn from our analysis of the metaphor- and reanalysis-based explanations of the grammaticalisation of be going to. First, both scenarios are based on the same cognitive processes: (i) those involved in ostensive-inferential communication, in particular the assumption of common ground, including knowledge of shared linguistic conventions and the recognition of what is relevant in the given context; (ii) the automatisation-based process of entrenchment. Second, the difference between the scenarios is not that only one of them uses metaphor, but rather that the (infelicitously named) metaphor-based scenario relies on common ground having been successfully established between speaker and hearer, whereas the reanalysis-based scenario describes a situation where, although common ground is assumed by the interlocutors, there is actually a mismatch between their respective discourse contexts (Kuteva, 2001). The metaphor-based scenario is thus speaker-oriented, focusing on the speaker as the source of linguistic innovation, while the reanalysis-based account is hearer-oriented. Depending on which of the two perspectives one takes, however, either scenario can be regarded as a special case of the other.
3. Reconstructible meanings
How can we project these scenarios to language evolution? First, we step back to see how ostensive-inferential communication works, independent of language. We note that communication is inherently task-oriented; humans do not communicate "just so," but to do something, to achieve a goal or solve a task (Austin, 1962). The task-orientedness of communication entails that once a speaker has made manifest her intention to communicate, the hearer will have certain expectations as to what are plausible things to communicate in the given situation. In this way, a hearer discerns what is relevant from what is irrelevant in a given situation (as in the scenarios for the grammaticalisation of going to above), and the speaker can likewise anticipate what the hearer is likely to infer. In the simplest case, in Fig. 1(a), making manifest one's communicative intention may suffice for the hearer to be able to infer the information one wants to communicate. The hearer's reasoning may go as follows: my conspecific exhibits behaviour that does not make sense unless she intends to communicate; therefore she intends to communicate something; in the current situation, the only thing that would make sense for her to communicate is that there is some danger around; therefore, she is communicating that there is some danger around. Note that the speaker's and hearer's assumptions can be different (i.e. there can be a contextual mismatch): if the perlocutionary effect does not differ, this may go unnoticed, and speaker and hearer will map the produced stimulus onto different utterance meanings. In Fig. 1(b), for example, as long as the hearer runs and hides, it does not matter that the speaker thought she was communicating the presence of a lion, while the hearer assumed that hyenas were around.
Of course it is not always possible to reduce the set of plausible utterance meanings to a single one; in such cases, the hearer needs some assistance in selecting the right one, namely a clue. The hearer’s reasoning might run along the following lines (see Fig. l(c)). Because it does not make sense otherwise, I must interpret the speaker’s behaviour as an attempt to communicate. In this situation, the only things that would make sense for her to communicate are to tell me that there is danger and to specify whether this danger is a lion or an eagle. She is communicating, so there is danger, but how can I decide if it is a lion or an eagle? The speaker must realise my dilemma, and so her ostensive stimulus will contain a clue. She is growling: lions growl, eagles don’t (hyenas growl too, but this is irrelevant as there are no hyenas around at this time of year); therefore, she is communicating that there is a lion. The cognitive mechanisms underlying these instances of communication are identical to those described in section 2 for grammaticalisation. This equivalence also extends to the entrenchment of the signal-meaning association and thus to the emergence of a convention. In all cases, the meanings which come to be associated with signals are those which can be reconstructed from the stimuli in context.
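The clue-based inference described above can be made concrete as a filter over the contextually plausible meanings. The following is a purely illustrative toy (the meaning set, clue vocabulary, and function names are my own inventions, not part of the authors' model):

```python
def infer(plausible, clue, sounds):
    """Hypothetical sketch of the hearer's reasoning in Fig. 1(c):
    keep the contextually plausible meanings consistent with the clue,
    and succeed only if exactly one candidate remains."""
    candidates = [m for m in plausible if clue in sounds.get(m, ())]
    return candidates[0] if len(candidates) == 1 else None

sounds = {"lion": ("growl",), "eagle": ("screech",), "hyena": ("growl",)}
# Hyenas are not around at this time of year, so only lion and eagle
# are plausible; the growl clue then picks out the lion.
assert infer(["lion", "eagle"], "growl", sounds) == "lion"
```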
Figure 1. The reconstruction of meaning in ostensive-inferential communication, where Φ is the set of plausible intended perlocutionary effects in a given situation. (a) If only one thing makes sense to be communicated, e.g. that there is some danger around (A), then the recognition of a conspecific's intention to communicate suffices to infer what she attempts to convey. (b) Contextual mismatch: the speaker means A (e.g. that there is a lion), the hearer infers C (e.g. that there is a hyena). Because both have the same perlocutionary effect p1 (e.g. climbing a tree), the hearer's misinterpretation goes unnoticed and communication does not fail. (c) In situations where more than one thing is plausible, the speaker must additionally provide a clue. For instance, it might make sense to communicate that there is a lion (A) or an eagle (B): if there is a lion, one must climb (p1); if there is an eagle, one must hide (p2). Growling (S) serves as a clue: it is the sound made by lions (S → A) and by hyenas (S → C), but this is irrelevant in the given context.
Every speaker innovation can only be propagated through hearer reconstruction; semantic reconstructibility therefore constrains the types of form-meaning mappings which can persist over time (Smith, 2008).
3.1. Burling's scenario revisited
Burling (2000) makes a case for a scenario of the emergence of linguistic symbols that is reminiscent of the reanalysis-based explanation of the grammaticalisation of be going to we have given above. He suggests that symbols arise from situations in which one individual erroneously interprets a conspecific's behaviour as an ostensive stimulus. In our model, this would be represented as an extreme, but nevertheless ordinary, case of contextual mismatch: the hearer interprets the interaction as communicative but the speaker does not. Because the supposed ostensive stimulus will not have the properties of a proper clue, the hearer will only be able to infer a plausible meaning if there is only one relevant thing that would make sense to be communicated in the given context, and if his reaction does not expose the misunderstanding. Burling concludes that comprehension runs ahead of production: "[C]ommunication does not begin when someone makes a sign, but when someone interprets another's behaviour as a sign" (Burling, 2000, p. 30). This interpretation must be rejected on the basis of our analysis of the psychological underpinnings of the equivalent reanalysis-based scenario of grammaticalisation in section 2. Although in Burling's scenario the hearer does indeed infer something not implied by the speaker, he does so not on a whim, but under the assumption that the speaker is inviting him to make those very inferences. Rather than one being prior to the other, therefore, production and comprehension mirror each other: whatever a hearer can infer, a speaker can imply. Communication is inherently co-operative (Grice, 1975; Clark, 1996; Tomasello, 2003), and while Burling's "reanalysis-based" account cannot be ruled out, its "metaphor-based" counterpart is equally possible. Both should be seen as instances of the same set of underlying cognitive mechanisms: ostensive-inferential communication and entrenchment.
4. Conclusion
We have shown that grammaticalisation can indeed answer questions relevant to evolutionary linguists, if one moves away from linguistic classification to investigating its underlying psychological mechanisms. We have argued that the same cognitive processes that lead to grammaticalisation phenomena could also have been sufficient for the initial emergence of linguistic signal-meaning associations. We thus neither endorse nor attempt to disprove Newmeyer's (2006) specific criticism of the use of grammaticalisation as a source of information about language evolution. Our approach is different from both his approach and the approaches of those he criticises. We claim that the merit of studying grammaticalisation, and in fact any semantic change (Traugott & Dasher, 2005), for insights
into language evolution, lies in the underlying cognitive processes it makes visible, which can be applied to investigate the origins of language.
References
Austin, J. L. (1962). How to do things with words. Oxford: Oxford University Press.
Burling, R. (2000). Comprehension, production and conventionalisation in the origins of language. In C. Knight, M. Studdert-Kennedy, & J. R. Hurford (Eds.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 27-39). Cambridge: Cambridge University Press.
Bybee, J. L. (2003). Cognitive processes in grammaticalization. In M. Tomasello (Ed.), The new psychology of language: Cognitive and functional approaches to language structure (Vol. 2). Erlbaum.
Clark, H. H. (1996). Using language. Cambridge: Cambridge University Press.
Croft, W. (2000). Explaining language change: An evolutionary approach. Longman.
Grice, H. P. (1975). Logic and conversation. In P. Cole & J. Morgan (Eds.), Syntax and semantics 3: Speech acts. New York: Academic Press.
Heine, B. (1997). Cognitive foundations of grammar. New York: Oxford University Press.
Heine, B., Claudi, U., & Hünnemeyer, F. (1991). Grammaticalization: A conceptual framework. Chicago: University of Chicago Press.
Heine, B., & Kuteva, T. (2002a). On the evolution of grammatical forms. In A. Wray (Ed.), The transition to language (pp. 376-397). Oxford: Oxford University Press.
Heine, B., & Kuteva, T. (2002b). World lexicon of grammaticalization. Cambridge: Cambridge University Press.
Hopper, P. J., & Traugott, E. C. (2003). Grammaticalization (Second ed.). Cambridge: Cambridge University Press.
Hurford, J. R. (2003). The language mosaic and its evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 38-57). Oxford: Oxford University Press.
Kuteva, T. (2001). Auxiliation: An enquiry into the nature of grammaticalization. Oxford: Oxford University Press.
Langacker, R. W. (1987).
Foundations of cognitive grammar: Theoretical prerequisites (Vol. 1). Stanford, CA: Stanford University Press.
Levinson, S. C. (2000). Presumptive meanings: The theory of generalized conversational implicatures. Cambridge: Cambridge University Press.
Newmeyer, F. J. (2006). What can grammaticalization tell us about the origins of language? In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language (pp. 434-435). World Scientific.
Smith, A. D. M. (2008). Protolanguage reconstructed. Interaction Studies, 9.
Sperber, D., & Wilson, D. (1995). Relevance: Communication and cognition (Second ed.). Oxford: Blackwell.
Tomasello, M. (2003). On the different origins of symbols and grammar. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 94-110). Oxford: Oxford University Press.
Traugott, E. C., & Dasher, R. B. (2005). Regularity in semantic change. Cambridge: Cambridge University Press.
SEEKING COMPOSITIONALITY IN HOLISTIC PROTO-LANGUAGE WITHOUT SUBSTRUCTURE - DO COUNTEREXAMPLES OVERWHELM THE FRACTIONATION PROCESS?
SVERKER JOHANSSON

School of Education and Communication, University of Jönköping, Box 1026, SE-551 11 Jönköping, Sweden
[email protected]

In holistic theories of protolanguage, a vital step is the fractionation process where holistic utterances are broken down into segments, and segments associated with semantic components. One problem for this process may be the occurrence of counterexamples to any segment-meaning connection. The actual abundance of such counterexamples is a contentious issue (Smith, 2006; Tallerman, 2007). Here I present calculations of the prevalence of counterexamples in model languages. It is found that counterexamples are indeed abundant, much more numerous than positive examples for any plausible holistic language.
1. Introduction

Human beings today have language. Our ancestors long ago did not. The notion that modern language with all its complexity arose ex nihilo is preposterously unlikely, which implies that one or more intermediate stages, less complex than modern language, must have existed. A popular possibility for an early intermediate stage is a language where each utterance is a unit without substructure. In analogy with the ontogeny of language, we might call this a one-word stage. There are at least two ways to get from a one-word stage to a composite language, either analytic/holistic or synthetic (Hurford, 2000; Bickerton, 2003; Johansson, 2005). In the holistic version (Wray, 2000; Arbib, 2003), the units of the one-word stage are holistic utterances, which are then fractionated into parts that become independent recombinable morphemes in the next stage, whereas in the synthetic version (Bickerton, 2000; Jackendoff, 2002, among others), two or more units from the one-word stage are combined into structured utterances in the next stage. The segmentation and analysis step, finding substructure in utterances that are postulated to lack substructure, is a critical step for holistic theories. It is not obvious to me, nor to Bickerton (2003) or Tallerman (2007), why the fractionation process envisaged by Wray (2000) would be expected to work. A similar process is certainly present in modern-day language acquisition - children first acquire
some stock phrases as unanalyzed wholes, and later figure out their internal structure - but that works only because these stock phrases have an internal structure, given by the grammar of the adults from whom the child acquires them. As an analogy for the origin of grammar, this is unsatisfactory. Wray (2000) describes a scenario in which people already talking at the one-word stage at some point acquire a grammar from somewhere - apparently not from any linguistic or communicative pressures, but as an exaptation - and start applying it to their language, attempting to identify structure and constituents in their structureless holistic one-word utterances. Tallerman (2004, 2007) provides a detailed critique of this process, to which Smith (2006) provides a partial response. In this paper, I will concentrate on one specific point of contention between Tallerman and Smith, which concerns how connections are established between semantic components and sound segments. By pure chance, it may sometimes happen that different utterances have both a “phonetic segment” in commonᵃ, and a semantic component in common. It is argued by e.g. Wray (2000) that this will lead to the identification of the phonetic segment with the semantic component, so that the former comes to “mean” the latter. Tallerman (2004, 2007) argues that it is self-evident that counterexamples will by far outnumber confirming examples for such a generalization. Smith (2006) disagrees, arguing that there is no logical necessity that counterexamples outnumber positive examples. Smith (2006) further argues that it is not established that counterexamples, whatever their frequency, are actually fatal to generalization. This issue hinges on whether the analysis process in proto-humans has a logical and statistical component, or is purely based on positive examples.
The mental processes of proto-humans are unfortunately unavailable to direct observation, but since it has been established that both modern human infants (Saffran, Aslin, & Newport, 1996) and monkeys (Hauser, Newport, & Aslin, 2001) are sensitive to statistical patterns in language-like input, it is not parsimonious to assume that proto-humans totally disregarded statistics. The weight of Tallerman’s argument thus depends on the actual ratio of counterexamples to positive examples in plausible proto-languages, a ratio that can be estimated through simple calculation in simulated model languages. I present here the results of such a calculation.

ᵃ “Phonetic segment” has been used in this debate as a term for whatever chunks of sound proto-humans will identify as a unit, and hopefully connect with a meaning. It is far from obvious that proto-humans at the relevant stage possessed the segmentation ability and phonological awareness needed to segment an utterance into anything useful (Studdert-Kennedy, 2005; Tallerman, 2007), but for the sake of the argument this additional hurdle for the holistic model is assumed to be solvable. I will call these chunks “sound segments”.
Figure 1. The fraction of predicates for which positive examples outweigh counterexamples, as a function of the size of the language. The values of the other parameters are fixed at #segments = #predicates = 50, utterance length = 4 segments.
2. Model
A toy language is constructed by creating a set of utterances. Each utterance consists of a number of sound segments, and carries a meaning consisting of a basic predicate-argument structure, with a single predicate and one or more arguments. Both sound segments and meaning are randomly assigned to each utterance, uncorrelated with each other. The following features of the language could be varied as free parameters in the model:

• Total size of language, number of distinct holistic utterances
• Total inventory of sound segments (“#segments”)
• Total semantic inventory of predicates (“#predicates”)
• Total semantic inventory of argumentsᵇ
• Number of sound segments in one utterance (“utterance length”)

ᵇ In the present analysis, the arguments are neglected. A full analysis is left for future work.

Many different parameter combinations were investigated, to identify which regions, if any, in parameter space are conducive to creating a composite language
Figure 2. The fraction of positive examples and the two types of counterexamples separately, as a function of the size of the language. The values of the other parameters are fixed at #segments = #predicates = 50, utterance length = 4 segments.
as argued by Wray (2000) and Smith (2006). For each parameter combination, a large number of toy languages (100,000 or more) were generated and analysed. Once a language has been randomly generated with a given set of parameters, it is analysed for possible semantic-phonological connections according to the following procedure: For all predicates and all sound segments in the language, the number of co-occurrences of predicate p_i with segment s_n in the same utterance is counted. For each predicate in the language, the phonological segment s_best that most often co-occurs with it is identified. For the segment s_best, if it co-occurs at least twice with p_i, the following items are counted:

- The number of positive examples, where it co-occurs with p_i in the same utterance.
- Counterexamples type 1, the occurrence of s_best in an utterance that does not mean p_i.
Figure 3. The ratio of positive examples to counterexamples, as a function of utterance length, for two different language sizes. The values of the other parameters are fixed at #segments = #predicates = 50.
- Counterexamples type 2, an utterance that means p_i but does not contain s_best.
The two types of counterexamples are shown separately in Figure 2. As can be seen there, both contribute substantially. In the rest of the figures, data are shown with the two types conflated. Various higher-order complications, like the possibility that the same segment s is the best choice for two different predicates, have been neglected. Taking such complications into account would only decrease the possibility of finding and reinforcing connections. It is also assumed for the sake of the calculation here, contra Tallerman (2007), and for that matter contra my own judgement, that segmentation of an utterance is unproblematic (see footnote a), and that proto-humans already have compositional semantics.
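The counting procedure described above is simple enough to sketch in a few lines. The following is an illustrative reconstruction, not the author's original code; the function and parameter names (make_language, count_examples, n_utterances, etc.) are assumed for illustration, and arguments are neglected, as in the paper's own analysis.

```python
import random
from collections import Counter

def make_language(n_utterances, n_segments, n_predicates, utt_len, rng):
    """Random toy language: each utterance is a tuple of sound segments
    paired with a single predicate, the two assigned independently."""
    return [
        (tuple(rng.randrange(n_segments) for _ in range(utt_len)),
         rng.randrange(n_predicates))
        for _ in range(n_utterances)
    ]

def count_examples(language, n_predicates):
    """For each predicate p, find the segment s_best that most often
    co-occurs with it; if they co-occur at least twice, count positive
    examples and the two types of counterexamples (here conflated)."""
    results = {}
    for p in range(n_predicates):
        # Co-occurrence counts of each segment with predicate p
        cooc = Counter(s for segs, pred in language if pred == p
                       for s in set(segs))
        if not cooc:
            continue
        s_best, positives = cooc.most_common(1)[0]
        if positives < 2:  # s_best must co-occur at least twice with p
            continue
        # Type 1: s_best occurs in an utterance that does not mean p
        type1 = sum(1 for segs, pred in language
                    if s_best in segs and pred != p)
        # Type 2: an utterance means p but does not contain s_best
        type2 = sum(1 for segs, pred in language
                    if pred == p and s_best not in segs)
        results[p] = (positives, type1 + type2)
    return results

rng = random.Random(0)
lang = make_language(1000, 50, 50, 4, rng)
counts = count_examples(lang, 50)
# Fraction of predicates for which positives outweigh counterexamples
frac = sum(pos > neg for pos, neg in counts.values()) / len(counts)
```

Note that the "at least twice" threshold is what gives positives the "head start" discussed in the Results section: predicates whose best segment never recurs are excluded before any counterexample is tallied. For parameter values like those above, the counterexamples vastly outnumber the positives, consistent with Fig. 1.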
3. Results

For all parameter combinations, the number of counterexamples was found to outweigh the number of positive examples by a considerable margin. For no parameter combination did the fraction of all predicates with more positive examples than counterexamples exceed 2% (Fig. 1). The most important parameter is language size. The smaller the language, the larger the fraction of predicate-segment connections with predominantly positive examples, as shown in Fig. 1, and the larger (but still much less than unity) is
Figure 4. The ratio of positive examples to counterexamples, as a function of segment inventory, for two different language sizes. The values of the other parameters are fixed at #predicates = 50, utterance length = 4 segments.
the ratio of positive examples to counterexamples. This can be explained as a sampling effect, with random fluctuations being more important at small sample size, and also as a selection effect - only those predicates with at least two positive examples are counted at all, and this gives the positives a “head start” that is non-negligible in a small language. Similarly, the number of segments per utterance has a substantial effect, with very short utterances being “better”, as shown in Fig. 3. For small languages the connection success gradually grows with increasing segment inventory and predicate inventory (Figs. 4 and 5, upper curves). For large languages, the situation is different. Success rate is very low, largely independent of both segment inventory and predicate inventory (Figs. 4 and 5, lower curves).

4. Discussion
It is clear that there is only a small range of parameters for which the positive examples are not totally overwhelmed by counterexamples. The fractionation process has a non-negligible chance of success only for very small simple languages - but where would the pressure towards compositionality come from with a tiny language? And even for these tiny languages, success rate is small unless the inventory of segments and predicates is of the same order of magnitude as the total number of utterances in the language, which is hardly plausible. Unless it can be shown that humans totally disregard counterexamples when extracting patterns from data, the argument from counterexamples has considerable force.
Figure 5. The ratio of positive examples to counterexamples, as a function of predicate inventory, for two different language sizes. The values of the other parameters are fixed at #segments = 50, utterance length = 4 segments.
References

Arbib, M. A. (2003). The evolving mirror system: a neural basis for language readiness. In M. H. Christiansen & S. Kirby (Eds.), Language evolution. Oxford: Oxford University Press. Bickerton, D. (2000). How protolanguage became language. In Knight, Studdert-Kennedy, & Hurford (Eds.), The evolutionary emergence of language. Cambridge: Cambridge University Press. Bickerton, D. (2003). Symbol and structure: a comprehensive framework. In M. H. Christiansen & S. Kirby (Eds.), Language evolution. Oxford: Oxford University Press. Hauser, Newport, & Aslin. (2001). Segmentation of the speech stream in a non-human primate: statistical learning in cotton-top tamarins. Cognition 78: B53-B64. Hurford, J. R. (2000). Introduction: the emergence of syntax. In Knight, Studdert-Kennedy, & Hurford (Eds.), The evolutionary emergence of language. Cambridge: Cambridge University Press. Jackendoff, R. (2002). Foundations of language: brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Johansson, S. (2005). Origins of language - constraints on hypotheses. Amsterdam: Benjamins. Saffran, Aslin, & Newport. (1996). Statistical learning by 8-month-old infants. Science 274: 1926-1928. Smith, K. (2006). The protolanguage debate: bridging the gap. In A. Cangelosi, A. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of the 6th international conference (EVOLANG6), Rome, Italy, 12-15 April 2006. Singapore: World Scientific Publishing. Studdert-Kennedy, M. (2005). How did language go discrete? In M. Tallerman (Ed.), Language origins: Perspectives on evolution. Oxford University Press. Tallerman, M. (2004). Analysing the analytic: problems with holistic theories of the evolution of protolanguage. In Proceedings of the 5th conference on the evolution of language, Leipzig. Tallerman, M. (2007). Did our ancestors speak a holistic protolanguage? Lingua 117: 579-604. Wray, A. (2000). Holistic utterances in protolanguage: the link from primates to humans. In Knight, Studdert-Kennedy, & Hurford (Eds.), The evolutionary emergence of language. Cambridge: Cambridge University Press.
UNRAVELLING DIGITAL INFINITY

CHRIS KNIGHT & CAMILLA POWER

School of Social Sciences, Media and Cultural Studies, University of East London, Docklands Campus, London E16 2RD

‘The passage from the state of nature to the civil state produces a very remarkable change in man, by substituting justice for instinct in his conduct, and giving his actions the morality they had formerly lacked. Then only, when the voice of duty takes the place of physical impulses and right of appetite, does man, who so far had considered only himself, find that he is forced to act on different principles, and to consult his reason before listening to his inclinations.’ Jean-Jacques Rousseau, The Social Contract (1973 [1762]: 195).
1.1. Digital minds in an analog world
Language has sometimes been described as a ‘mirror of mind’. Chomsky attributes this idea to ‘the first cognitive revolution’ inspired by Descartes among others in the seventeenth century. ‘The second cognitive revolution’ - triggered in large measure by Chomsky’s own work - is taken to have been a twentieth century rediscovery of these earlier insights into the nature of language and mind. In 1660, the renowned Port Royal grammarians (Arnauld and Lancelot 1972 [1660]: 27) celebrated ‘this marvelous invention of composing out of twenty-five or thirty sounds that infinite variety of expressions which, whilst having in themselves no likeness to what is in our mind, allow us to disclose to others its whole secret, and to make known to those who cannot penetrate it all that we imagine, and all the various stirrings of our soul’.
If this ‘marvelous invention’ reflects some part of human nature, then on Cartesian first principles it must correspond to some innate mechanism in the biological mind/brain. Chomsky (2005) calls it ‘discrete infinity’. Or as Pinker (1999: 287) puts it: ‘We have digital minds in an analog world. More accurately, a part of our minds is digital.’ But if ‘a part of the mind is digital’, how did it ever get to be that way? Under what Darwinian selection pressures and by what conceivable mechanisms
might a digital module become installed in an otherwise analog primate brain? Can natural selection acting on an analog precursor mechanism transform it incrementally into a digital one? Is such an idea even logically coherent? If these were easy questions, the ‘hardest problem in science’ (Christiansen and Kirby 2003) might long ago have been solved. Chomsky concludes that the transition to ‘Merge’ - the irreducible first principle of ‘discrete infinity’ - was instantaneous, commenting that ‘it is hard to see what account of human evolution would not assume at least this much, in one or another form.’ Note that whatever the account of human evolution, the assumption of instantaneous language evolution must stand. Chomsky (2005: 11-12) writes: ‘An elementary fact about the language faculty is that it is a system of discrete infinity. Any such system is based on a primitive operation that takes n objects already constructed, and constructs from them a new object: in the simplest case, the set of these n objects. Call that operation Merge. Either Merge or some equivalent is a minimal requirement. With Merge available, we instantly have an unbounded system of hierarchically structured expressions. The simplest account of the “Great Leap Forward” in the evolution of humans would be that the brain was rewired, perhaps by some slight mutation, to provide the operation Merge, at once laying a core part of the basis for what is found at that dramatic “moment” of human evolution…’ Merge, then, is more than an empirical necessity: it is a logical one. It is the procedure central to any conceivable system of ‘discrete infinity’. Merge is recursive: it means combining things, combining the combinations and combining these in turn - in principle to infinity. Chomsky suggests that a ‘slight mutation’ might have allowed the evolving brain of Homo sapiens to do this for the first time.
No matter how we imagine the physical brain, the transition to Merge is instantaneous, not gradual. This is because discrete infinity - ‘the infinite use of finite means’ - either is or is not. What sense is there in trying to envisage ‘nearly discrete’ objects being combined in ‘nearly infinite’ ways? A moment’s thought should remind us that when objects are subject to even limited blending, the range of combinatorial possibilities crashes to a limited set. In short, for Merge to work, the elements combined must be abstract digits, not concrete sounds or gestures. Combining a sob with a cry would not be an example of Merge. Neither would we call it Merge if a chimpanzee happened to combine, say, a bark with a scream (Crockford and Boesch 2005).
181
1.2. Analog minds in a digital world

One way to escape the conundrums inseparable from this position - conundrums foundational to all our debates and very well documented by Botha (2003) - might be to keep the essential idea but reverse the underlying philosophy. Humans have analog minds in a digital world. More accurately, just a certain part of our world is digital. We are at one with our primate cousins in being immersed in ordinary material and biological reality - Pinker’s ‘analog world’. But unlike them, we have woven for ourselves an additional environment that is digital through and through. This second environment that we all inhabit is sometimes referred to as the ‘cognitive niche in nature’, but the evolutionary psychologists who invented this expression (Tooby and DeVore 1987) did so for their own special reasons. Adherents of the ‘cognitive revolution’ but attempting to weld Chomsky with their own mentalist version of Darwin, they were committed to minimizing the intrinsically social, cultural and institutional nature of the digital representations made available to our brains. The expression ‘cognitive niche’ may have explanatory value, but not if the purpose is to deny the existence of what social anthropologists and archaeologists term ‘symbolic culture.’ Contrary to those who coined the expression, the ‘cognitive niche’ actually doesn’t exist ‘in nature.’ No one has ever found such a niche in nature. As Tomasello (1999) points out, the niche in question exists only as an internal feature of human symbolic culture. So what exactly is this thing called ‘symbolic culture’? Following the philosopher John Searle (1996), let’s begin by drawing a distinction between ‘brute facts’ and ‘institutional facts’. Birth, sex and death are facts anyway, irrespective of what people think or believe. These, then, are brute facts. Legitimacy, marriage and property are facts only if people believe in them. Suspend the belief and the facts correspondingly dissolve.
But although institutional facts rest on human belief, that doesn’t make them mere distortions or hallucinations. Take two five-pound banknotes and place them on the table. Now exchange them for a single ten-pound note. The identity of the two amounts is not merely a subjective belief; it’s an objective, indisputable fact. But now imagine a collapse of confidence in the currency. Suddenly, the facts dissolve. It is crucial to Searle’s philosophy that institutional facts are not necessarily dependent on verbal language: one can play chess, use an abacus or change money without using language. The relevant digits are then the chess pieces, beads or coins that function as markers in place of any linguistic markers. Digital facts of this kind - the intricacies of the global currency system, for
example - are patently non-physical and non-biological. They are best conceptualized as internal features of an all-encompassing game of ‘let’s pretend’. Needless to say, the existence of such facts presupposes a brain with certain innate capacities, syntactical language being one possible manifestation of these capacities. But explaining distinctively human cognition by invoking ‘language’ is circular and unhelpful: it is precisely language that we need to explain. Institutional facts develop ontogenetically out of the distinctively human capacity for mindreading, joint attention and pretend-play (Leslie 1987; Tomasello 2006). Extended across society as a whole, ‘let’s pretend’ may generate a whole system of ritual and religion (Durkheim 1947 [1915]; Huizinga 1970 [1949]; Knight 1999; Power 2000). The morally authoritative intangibles internal to a symbolic community - that is, to a domain of ‘institutional facts’ - are always on some level digital. This has nothing to do with the supposedly digital genetic architecture of the human brain. The explanation is less mystical. It is simply that institutional facts depend entirely on social agreement - and you cannot reach agreement on a slippery slope. By definition, anything perceptible can be evaluated and identified through direct sensory input. But institutional intangibles are by definition inaccessible to the senses. They can be narrowed down and agreed upon only through a process in which abstract possibilities are successively eliminated. ‘Discrete infinity’ captures the recursive principle involved. The sound system of a language - its phonology - is prototypically digital. It is no more possible to compromise between the t and the d of tin versus din than to compromise between 11.59 and 12.00 on the face of a digital clock. Of course, categorical perception is common enough in nature (Harnad 1987).
But the meaningless contrastive phonemes of human language comprise only one digital level out of the two that are essential if meanings are to be conveyed at all. Combining and recombining phonemes - ‘phonological syntax’, as it is called by ornithologists who study the digital phenomenon in songbirds - would be informationally irrelevant if it did not interface with a second digital level, which is the one necessary if semantic meanings are to be specified. No animal species has access to this second level of digital structure. It would therefore be inconceivable and in principle useless anyway for an animal to make use of syntactical operations - whether Merge or anything else - in order to interface between the two digital levels. The explanation is that animals inhabit just their own biological world and therefore don’t have access to the extra digital level. It is the nature and evolution of the entire second level- the level of symbolic culture- that has proved so difficult to explain. Explaining ‘the Great Leap
Forward’ as an outcome of ‘Merge’ is a parsimonious solution (Chomsky 2005), but only in the sense that explaining it as an outcome of divine intervention might seem persuasive in terms of parsimony although less so in terms of testability.

1.3. A Darwinian solution

The alternative (Knight 2000) is to conceptualize the language capacity as one special manifestation of a ‘play capacity’ continuous with its primate counterparts but let loose among humans in a manner not open to other primates. The development of ‘let’s pretend’ and the development of language in children are widely recognized as isomorphic. They have the same critical period, the same features of intersubjectivity and joint attention, the same triadic (‘Do you see what I see?’) referential structure and the same cognitive expressivity and independence of external stimuli. It is unlikely that these parallels are a pure coincidence (Bruner et al. 1976; Leslie 1987; McCune-Nicolich and Bruskin 1982). ‘Digital infinity’ corresponds to what developmental psychologists might recognize as a children’s game - in this case, ‘let’s play infinite trust’. Take any patent fiction and let’s run with it and see where it leads. Metaphorical usage is an example of this. A metaphor ‘is, literally, a false statement’ (Davidson 1979). By accepting and sharing it, we construct it as truth on a higher level - truth for ‘our own’ joint purposes of conceptualization and communication. As fictional public representations become conventionalized and reduced to shorthands, one possible trajectory is that they crystallize out as linguistic signs. Grammatical markers and associated constructions are historical outcomes of processes of grammaticalization that are now well understood - processes that are essentially metaphorical (Meillet 1903; Heine et al. 1991; Gentner et al. 2001). To evolve a grammar, in other words, humans must be trusting enough to accept falsehoods from one another. Animals cannot afford to do this.
Their hard-to-fake signs - reliable signals on the model of human laughs, sobs, cries and so forth - are deception-resistant and evaluated for quality on an analog scale. Regardless of details of cognitive architecture, ‘honest fakes’ are in principle impossible to interpret in that way. Meaningless and valueless in themselves, they would read ‘zero’ on any costly signaling scale. Linguistic signs are ‘honest fakes’ - literal irrelevancies and falsehoods, significant only as cues to the intentions underlying them. Since communicative intentions are intangibles, processing them has to be digital by reason of conceptual necessity, not because the brain or any part of it is innately digital.
‘Animals,’ Durkheim (1947 [1915]: 421) long ago observed, ‘know only one world, the one which they perceive by experience, internal as well as external. Men alone have the faculty of conceiving the ideal, of adding something to the real. Now where does this singular privilege come from?’ Maynard Smith and Szathmary (1995) offered a bold answer to Durkheim’s question, citing Rousseau and viewing the puzzle of language origins as inseparable from the wider problem of explaining the emergence of community life based on social contracts. Their ‘major transitions’ paradigm is ambitious and conceptually unifying, assuming no unbridgeable chasm between natural and social science. The same applies to the paradigm being developed by Steels and his colleagues (Steels 2006; Steels et al. 2002), who use robots to show how lexicons and grammars - patterns far too complex to be installed in advance in each brain - spontaneously self-organize through processes of learning, recruitment, social co-ordination and cumulative grammaticalization. By maintaining continuity with primate cognitive evolution while introducing novel social factors, we can continue to apply basic principles of Darwinian behavioural ecology to account for the emergence of distinctively human cognition and communication. Pinker (1999: 287) concludes his book on ‘the ingredients of language’: ‘It is surely no coincidence that the species that invented numbers, ranks, kinship terms, life stages, legal and illegal acts, and scientific theories also invented grammatical sentences and regular past tense forms’. Confusing correlation with causation, Pinker here treats the supposedly digital concepts intrinsic to human nature as responsible for the legalistic distinctions of language and culture. Note, however, that the digital concepts he actually mentions here - whether linguistic or non-linguistic - belong without exception to the realm of agreements and institutions.
Is there any evidence that a language faculty could operate at all outside such institutional settings? Reversing Chomsky - and correspondingly reversing the whole idea of ‘digital minds in an analog world’ - we may conclude that ‘doing things with words’ (cf. Austin 1978 [1955]) is invariably more than just activating a biological organ. To produce speech acts is to make moves in a non-biological realm - a realm of facts whose existence depends entirely on collective belief. ‘Analog minds in a digital world’ is fully compatible with Darwinian evolutionary theory. ‘Digital minds in an analog world’ is not compatible at all. Installation of an innate digital mind - whether instantaneous or gradual - is a deus ex machina with nothing Darwinian about it. A model of language evolution, to qualify as scientific, cannot invent fundamental axioms as it goes along. It cannot invoke currently unknown physical or other natural laws. It
should be framed within a coherent, well-tried body of theory; it should generate predictions that are testable in the light of appropriate empirical data; and it should enable us to relate hitherto unrelated disciplinary fields. Whereas the deus ex machina approach rigidly rejects reference to any part of social science, the play/mindreading/joint attention paradigm (Tomasello 1996, 1999, 2003, 2006) has the potential to link the natural and social sciences in a theory of everything.
References

Arnauld and Lancelot (1972 [1660]). Grammaire générale et raisonnée, ou la grammaire de Port-Royal. Réimpression des éditions de Paris, 1660 et 1662. Genève: Slatkine. Austin, J. L. (1978 [1955]). How to Do Things with Words. Oxford: Oxford University Press. Botha, R. (2003). Unravelling the Evolution of Language. Oxford: Elsevier. Bruner, J. S., A. Jolly and K. Sylva (eds) (1976). Play: Its role in development and evolution. New York: Basic Books. Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry 36(1): 1-22. Christiansen, M. H. and S. Kirby (2003). Language evolution: the hardest problem in science? In M. H. Christiansen and S. Kirby (eds), Language Evolution. Oxford: Oxford University Press, pp. 1-15. Crockford, C. and Boesch, C. (2005). Call combinations in wild chimpanzees. Behaviour 142(4): 397-421. Davidson, D. (1979). What metaphors mean. In S. Sacks (ed.), On Metaphor. Chicago: University of Chicago Press, pp. 29-45. Durkheim, É. (1947 [1915]). The Elementary Forms of the Religious Life. A study in religious sociology. Trans. J. W. Swain. Glencoe, Illinois: The Free Press. Harnad, S. (1987). Categorical Perception: The groundwork of cognition. Cambridge: Cambridge University Press. Gentner, D., Holyoak, K. J., and Kokinov, B. N. (eds) (2001). The Analogical Mind: Perspectives from cognitive science. Cambridge, MA: MIT Press. Heine, B., U. Claudi and F. Hünnemeyer (1991). Grammaticalization: A conceptual framework. Chicago and London: University of Chicago Press. Huizinga, J. (1970 [1949]). Homo Ludens. A study of the play element in culture. London: Granada. Knight, C. (2000). Play as precursor of phonology and syntax. In Knight, C., M. Studdert-Kennedy and J. R. Hurford (eds), The Evolutionary Emergence of Language. Social function and the origins of linguistic form. Cambridge: Cambridge University Press, pp. 99-119.
Leslie, A. (1987). Pretence and representation: The origins of 'theory of mind'. Psychological Review 94: 412-426.
Maynard Smith, J. and E. Szathmáry (1995). The Major Transitions in Evolution. Oxford: W. H. Freeman.
Meillet, A. (1903). Introduction à l'étude comparative des langues indo-européennes. Paris: Hachette.
McCune-Nicolich, L. and C. Bruskin (1982). Combinatorial competency in play and language. In K. Rubin and D. Pepler (eds), The Play of Children: Current Theory and Research. New York: Karger, pp. 30-40.
Pinker, S. (1999). Words and Rules: The ingredients of language. London: Weidenfeld and Nicolson.
Power, C. (2000). Secret language use at female initiation: Bounding gossiping communities. In C. Knight, M. Studdert-Kennedy and J. R. Hurford (eds), The Evolutionary Emergence of Language: Social function and the origins of linguistic form. Cambridge: Cambridge University Press, pp. 81-98.
Rousseau, J.-J. (1973 [1762]). The social contract. In Jean-Jacques Rousseau, The Social Contract and Discourses. Trans. G. D. H. Cole. New edition. London & Melbourne: Dent, pp. 179-309.
Searle, J. R. (1996). The Construction of Social Reality. London: Penguin.
Steels, L. (2006). Experiments on the emergence of human communication. Trends in Cognitive Sciences 10(8): 347-349.
Steels, L., F. Kaplan, A. McIntyre and J. van Looveren (2002). Crucial factors in the origins of word meaning. In A. Wray (ed.), The Transition to Language. Oxford: Oxford University Press, pp. 252-271.
Tomasello, M. (1996). The cultural roots of language. In B. J. Velichkovsky and D. M. Rumbaugh (eds), Communicating Meaning: The evolution and development of language. Mahwah, NJ: Erlbaum, pp. 275-307.
Tomasello, M. (1999). The Cultural Origins of Human Cognition. Cambridge, MA: Harvard University Press.
Tomasello, M. (2003). Constructing a Language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Tomasello, M. (2006). Why don't apes point? In N. J.
Enfield and S. C. Levinson (eds), Roots of Human Sociality: Culture, cognition and interaction. Oxford & New York: Berg, pp. 506-524.
Tooby, J. and I. DeVore (1987). The reconstruction of hominid behavioral evolution through strategic modeling. In W. G. Kinzey (ed.), The Evolution of Human Behavior: Primate models. Albany: State University of New York Press, pp. 183-237.
LANGUAGE SCAFFOLDING AS A CONDITION FOR GROWTH IN LINGUISTIC COMPLEXITY
KIRAN LAKKARAJU, LES GASSER AND SAMARTH SWARUP

Computer Science Department and Graduate School of Library and Information Science
University of Illinois at Urbana-Champaign
{klakkara | gasser | swarup}@uiuc.edu

Over their evolutionary history, languages most likely increased in complexity from simple signals to protolanguages to complex syntactic structures. This paper investigates processes for increasing linguistic complexity while maintaining communicability across a population. We assume that higher linguistic communicability (more accurate information exchange) increases participants' effectiveness in coordination-based tasks. Interaction, needed for learning others' languages and for converging to communicability, bears a cost. There is a threshold of interaction (learning) effort beyond which (the coordination payoff of) linguistic convergence either doesn't pay or is pragmatically impossible. Our central findings, established mainly through simulation, are: 1) There is an effort-dependent "frontier of tractability" for agreement on a language that balances linguistic complexity against linguistic diversity in a population. To remain below some specific bound on collective convergence effort, either a) languages must be simpler, or b) their initial average communicability must be higher. To stay below such a pragmatic effort limit, even agents who have the ultimate capability for complex languages must not invent them from the start or they won't be able to communicate; they must start simple and grow complexity in a staged process. 2) Such a staged approach to increasing complexity, in which agents initially converge on simple languages and then use these to "scaffold" greater complexity, can outperform initially-complex languages in terms of overall effort to convergence. This performance gain improves with more complex final languages.
1. Introduction
Language evolution studies generally assume that the developmental trajectory for human languages followed stages from simple signaling systems to holistic protolanguages to simple compositional languages, and finally to the lexically and syntactically complex languages known today. If languages indeed grew from the simple to the complex, several questions need answering; two of these are:

• Could complex languages ever emerge early? Why or why not?
• Local, individual innovations that increase linguistic complexity also create linguistic diversity and, at least temporarily, reduce communicability. How can a population maintain the communicability of its language while accommodating the diversity of innovation?
While inspired by the enduring issues of human language evolution, we are primarily interested in a design stance: evolving artificial languages for artificial agents. We need to discover general principles of language emergence that also cover automated agents with different sensorimotor, cognitive, and/or interactional possibilities from humans, their evolutionary predecessors, or animals. We believe, in fact, that language evolution is a model problem for issues that arise in many kinds of distributed semantic systems, including Web semantics, resource description-discovery (metadata) systems, cartographic systems, and biological systems. One case in point is the intentional creation and ongoing revision of XML-based semantic web languages. These can vary in complexity (number of terms, syntactic categories, etc.), and they exhibit frequency-dependent "network effects": any single language in the space has little value until a large population of agents can apply and interpret it. In this situation also, the two questions above are important: communities must converge on shared languages quickly, and ongoing linguistic innovations should only minimally disrupt the use of the language.
1.1. Assumptions

We are interested in artificial agents that operate continuously over long periods of time in complex worlds, performing tasks that require coordination. The value of (reward from) successful coordination drives information exchange, which in turn drives agents to create and share languages. While rewards actually come from doing things with shared information, we can usefully attribute at least part of the reward to the language itself. Thus a language that allows agents to exchange more critical information or to coordinate better has a higher value. We assume that agents need to talk to each other about conditions and events in their worlds, and this talk is valuable in the sense above. The abilities to describe and distinguish objects and actions are the fundamental kinds of information needed for coordination and increased fitness.

We consider task complexity to be information-theoretic. That is, tasks differ in complexity on the basis of how many different objects, situations, and actions they involve, and how much information is needed to reliably distinguish these objects, situations, and actions. This becomes important later when we discuss how to measure the complexity of language. The ability to handle greater task diversity and task complexity increases agents' fitness; greater linguistic complexity helps enable this (as greater cognitive and motor complexity, etc. also would). Since tasks of interest here require successful communication, and since what needs to be communicated for unsophisticated tasks is different ("simpler") than what needs to be communicated for complex tasks, agent communication languages have to vary with task complexity. For agents to become competent at more complex tasks, they need more complex languages. This means that languages have to change in complexity over time.
2. The complexity-diversity-effort frontier
Since collective activity is ongoing and must remain so while complexity grows, we have a difficult problem: how do agents change their languages from simple to complex while maintaining communicability? Language variation must originate at the individual level (Croft, 2001). If this is so, then as an agent originates a change from a fully communicative language, the agent will become less communicative with others, and thus less effective in coordinated tasks. For language to grow in complexity, this means there is a trajectory through which agents must somehow innovate (increasing complexity and decreasing communicability), then build up communicability again by learning and propagating the innovations. This disruptive shift characterizes each increase in complexity.

Computational tractability is an issue for this complexity growth. We hypothesize that given any set of agents with a fixed cognitive structure and a set of tasks (need for language), there exists a frontier of tractability for convergence to a common language. Informally, for a set of languages L of a given complexity C, greater initial diversity in the subset l of L spoken in the population will imply greater learning effort (e.g. time) to converge the population to full communicability. Similarly, for a given degree of initial linguistic diversity D, higher linguistic complexity implies greater effort to converge the population to full communicability. Let us limit the available convergence time (i.e., effort to converge) to some amount E and plot c = f(d, E), where f means "given a set of agents whose set of languages exhibits diversity d, let f(d, E) equal the maximum linguistic complexity for which the population will converge within E units".

Then we will see a curve with the following property: any complexity-diversity point "under" the curve limited by E (i.e. where for any point (c, d), c < f(d, E)) will converge in time bounded by E, while any point "above" the curve (i.e. c > f(d, E)) will take longer than time E. (See Figure 1.)

Figure 1. Conjectured tractability frontiers.

E establishes a tractability frontier of complexity and diversity. Higher linguistic complexity lowers the degree of diversity the population can sustain and still converge within E. As a result, for languages that are higher in complexity, agents must make fewer, smaller innovations (introduce less diversity) if they are to converge within E.
Similarly, for a population to exhibit greater linguistic diversity and still have the possibility of converging tractably, its linguistic complexity must be lower. If a population is going to be highly innovative linguistically, introducing great diversity, then its language must be simple enough that the effort to converge from more widely varying linguistic “starting points” remains below E. Throughout this discussion we focus on languages as lexical matrices. A study of convergence frontiers for structured, compositional languages (languages with a grammar) is left for future work.
3. Implementation and experiments

We demonstrate the existence of tractability frontiers through an experiment. Each agent represents its language as a Form-Meaning Association Matrix, a likelihood matrix that explicitly stores the joint likelihood of the forms and meanings. Forms are symbols in the language and meanings are concepts that can be talked about. For the present, we assume the simplest possible setup: the number of forms and meanings is equal, and the set of forms and meanings is shared among all the agents, so they are only tasked with achieving consensus on the associations between forms and meanings.

The language game proceeds through random interactions between agents. We assume a "full information" scenario, where speakers provide form-meaning pairs to hearers. A speaker generates a form for a given meaning, j, by finding the element in column j of its form-meaning matrix that has maximum value. This is a maximum likelihood rule for language production. If a_ij is the current value of the hearer's form-meaning matrix for the given form-meaning pair, it gets updated as a_ij = η · a_ij + (1 − η). Additionally, all the other values in row i are updated as a_ic = η · a_ic, ∀c ≠ j, and all the other values in column j are updated in the same way, a_rj = η · a_rj, ∀r ≠ i. This "lateral inhibition" is meant to discourage synonymy and polysemy (Vogt & Coumans, 2003).
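The production rule and the lateral-inhibition update can be sketched as follows (an illustrative reimplementation, not the authors' code; function names are ours):

```python
import numpy as np

def produce(matrix, meaning):
    """Maximum-likelihood production: the form (row) with the highest
    association for the given meaning (column)."""
    return int(np.argmax(matrix[:, meaning]))

def hearer_update(matrix, form, meaning, eta=0.9):
    """Reinforce the observed form-meaning pair, a_ij = eta*a_ij + (1-eta),
    and laterally inhibit the rest of its row and column (a_ic = eta*a_ic,
    a_rj = eta*a_rj), discouraging synonymy and polysemy."""
    reinforced = eta * matrix[form, meaning] + (1.0 - eta)
    matrix[form, :] *= eta        # inhibit competing meanings for this form
    matrix[:, meaning] *= eta     # inhibit competing forms for this meaning
    matrix[form, meaning] = reinforced
    return matrix
```

Repeated over many random speaker-hearer pairings, this update drives each column toward a single dominant form.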
3.1. Measuring linguistic diversity

In order to understand the limits of this process, it is necessary to understand how much diversity can be introduced in a population such that the population can still return to (or maintain an adequate degree of) communicability to be successful in the ongoing tasks they face. There are several principled ways to measure linguistic diversity. Greenberg's index (Greenberg, 1956) measures diversity as the probability that a pair of randomly selected individuals from the population do not speak the same language:

A = 1 − Σ_i p_i²,  (1)

where p_i is the probability of encountering a speaker of language i. Greenberg also suggests modifying this formula to take into account the similarity between languages:

B = 1 − Σ_{ij} r_ij p_i p_j,  (2)

where r_ij is a measure of the overlap between languages i and j. A and B are both measuring communicability (or rather, the lack of it) in the population. We say a population is converged if the communicability is 1, i.e. diversity is 0.

Another measure, more popular in genetics, is known as the Jensen-Shannon diversity (see, e.g., Grosse et al., 2002), given by

J = H(λ_1 P_1 + λ_2 P_2 + ⋯ + λ_n P_n) − Σ_i λ_i H(P_i),  (3)

where Σ_i λ_i = 1, and the P_i are the probability distributions describing the languages (form-meaning associations). H is the Shannon entropy function. Since languages for our agents are defined as the joint likelihood matrices for forms and meanings, J measures the diversity in the corresponding probability distributions, which are obtained by normalizing the form-meaning matrix. When all distributions are identical, J = 0.

The difference between Greenberg's index and Jensen-Shannon diversity is analogous to the difference between phenotype and genotype in biology. J is a measure based on the underlying probability distribution, while A and B are more "behavioral" measures, as they directly evaluate communicability. When J = 0, A and B are also 0, and when J attains its maximal value, A and B equal 1. However, it is possible to have perfect communicability even if the underlying distributions are not identical, since communicability depends on the maximum likelihood interpretation.
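Equations (1)-(3) are straightforward to compute; the following sketch (ours, not the authors') uses base-2 entropy, so J is measured in bits:

```python
import numpy as np

def greenberg_a(p):
    """Eq. (1): chance that two random speakers do not share a language."""
    p = np.asarray(p, dtype=float)
    return 1.0 - float(np.sum(p ** 2))

def greenberg_b(p, r):
    """Eq. (2): as A, but weighted by the overlap r_ij between languages."""
    p = np.asarray(p, dtype=float)
    return 1.0 - float(p @ np.asarray(r, dtype=float) @ p)

def entropy(dist):
    """Shannon entropy H in bits, ignoring zero-probability entries."""
    dist = np.asarray(dist, dtype=float)
    nz = dist[dist > 0]
    return float(-np.sum(nz * np.log2(nz)))

def jensen_shannon(dists, weights=None):
    """Eq. (3): J = H(sum_i w_i P_i) - sum_i w_i H(P_i)."""
    P = np.asarray(dists, dtype=float)
    w = np.full(len(P), 1.0 / len(P)) if weights is None else np.asarray(weights, dtype=float)
    mixture = w @ P
    return entropy(mixture) - float(sum(wi * entropy(Pi) for wi, Pi in zip(w, P)))
```

For two completely disjoint distributions with equal weights, J reaches its maximum of 1 bit; for identical distributions it is 0.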
3.2. Generating diversity

To evaluate the tractability frontier we need to create a population with a specified diversity, not measure the diversity of a given linguistic population. To do this we initialize the agents with identity matrices for their form-meaning mappings. Then we devolve this perfectly converged state by adding a uniform random variable, drawn from a range [0, ε], to each value in the matrix. It turns out that the noise level ε is very strongly correlated with Greenberg's index and the Jensen-Shannon diversity. In other words, by increasing ε, we can smoothly and (nearly) linearly increase the diversity of the population according to these two measures. We have confirmed this fact through careful simulation (not presented here for lack of space).
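The devolution step can be sketched directly from this description (an illustrative reimplementation):

```python
import numpy as np

def devolve(n_agents, n_meanings, eps, rng=None):
    """Initialise every agent at the same identity lexicon (a perfectly
    converged population), then add uniform noise from [0, eps] to each
    entry; eps controls how much diversity is injected."""
    rng = np.random.default_rng() if rng is None else rng
    return [np.eye(n_meanings) + rng.uniform(0.0, eps, (n_meanings, n_meanings))
            for _ in range(n_agents)]
```

With eps = 0 the population is fully converged; raising eps (nearly) linearly raises the measured diversity, per the correlation reported above.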
3.3. Linguistic Complexity
Complexity is determined by both form and meaning complexity. McWhorter has defined four criteria for the evaluation of the complexity of a language (McWhorter, 2001), based on phonology, syntax, grammaticalization, and morphology. However, only his grammaticalization criterion makes reference to meanings. It says that a language is more complex if it makes finer semantic and pragmatic distinctions. The language of an agent also reflects its cognitive capabilities, and an agent capable of making greater cognitive distinctions will have a more complex language simply by virtue of being able to express more meanings. This is an information-theoretic notion of complexity, as discussed earlier, and should be included in a measure of linguistic complexity. This is understandably hard to do for natural languages, but is the criterion we use in our simulations because artificial agents, in particular, can differ widely in their cognitive capabilities and characterizing this distinction is essential in a discussion of language evolution.
3.4. Experimental results

We measure effort as the number of iterations required to converge. We initialize a population of ten agents with varying levels of diversity as described above. We also vary the complexity of the language by varying the number of meanings. Then we run the language game for each initial condition and evaluate the number of iterations necessary to converge to a communicability level greater than 0.9. This gives us a three-dimensional graph, shown in two dimensions in Figure 2, with time to convergence color-coded. We see a clear emergence of frontiers, demarcated by regions of different colors, confirming our hypothesis from Fig. 1.

Figure 2. Time to convergence vs. complexity and diversity.
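The convergence criterion (communicability above 0.9) can be operationalised as follows. The paper does not print its exact communicability formula, so this is one natural reading consistent with the maximum-likelihood production rule:

```python
import numpy as np

def communicability(population):
    """Average communicative success under maximum-likelihood use: for
    every ordered speaker/hearer pair and every meaning, the speaker
    emits its best form and the hearer decodes that form's best meaning.
    (An assumed operationalisation, not the authors' exact measure.)"""
    n = population[0].shape[1]
    successes, trials = 0, 0
    for s in population:
        for h in population:
            if h is s:
                continue
            for j in range(n):
                form = int(np.argmax(s[:, j]))                # speaker's ML form
                successes += int(np.argmax(h[form, :]) == j)  # hearer's ML reading
                trials += 1
    return successes / trials
```

An experiment then simply counts language-game iterations until this value exceeds 0.9, for each (diversity, complexity) initial condition.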
4. Scaffolding and staged learning

"Scaffolding" is one means of overcoming the diversity/complexity frontier established by E. Scaffolding is a general human learning strategy, and its existence and efficacy have been reported for language learning both in the psychological literature (Iverson & Goldin-Meadow, 2005) and in simulation work (Elman, 1993). Lee, Meng, and Chao (2007) provide a model of "staged learning" that captures the idea of scaffolding. Agents a) constrain choices, b) act within those constraints until c) no novelty appears, then d) lift some constraints, and repeat. Constraints temporarily reduce the agents' decision space. When quiescence occurs at one stage, strategically-chosen constraints are lifted. (Thus staged learning is order-dependent and there are likely more and less effective developmental trajectories.) Learning commences again in an extended decision space, now biased by the structures and generalities learned in prior stages.

We created such a staged version of our experiments as follows. We choose a maximum number of meanings, n, that the population has to converge upon. However, the agents do not consider all of these meanings initially. They start at Stage 1. The number of active meanings (= "used in language games") is a function of the stage number. The complexity step size δ represents how many new meanings to make active per stage. Thus the number of meanings active at Stage i is δi. If the system is in Stage 4 and δ = 4, there are 16 active meanings. Each agent is initialized with an m × n lexical matrix. However, at each stage i, an agent only sees part of its full lexical matrix, of size iδ × iδ. As the stages progress, more of the agents' lexical matrix is revealed, as illustrated in Figure 3.

Figure 3. Moving from Stage 3 to Stage 4 uncovers a row and a column of the matrix. The grey areas are hidden to the agent until it reaches that stage.

The system changes stages based on the communicability of the population. Let θ be the stage transition communicability threshold. When the population has communicability ≥ θ in stage i, it has converged to within θ on iδ × iδ forms and meanings. It then moves to the next stage and uncovers new meanings for each agent.^a
At this transition point, (i − 1)δ meanings have already been converged upon (to within θ), and δ meanings are new. These earlier convergence decisions bias agents' learning choices for the new, larger matrix. This is scaffolding. To confirm the value of staging, we repeated the earlier experiment with staging added, evaluating the new tractability frontier for varying complexity and diversity levels. Note that the axes of this plot (Figure 4) go much farther than the axes in Figure 2. In fact, we began each simulation with 10 meanings and 10 forms because smaller matrices converge very quickly. Even with higher initial noise levels and the number of meanings going up to 30, we see that the population converges in a fairly short amount of time. Staging has pushed out the tractability frontier greatly.

^a Collective ordering of meanings is an issue, with several possible efficient approaches, e.g. common environment structure. We leave to future work a more detailed model exploring this topic.
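The staging mechanism itself reduces to masking the lexical matrix and a threshold test; a minimal sketch (ours, assuming the active block is the top-left corner of the matrix):

```python
import numpy as np

def active_block(lexicon, stage, delta):
    """At Stage i only an (i*delta) x (i*delta) block of the full lexical
    matrix takes part in language games; each stage transition uncovers
    another band of forms and meanings."""
    k = min(stage * delta, min(lexicon.shape))
    return lexicon[:k, :k]

def next_stage(stage, comm, theta=0.9):
    """Advance once communicability on the active block reaches theta."""
    return stage + 1 if comm >= theta else stage
```

Because agreement reached on the smaller block carries over unchanged into the larger one, earlier stages bias and constrain learning in later stages.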
5. Conclusions

We have shown the need for scaffolding in language learning to be a fundamental requirement arising from the tradeoff between complexity and diversity. The interaction between complexity and diversity leads to the existence of a tractability frontier that prevents convergence in reasonable time if the initial diversity is too high for a given complexity of language (or vice versa). However, by learning in stages, it is possible to attain convergence even on complex languages that would otherwise be beyond the tractability frontier.

Figure 4. Tractability for staged learning.
References

Croft, W. (2001). Explaining language change. Longman Group United Kingdom.
Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71-99.
Greenberg, J. H. (1956). The measurement of linguistic diversity. Language, 32(1), 109-115.
Grosse, I., Bernaola-Galván, P., Carpena, P., Román-Roldán, R., Oliver, J., & Stanley, H. E. (2002). Analysis of symbolic sequences using the Jensen-Shannon divergence. Phys. Rev. E, 65, 041905.
Iverson, J. M., & Goldin-Meadow, S. (2005). Gesture paves the way for language development. Psychological Science, 16(5), 367-371.
Lee, M. H., Meng, Q., & Chao, F. (2007). Staged competence learning in developmental robotics. Adaptive Behavior, 15, 241-255.
McWhorter, J. H. (2001). The world's simplest grammars are creole grammars. Linguistic Typology, 5, 125-166.
Vogt, P., & Coumans, H. (2003). Investigating social interaction strategies for bootstrapping lexicon development. Journal of Artificial Societies and Social Simulation, 6(1).
THE EMERGENCE OF A LEXICON BY PROTOTYPE-CATEGORISING AGENTS IN A STRUCTURED INFINITE WORLD
CYPRIAN LASKOWSKI

Language Evolution and Computation Research Unit
University of Edinburgh, Edinburgh EH8 9LL, UK
[email protected]

Over the last decade, computational models and simulations have been used to explore whether words could have emerged in the earliest stages of language evolution through a process of self-organisation in a population. In this paper, a new model of this family is presented, with two major differences from previous models. First, the world consists of an infinite number of objects, while remaining easily manipulable. Second, the agents' categories are based on prototypes, and their structure reflects the environments in which they are acquired and used. Simulation results reveal that, as in previous models, coherent lexicons still generally emerge, but they are sensitive to certain model conditions, including the world structure.
1. Introduction
Words pose an enigma for language evolution, for they are both fundamental and complex. On one hand, words constitute the basic building blocks of language, and on the surface they appear to be simply pairings of form and meaning. In fact, it makes little sense to speak of linguistic structure or its evolution without presupposing the existence of words, and their emergence is thus considered to constitute one of the earliest stages of language evolution (Jackendoff, 1999). On the other hand, words are distinguished by a set of properties that are not found together in any other animal species' signals: they are learned, arbitrary, referential and numerous. As such, words are unique to humans and cannot simply be taken as the very starting point of language evolution. Indeed, the evolutionary emergence of words is an unresolved puzzle. Moreover, as for other aspects of language, the origins of words must be explained on at least two levels, biological and cultural. The biological level concerns questions of individual cognitive potential and linguistic preadaptations, such as a conceptual capacity. This can be partially investigated by comparing human and animal cognition, and assessing the extent to which animals can learn human words (Deacon, 1997). However, even if we pin down the necessary prerequisites, it is far from clear how the first words actually came into existence within a population of such individuals. Thus, at the cultural level, we must explain: how did hominins first start using words and agree on their meanings?
Since animals do not spontaneously invent words, while humans already have them, it is difficult to address these questions with direct empirical methods. However, Steels (1997) designed a simple computational model and showed that a coherent lexicon could emerge through a process of self-organisation. In particular, a population of individuals equipped with certain biological preadaptations gradually converged on a coherent lexicon by engaging in local communicative interactions about objects in a shared environment. Other models have since been developed to explore these issues further, and have generally yielded similar results, despite sometimes significant modifications (e.g., using robotic agents; Vogt, 2000). Further work is required, however, to assess whether the simulation results are contingent on idealisations that are inevitably implicit in such models.

This paper uses a new computational model to explore two representational issues, relating to the agents' world and their categories, respectively. I will first motivate and describe the model. Then I will present some simulation results, and finally discuss the relevance of the findings and possibilities for future work. For a detailed description of the model and simulation results, see Laskowski (2006).

2. A new model
2.1. Changes relative to previous models

The current model differs in two important respects from previous work. First, previous models have tended to represent the agents' world with a finite number of predefined objects (e.g., Steels, 1997), so that agents encounter the same objects many times. Moreover, apart from the notable exception of robot-based models (e.g., Vogt, 2000), the agents always perceive a given object identically. However, in the real world, we never perceive exactly the same stimulus twice, since there is a virtually unlimited variety of objects, and the appearance of the same object varies across situations. In addition, the distribution of objects in the real world is not completely random. As a result, in the current model, the world consists of an infinite number of objects, and the world's structure is easily manipulable via parameters, making it possible to explore the effects of different kinds of structures on simulation results.

Second, previous models have not generally used psychologically plausible representations of agents' categories. Many models, for example, have used discrimination trees (Steels, 1997; Smith, 2003a). Such structures are efficient and simple, but are implementations of the classical theory of categorisation, which is now considered obsolete (Murphy, 2002). Some models (e.g., Belpaeme, 2002) have addressed this by basing agents' categories on prototype theory (Rosch, 1978), which is still recognised as one of the leading psychological theories of concepts (Murphy, 2002). The representation used in the current model is based on that of Belpaeme (2002), but aims to be more sensitive to the context of category acquisition and usage.
2.2. The world
As in previous models (Smith, 2003a), the agents' world is represented with an N-dimensional space, where each dimension can be thought to represent a perceptual feature (e.g., colour, shape, size). Objects are defined as points in this space, whose dimension values are real numbers between 0 and 1 that identify the extent to which the objects have the corresponding features. Every agent-world interaction occurs in a context, which is a random subset of objects taken from the world. However, in contrast to previous models, each time a context is needed it is generated from scratch, and thus consists of entirely new objects. Therefore, an agent never sees the same object twice.

At the same time, however, the real world is not (necessarily) completely random, but has structure. The world is "clumpy" (Smith, 2003b), in the sense that, within dimensions, some values are generally more likely than others (e.g., animals usually have an even number of legs). Also, the world is "correlated", so that values across dimensions tend to correlate to some extent (e.g., things that fly tend to have feathers, and vice versa). Consequently, rather than generating an entirely random vector each time an object is needed, the objects in this model are generated pseudo-randomly in accordance with probability distribution functions defined by the model's real-valued "clumpiness" and "correlation" parameters.
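The paper does not spell out the generating distributions, so the following is only one possible construction of a "clumpy", "correlated" object generator (the clump centres, spread formula, and blending scheme are all our assumptions):

```python
import numpy as np

def make_object(n_dims, clumpiness, correlation, centres, rng):
    """Generate one object in [0,1]^N pseudo-randomly: pick a clump centre
    (higher clumpiness narrows the spread around it), then blend in a
    latent value shared by all dimensions (higher correlation ties the
    dimensions together). Illustrative only, not the author's model."""
    centre = centres[rng.integers(len(centres))]
    shared = rng.random()                       # latent value common to all dimensions
    spread = max((1.0 - clumpiness) * 0.5, 1e-6)
    obj = np.array([correlation * shared +
                    (1.0 - correlation) * rng.normal(centre[i], spread)
                    for i in range(n_dims)])
    return np.clip(obj, 0.0, 1.0)
```

With clumpiness near 1 objects cluster tightly around the centres; with correlation near 1 all dimension values of an object move together.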
2.3. Categories

Following Steels (1997), agents are equipped with sensory channels which detect object dimension values directly and map them onto their perceptual space (which thus has the same general N-dimensional structure as the world). An agent's categories are superimposed onto the perceptual space, allowing for object categorisation. In the current model, category structure is based on prototype theory (Rosch, 1978), so that categories have central members and graded membership. Categories are defined as Gaussian functions over the conceptual space which assign a degree of membership (a real number between 0 and 1) to every possible object. The category's prototype is the point of maximum membership (1), and the rate at which membership decreases as one moves away from the prototype depends on the category's sensitivity to each dimension. Formally, the category membership of an object o in a category c is given by a Gaussian function,
where i identifies a dimension, with o_i being the object value, p_i the prototype value, and s_i the sensitivity. This representation is based on that of Belpaeme (2002), with one important difference. In his model, the category's dimension sensitivities were all rigidly set
to one default value, so that every dimension was equally important both across and within categories. However, this is not the case in the real world: for example, the shape of a screwdriver (but not a traffic light) is far more relevant than its colour. Consequently, in this model, the dimension sensitivities are not fixed, and depend on the contexts in which categories are acquired and used.

Although the category representation is relatively plausible psychologically, it makes categorisation of objects more complicated. Rather than identifying the category in whose space an object falls, it is necessary to find the category which best fits the object (i.e., the category for which the membership function yields the highest value). Moreover, a minimum threshold is defined (as a model parameter) so that an object can only be potentially considered as a member of a category if its degree of membership is above this threshold. Figure 1 shows an example of such a "candidate category" in two-dimensional space for a particular object.
Figure 1. Category membership in 2 dimensions: membership_c(o), the category membership function for an agent's category c in a conceptual space of two dimensions, with p_0 = 0.4, s_0 = 0.05, p_1 = 0.6, and s_1 = 0.1. The plane shows the value of the minimum membership threshold, and the dot indicates the object being categorised: since the dot is above the plane, this is a candidate category for the object.
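A graded membership function of this kind can be sketched as follows. The equation itself is illegible in the scan, so the exact normalisation here (a standard Gaussian with s_i as a per-dimension width) is an assumption:

```python
import math

def membership(obj, prototype, sensitivity):
    """Gaussian graded membership: 1 at the prototype, decaying with
    distance; s_i acts as a per-dimension width, so a smaller s_i makes
    the category more sensitive on dimension i. Assumed form."""
    d = sum((o - p) ** 2 / (2.0 * s ** 2)
            for o, p, s in zip(obj, prototype, sensitivity))
    return math.exp(-d)

def is_candidate(obj, prototype, sensitivity, threshold=0.5):
    """An object is only considered for a category whose membership
    value clears the minimum membership threshold (a model parameter)."""
    return membership(obj, prototype, sensitivity) >= threshold
```

Using the parameters from Figure 1 (prototype (0.4, 0.6), sensitivities (0.05, 0.1)), membership is 1 at the prototype and falls off faster along dimension 0 than along dimension 1.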
Each category is also associated with a list of words and association strengths. Words themselves are atomic tokens with no internal structure. The word with the highest association strength is the best or most “natural” word for that category, and is the word that the agent will typically use when communicating about the category. The list can also be empty, in which case the category has not been lexicalised. 2.4. Category development Agents develop and adapt their category systems through interactions with the world. Each interaction takes the form of a discrimination game (Steels, 1997), in which an agent is exposed to a context of objects, attempts to find a distinct
category for one of the objects (called the topic), and adapts its category system accordingly. Over many discrimination games in different environments, an agent's category system gradually grows and adapts to the structure of the world. Discrimination games have three basic possible outcomes: the creation of a new category, the splitting off of a subcategory, or the adjustment of an existing category. If the agent has no candidate categories for the topic object, then it creates a new category, whose prototype is set to the topic object, and whose initial dimension sensitivities are a function of how similar the other context objects were to the topic in the different dimensions. Otherwise, it checks whether any of its candidate categories is sufficiently discriminating as not to match any of the other context objects. If there are no such categories, then it takes the most refined candidate category (i.e., the one with the most sensitive dimensions), and creates a subcategory which is identical to it except for being more sensitive in the dimension in which the topic differs the most from the other context objects. If there are discriminating categories, then the topic is categorised with the one for which it has the highest membership, and this category's prototype and dimension sensitivities are adjusted slightly to fit the topic better.
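The three outcomes of a discrimination game can be sketched as follows. This is an illustrative sketch only: the initial sensitivity value, the refinement factor, the adjustment rate, and the threshold are placeholder choices, since the paper does not give its parameter values here.

```python
import math

def membership(obj, cat):
    # Gaussian membership as described in section 2.3 (form assumed).
    return math.exp(-sum((o - p) ** 2 / (2 * s ** 2)
                         for o, p, s in zip(obj, cat["prototype"],
                                            cat["sensitivities"])))

def discrimination_game(agent, context, topic, threshold=0.5):
    """One discrimination game with its three possible outcomes."""
    candidates = [c for c in agent["categories"]
                  if membership(topic, c) >= threshold]
    others = [o for o in context if o is not topic]
    if not candidates:
        # Outcome 1: create a new category centred on the topic
        # (initial sensitivity 0.1 is a placeholder value).
        agent["categories"].append({"prototype": tuple(topic),
                                    "sensitivities": tuple(0.1 for _ in topic)})
        return "created"
    discriminating = [c for c in candidates
                      if all(membership(o, c) < threshold for o in others)]
    if not discriminating:
        # Outcome 2: split a subcategory off the most refined candidate,
        # sharpened along the dimension where the topic stands out most.
        best = min(candidates, key=lambda c: sum(c["sensitivities"]))
        d = max(range(len(topic)),
                key=lambda i: min(abs(topic[i] - o[i]) for o in others))
        sens = list(best["sensitivities"])
        sens[d] /= 2  # more sensitive in the distinguishing dimension
        agent["categories"].append({"prototype": best["prototype"],
                                    "sensitivities": tuple(sens)})
        return "split"
    # Outcome 3: nudge the best discriminating category toward the topic.
    best = max(discriminating, key=lambda c: membership(topic, c))
    best["prototype"] = tuple(p + 0.1 * (t - p)
                              for p, t in zip(best["prototype"], topic))
    return "adjusted"
```

A first game on a novel topic creates a category; a repeat of the same game then finds that category discriminating and merely adjusts it.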
2.5. Lexical development The other kind of formal game that the agents engage in is the guessing game (Steels, 1997), which is actually built on top of the discrimination game. While discrimination games involve only one agent and do not involve any linguistic exchange, the guessing game is a communicative episode involving two agents and a shared environment. The “speaker” agent utters a word for one of the context objects (the topic), and the “hearer” agent guesses which object the speaker was referring to. The game is a success if and only if the hearer guesses correctly. In each guessing game, a speaker and a (different) hearer are chosen from the population at random, and a new shared context of objects is generated. The speaker chooses a topic object at random, categorises it (via a discrimination game), and utters the word in its lexicon with the highest association score for that category. If the speaker has no word for that category, it randomly invents a new word. The hearer must find the best match between the word heard, a context object, and a word-category pair from its own lexicon. It first identifies all of its categories which have an association for the word. If there are no such categories, the game fails. Otherwise, it considers each possible category-object pair from these categories and the context objects, and determines the combination for which category membership is highest. If the resulting membership is below the minimum membership threshold, then the game fails. Otherwise, the hearer guesses the object from that pair. If this object is the topic, the game succeeds. Otherwise, the speaker points out the topic to the hearer (non-linguistically), and the hearer performs a discrimination game on it. Upon completion of the game, both agents independently update their lexicons, adjusting specific word-category
association strengths in accordance with the results of the game.
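The hearer's interpretation step just described can be sketched in Python. The data layout, the threshold, and the Gaussian membership form are illustrative assumptions, not the paper's exact implementation.

```python
import math

def membership(obj, cat):
    # Gaussian membership as described in section 2.3 (form assumed).
    return math.exp(-sum((o - p) ** 2 / (2 * s ** 2)
                         for o, p, s in zip(obj, cat["prototype"],
                                            cat["sensitivities"])))

def hearer_guess(hearer, word, context, threshold=0.5):
    """The hearer's side of the guessing game: among its categories
    associated with the heard word, find the category-object pair with
    the highest membership; fail (return None) if the word is unknown
    or the best membership is below the minimum threshold."""
    cats = [c for c in hearer["categories"] if word in c["words"]]
    if not cats:
        return None  # word unknown: the game fails
    category, obj, score = max(((c, o, membership(o, c))
                                for c in cats for o in context),
                               key=lambda t: t[2])
    return obj if score >= threshold else None
```

For a hearer whose category for the word "wa" (a hypothetical word token) is centred on (0.4, 0.6), the nearby context object is guessed, while an unknown word fails.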
3. Simulations Simulations were run within this model with three questions in mind. First, would agents converge on a coherent lexicon, despite the more complex world and category representations used in this model? Second, do the simulation results depend substantially on the specific world structure used? Third, assuming that agents did converge, how stable would the results be if specific parameters, such as population and context size, were varied? Each simulation consisted of a large number of guessing games in a fixed population of agents, who all began with empty category systems and no lexicons. The guessing games were analysed in sets of 100 called epochs, and the average success rate (the ratio of successful games to the total number of guessing games) was tracked for each epoch. The first set of simulations explored whether this model would work at all in the simplest cases, with population and context sizes of 2 in a 1-dimensional world. After 200 epochs, regardless of how clumpy the world was (dimension correlation does not, of course, apply in a one-dimensional world), communicative success in the final epoch averaged around 99% over 100 simulations, despite the fact that the agents ended up with unexpectedly large category systems. In a 3-dimensional world, the final communicative success was still very high, despite extreme manipulations of world structure. Four kinds of world were tested: “random” (dimensions were completely uncorrelated and non-clumpy), “correlated” (highly correlated dimensions but completely non-clumpy), “clumpy” (completely uncorrelated but highly clumpy), and “structured” (highly clumpy and correlated). Final communicative success after 200 epochs was still very high for all four world structure types, ranging from 96% in the random world to 99% in the structured world. Agents ended up with around 300 categories, except in the totally random world, where they tended to have over 500 categories.
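The epoch bookkeeping described above amounts to the following (the epoch size of 100 is as stated; the 0/1 encoding of game outcomes is an assumption of this sketch):

```python
def epoch_success_rates(outcomes, epoch_size=100):
    """Average success rate per epoch, where outcomes is a sequence of
    guessing-game results encoded as 1 (success) or 0 (failure), analysed
    in consecutive sets of epoch_size games."""
    return [sum(outcomes[i:i + epoch_size]) / epoch_size
            for i in range(0, len(outcomes) - epoch_size + 1, epoch_size)]
```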
In order to explore the scalability of the results and their potential dependence on a particular world structure, further sets of simulations were conducted. In each set, one of the main model parameters was manipulated, starting with a base case of a 3-dimensional world, 2 context objects, and 2 agents. Results showed that communicative success was significantly affected by manipulations of these variables, but the extent of the impact depended on the world structure. For instance, world dimensionality had a very large impact, such that in an 8-dimensional world, final communicative success rates tended to stay below a dismal 25% in the random and clumpy worlds. However, they were still in the high 90s in the correlated and structured worlds. Manipulations of the context size had similar, though less drastic, impacts. Context sizes of 64 objects still yielded approximately an 80% final success rate in the correlated and structured worlds, but context sizes of
only 16 objects resulted in rates below 50% for both the random and clumpy worlds. The effects of population size changes did not follow the same pattern, however. Although higher population sizes corresponded with lower communicative success rates, the communicative success rate reached at least around 75% in all four world structures even with as many as 128 agents, and was best in the clumpy world (around 90%). Moreover, the communicative success rate curves also varied in clear ways between the four world types examined. In worlds with correlated dimensions (i.e., the correlated and structured worlds), communicative success rose very quickly (e.g., to about 75% with 128 agents), but then flattened out. In contrast, worlds with clumpy dimensions started off more slowly, but their communicative success rate curves did not flatten out as dramatically, so they eventually obtained higher success rates (at least in the clumpy world). 4. Discussion
Despite the use of a relatively complex model, in which agents never saw exactly the same object twice and their categories had a context-sensitive prototype structure, simulation results were generally in line with those of previous work. Under a variety of conditions, populations of agents converged onto coherent lexicons through repeated communicative episodes in shared environments. Although each agent had an independent category system and lexicon which started out empty, communicative success rates managed to reach high levels, often close to 100%. These results, then, add support to the idea that a population of hominins equipped with certain cognitive preadaptations could have grounded and developed a large system of learned, arbitrary, referential words through a series of local interactions (Steels, 1997). More specifically, they show that the general results of previous models cannot be easily dismissed on the grounds that they used idealised world and category representations. However, the simulation results were sensitive to more complex conditions, as manifested by manipulations of the model's parameters. As in previous models, communicative success dropped in simulations in which the context size, population size, or world dimensionality was increased. Although this is not surprising, in some cases the effects were very drastic, and highly dependent on the world's structure. Moreover, even in successful simulations, the world structure sometimes influenced the rate of convergence. These patterns show that the simulation results do not easily scale up to larger systems, and thus must be treated cautiously. In particular, the world structure can have large consequences for whether a coherent lexicon will emerge and how long it will take. This points to the need for future models to choose their world representations carefully and justify their choices.
Returning to the bigger picture, how exactly do these results relate to language evolution? To answer this, we need to revisit the hypothesis and clearly separate what exactly is being given a priori in this model, as opposed to what appears
to be emerging (Steels, 2006). We started by asking whether self-organisation was able to explain how a population of hominins could have “invented” a lexicon. However, it is important to keep in mind that this hypothesis is framed within an implicitly substantial environmental and cognitive infrastructure. We have already seen that the environment that the agents are exposed to can play a crucial role in determining the outcomes of the simulations. The extent of the cognitive prerequisites has not, however, been substantially manipulated here. Agents are instead consistently endowed with unrealistically powerful and facilitating faculties, including perfect word production and perception, powerful joint attention, limitless motivation for communication regardless of success, perfect and equal perception of objects, and a perfect ability to use and interpret non-linguistic referential methods. What the simulation results of this model have done is to verify the internal consistency of the argument that, given such abilities, and under simple conditions, a lexicon could have emerged through a process of self-organisation, even if the world and category representations are made more complex in the way described. However, this work cannot address the question of whether the differences between the current idealisations and the real phenomena are significant enough to give misleadingly optimistic results. In order to determine that, more work is needed, including integration with empirical experimental work with both humans and animals, as well as further modelling developments and explorations. References Belpaeme, T. (2002). Factors influencing the origins of colour categories. Unpublished doctoral dissertation, Vrije Universiteit Brussel. Deacon, T. (1997). The symbolic species: the coevolution of language and the brain. New York: Norton. Jackendoff, R. (1999). Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences, 3, 272-279.
Laskowski, C. (2006). Prototype categorisation and the emergence of a lexicon in an infinite world. Unpublished master’s thesis, University of Edinburgh. (http://www.lel.ed.ac.uk/homes/cyp/dissertation/dissertation.pdf) Murphy, G. L. (2002). The big book of concepts. Cambridge, MA: MIT Press. Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 27-48). Hillsdale, NJ: Lawrence Erlbaum. Smith, A. D. M. (2003a). Evolving communication through the inference of meaning. Unpublished doctoral dissertation, Theoretical and Applied Linguistics, School of Philosophy, Psychology and Language Sciences, University of Edinburgh. Smith, A. D. M. (2003b). Intelligent meaning creation in a clumpy world helps communication. Artificial Life, 9, 559-574. Steels, L. (1997). Constructing and sharing perceptual distinctions. In M. van Someren & G. Widmer (Eds.), Proceedings of the European conference on machine learning (pp. 4-13). Berlin: Springer-Verlag. Steels, L. (2006). How to do experiments in artificial language evolution and why. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: proceedings of the 6th international conference (pp. 323-332). London: World Scientific. Vogt, P. (2000). Lexicon grounding on mobile robots. Unpublished doctoral dissertation, Vrije Universiteit Brussel.
EVOLUTIONARY FRAMEWORK FOR THE LANGUAGE FACULTY ERKKI LUUK Institute of Estonian and General Linguistics, University of Tartu, Postimajapk 149, Tartu, 50002, Estonia [email protected] HENDRIK LUUK Institute of Physiology, University of Tartu, Estonia Due to the nature of the subject, the field of language evolution has to rely largely on theoretical considerations. A coherent fundamental framework for approaching language evolution has to relate principles of evolution of complex traits with those governing the organization of cognitive processes, communication and natural language architecture. We suggest that by treating the language faculty as a complex trait with predefined functional interfaces, it is possible to delineate the evolutionary forces that have led to the emergence of natural language. We analyze embedding and recursion in communication, and propose a conceptual prerequisite for natural language and fully symbolic reference: a hierarchical way of conceptualization termed 'conceptual embedding' (the ability to nest concepts within concepts). We go on to hypothesize that, initially, the selective force driving the development of the language faculty was towards enhanced conceptualization of reality. According to this scenario, the invention of linguistic communication was a secondary event, dependent on conceptual embedding which supports the sophisticated conceptual underpinnings of linguistic meaning.
1. Introduction
We will start by proposing a general evolutionary framework that establishes the language faculty's functional interfaces and asymmetric dependencies between them. We continue by considering the theoretical foundations of recursion, followed by a discussion of relevant empirical data. Next, we propose the notion of conceptual embedding as a prerequisite for the invention of linguistic communication. A complex communication system cannot evolve until there is motivation to convey complex information (Bickerton, 2003; Nowak & Komarova, 2001). We hypothesize that such motivation requires the perception of reality in terms of independent, combinable concepts that can be embedded to
form interdependent conceptual categories that provide the functional basis for the meaning of words. Accordingly, we suggest that the selective force that triggered the emergence of the language faculty's core components was towards an enhanced conceptualization of reality. We assume that formation and embedding of concepts belong to a hierarchical continuum of higher order associational processes performed by the nervous system.
2. Functional interfaces and functional dependencies We suggest that in order to create an evolutionary framework for a complex trait such as the faculty of language (FL), one has to start by defining its functional interfaces and their dependencies. Functional interfaces are defined as the functional outcomes of a complex trait that are most likely to contribute to the fitness of its bearer and thus motivate natural selection. Functional dependencies, on the other hand, are required to create a continuously evolvable hierarchical structure that can acquire new functions by building on and modifying the structures already present. In our opinion, functional interfaces of FL should include thought (complex conceptualization) and linguistic communication. Since highly differentiated conceptual structure is a prerequisite for the development of a complex communication system like natural language (NL), there is an asymmetric dependence between them. This is based on the observations that (1) NL is very much centered around human conceptual structure, (2) elaborate conceptual structure would increase fitness without NL by enabling conceptualization of principles of reality. 3. Recursion and embedding
M. D. Hauser et al.'s paper posited the FLN/FLB distinction and hypothesized that "FLN comprises only the core computational mechanisms of recursion as they appear in narrow syntax and the mappings to the Sensory-Motor and Conceptual-Intentional interfaces" (Hauser et al., 2002, p. 1573). Lately, this hypothesis has been vigorously challenged (Jackendoff & Pinker, 2005; Parker, 2006; Pinker & Jackendoff, 2005). First, we will focus on the logical contingencies of embedding and recursion. Second, as the narrower claim of Hauser et al. that recursion is unique to our species has subsequently been
Footnote: The faculty of language in the narrow sense, or FLN = the unique aspects of the language faculty. The faculty of language in the broad sense, or FLB = the whole language faculty, including the aspects shared with other species or faculties.
questioned (Marcus, 2006; Okanoya, 2007; Watanabe & Huber, 2006), we will argue that recursion in non-human animal communication has so far not been attested. 3.1. Recursion
There is a confusion underlying the notion of recursion. In fact, there are two logically independent notions of recursion. In computer science and in Chomsky's phrase structure grammar, recursion is a procedure or rule (Chomsky, 1956, 1964, 1975). For some other theorists, recursion is a type of structure: a situation where an instance of an item is embedded in another instance of the same item (Jackendoff & Pinker, 2005; Parker, 2006; Premack, 2004). For the sake of convenience, let us call the former procedural and the latter structural recursion. Procedural recursion implies infinity, whereas structural recursion does not. Thus, structural recursion does not imply procedural recursion, nor vice versa. For instance, the recursive center-embedding rule AB → AABB produces the strings AABB, AAABBB, etc. It is impossible to tell by looking at these strings whether their production procedure was recursion or concatenation. Furthermore, the strings do not exhibit structural recursion. The reason for this is that they comply with the serial mode of communication, whereas structural recursion requires parallel communication. Speech, for instance, is parallel communication, as a sequence of vocalizations is matched with sequential interpretation. This is not to deny that its interface - a sequence of vocalizations - is serial (Pinker & Bloom, 1990). For speech, sequential interpretation is, of course, an understatement. Linguistic interpretation is sequential and compounding, merging smaller units that are per se meaningful in the code (Chomsky, 1995; Hauser et al., 2002; Studdert-Kennedy, 1998). As far as we know, the linguistic code is unique among species in stipulating parallel interpretation (semantic compositionality). Cf. Parker (2006): "/---/ faced only with a string, and no pointer to its structure, we cannot distinguish tail recursion from simple iteration. Nested recursion, on the other hand, could be evidenced by a complex string alone".
We maintain that this "complex string" must comply with parallel communication in order for nested recursion to be evident. The definition of structural recursion was "an instance of an item is embedded in another instance of the same item". In serial communication, the condition 'the same' proves fatal, as the only interpretation of it would be 'identical', and in given conditions
* Parallel interpretation may be closely related to multitasking (consciously managing two or more operations) - an ability admittedly unique to humans (Donald, 1998).
(identical items in serial communication) it is impossible to differentiate an item from a structural recursion of the same item.
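The indistinguishability claim of section 3.1 can be made concrete: a recursive procedure and plain concatenation generate exactly the same surface strings A^n B^n, so the strings alone cannot attest procedural recursion. This sketch simply illustrates that point; the function names are ours.

```python
def center_embed(n):
    """Strings A^n B^n produced by the recursive center-embedding
    rule AB -> AABB (each call wraps the previous string in A...B)."""
    if n == 0:
        return ""
    return "A" + center_embed(n - 1) + "B"

def concatenate(n):
    """The same strings produced by plain concatenation (iteration)."""
    return "A" * n + "B" * n

# The surface strings are identical, so looking at a string cannot
# reveal whether a recursive procedure generated it.
assert all(center_embed(n) == concatenate(n) for n in range(8))
```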
3.2. Embedding. Recursion in non-human communication? Embedding is a situation where an item is embedded in any item (with infinity not implied). According to Chomsky, 'embedding' is logically independent from procedural recursion (i.e. there can be one without the other). Structural recursion, however, is a proper subset of embedding. Unlike structural recursion, embedding is possible in serial communication (pattern-within-pattern sequences are an example). The songs of cetaceans and birds exhibit this property. The fact that embedding is hierarchical has frequently raised speculation about a putative underlying 'recursive' mechanism (or, more unfortunately, resulted in confusing embedding with recursion). As Suzuki et al. correctly remark in discussing humpback whale song (in a paper that has attracted some misled attention as evidence of recursion in non-human animals), "Hierarchical grammars may be efficiently represented using recursion, although recursion is not necessarily implied by hierarchy" (Suzuki, Buck, & Tyack, 2006, p. 1863). There have also been some claims as to the possibility of recursion in non-human communication in connection with Gentner et al.'s (2006) experiments with European starlings (Marcus, 2006; Okanoya, 2007; Watanabe & Huber, 2006) but, in this case, pattern recognition and/or counting are simpler and more plausible explanations than the learning of a recursive rule. We submit two general points about attesting recursion in communication. First, it is much easier to attest structural than procedural recursion. Second, structural recursion can be attested in parallel communication systems only. This rules out species that, as far as we know, communicate serially (for instance, songbirds). To our knowledge, neither procedural nor structural recursion has been attested in non-human animal communication. This is in concordance with the observation by Fitch et al.
(2005) that no non-human animal communication system known shows evidence of syntactic recursion.
4. Conceptual embedding We propose another notion, instead of syntactic recursion, as an underlying feature of many critical aspects of the language faculty. The notion is conceptual embedding - a type of embedding not to be confused with recursion. Conceptual embedding (CE) is a cognitive phenomenon directly related to conceptual structure. It is possible that CE (the capacity to nest concepts
regardless of the presence of syntactic recursion in a language) is specific to humans. "If non-human animals know in some sense that things have parts that have subparts which have subparts, then again their mental representations, independent of language, have a recursive structure. It is not known whether animals are capable of such mental representations" (Hurford, 2004). We will generalize Hurford's point and submit that it is not known whether, or to what extent, non-human animals have CE. If they do, then it is seemingly confined to limited aspects of reality. CE forms the basis of our capacity to operate on sets, construct categories and make categorical distinctions, and of our capacity to model possible worlds. The latter has frequently been cited as a uniquely human trait (Jacob, 1982). It is useful to think of CE as a hierarchical way of conceptualization. We suggest that the ability to conceptualize any properly abstract category (e.g. cause, value, sign, thought, structure, function, etc.) is a fair indicator of a species' reliance on CE. Obviously, then, CE is an indispensable building block in the development of the language faculty. Although conceptual embedding per se does not imply syntactic embedding, this core syntactic feature of NL is implemented by CE. 4.1. Conceptual embedding in non-human species?
Until recently, comparative studies have focused primarily on species' ability to accomplish feats of increasing complexity, while the nature of the cognitive processes that lie behind these achievements has received less attention (see Hauser et al., 2007, for recent developments). We have reviewed experiments with grey parrots, bottlenose dolphins, bonobos, baboons and diana monkeys, and reanalyzed the results of these experiments with respect to the cognitive strategies used by the species. Only one example is presented below. Since several properties characteristic of human language (e.g. representationality, hierarchical structure, open-endedness, etc.) are evident in non-human primates' social knowledge, albeit on a rudimentary level, Cheney and Seyfarth (2005) hypothesize that "/---/ the internal representations of language meaning in the human brain initially emerged from our pre-linguistic ancestors' knowledge of social relations. /---/ The demands of social life create selective pressures for just the kind of complex, abstract conceptual abilities that are likely to have preceded the earliest forms of linguistic communication" (p. 153). In our opinion, the lack of attribution of mental states (the lack of a theory of mind) is a dubious cause for not being able to form differentiated concepts of reality. Moreover, in order to attribute mental states the way humans do, one has
to be aware that there are different kinds of mental states and that it is possible to link them to any individuals in the first place. We maintain that such a process requires CE, as the concepts of mental states need to be embedded with the concepts of other living beings so that new meaning arises. Thus CE must predate the theory of mind. Since CE is also beneficial in situations not involving social relations, the hypothesis that social lifestyle created the selective pressure for the emergence of human-like conceptual structure might not be justified. 5. Conclusion
In the present article, we have proposed an evolutionary framework for the language faculty. We suggest that by defining the common functional interfaces of neurobiological traits involved in language-associated processes, it is possible to delineate the selective forces that have acted upon them. Namely, we hypothesize that the common functional interfaces of the language faculty are thought (a sophisticated form of conceptualization) and linguistic communication. We argue that humans possess a hierarchical way of conceptualization termed conceptual embedding (the ability to nest concepts within concepts - see section 4 for details). More experiments are needed to prove or refute CE in non-humans, but we hypothesize that CE may turn out to be a uniquely human trait. We suggest that CE is at the top of the hierarchical continuum of associative processes performed by the nervous system. As the nervous system evolved, increasingly higher-order associative processes became available, which resulted in the emergence of CE in human ancestors. We go on to hypothesize that, initially, the selective force driving the development of the language faculty was towards an enhanced conceptualization of reality that is functionally relevant in the absence of linguistic communication. According to this scenario, the invention of linguistic communication was a secondary event, dependent on CE, which supports the sophisticated conceptual underpinnings of linguistic meaning. Abbreviations: CE = conceptual embedding; FL = the faculty of language; FLB = the faculty of language in the broad sense; FLN = the faculty of language in the narrow sense; NL = natural language
Acknowledgements We thank Noam Chomsky for thorough and critical discussions, and Tim Gentner, Kate Arnold, Jaan Valsiner, Jüri Allik, and Haldur Õim for comments and suggestions. References Bickerton, D. (2003). Symbol and structure: a comprehensive framework for language evolution. In M. H. Christiansen & S. Kirby (Eds.), Language Evolution: The States of the Art. Oxford: Oxford University Press. Cheney, D. L., & Seyfarth, R. M. (2005). Constraints and preadaptations in the earliest stages of language evolution. The Linguistic Review, 22(2-4), 135-159. Chomsky, N. (1956). Three models for the description of language. IRE Transactions on Information Theory, IT-2, 113-124. Chomsky, N. (1964). Syntactic structures. The Hague: Mouton. Chomsky, N. (1975). The logical structure of linguistic theory. New York: Plenum Press. Chomsky, N. (1995). The minimalist program. Cambridge, MA: MIT Press. Donald, M. (1998). Mimesis and the executive suite: Missing links in language evolution. In J. R. Hurford, M. Studdert-Kennedy & C. Knight (Eds.), Approaches to the Evolution of Language: Social and Cognitive Bases. Cambridge: Cambridge University Press. Fitch, W. T., Hauser, M. D., & Chomsky, N. (2005). The evolution of the language faculty: clarifications and implications. Cognition, 97(2), 179-210; discussion 211-225. Gentner, T. Q., Fenn, K. M., Margoliash, D., & Nusbaum, H. C. (2006). Recursive syntactic pattern learning by songbirds. Nature, 440(7088), 1204-1207. Hauser, M. D., Barner, D., & O’Donnell, T. (2007). Evolutionary Linguistics: A New Look at an Old Landscape. Language Learning and Development, 3(2), 101-132. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: what is it, who has it, and how did it evolve? Science, 298(5598), 1569-1579. Hurford, J. R. (2004). Human uniqueness, learned symbols and recursive thought. European Review, 12(4), 551-565. Jackendoff, R., & Pinker, S. (2005).
The nature of the language faculty and its implications for evolution of language (Reply to Fitch, Hauser, and Chomsky). Cognition, 97(2), 211-225. Jacob, F. (1982). The Possible and the Actual. Seattle: University of Washington Press.
Marcus, G. F. (2006). Language: startling starlings. Nature, 440(7088), 1117-1118. Nowak, M. A., & Komarova, N. L. (2001). Towards an evolutionary theory of language. Trends in Cognitive Sciences, 5(7), 288-295. Okanoya, K. (2007). Language evolution and an emergent property. Current Opinion in Neurobiology, 17(2), 271-276. Parker, A. R. (2006). Evolving the narrow language faculty: was recursion the pivotal step? In Proceedings of the 6th International Conference on the Evolution of Language. Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13(4), 707-784. Pinker, S., & Jackendoff, R. (2005). The faculty of language: what's special about it? Cognition, 95(2), 201-236. Premack, D. (2004). Psychology. Is language the key to human intelligence? Science, 303(5656), 318-320. Studdert-Kennedy, M. (1998). The particulate origins of language generativity: From syllable to gesture. In J. R. Hurford, M. Studdert-Kennedy & C. Knight (Eds.), Approaches to the Evolution of Language: Social and Cognitive Bases. Cambridge: Cambridge University Press. Suzuki, R., Buck, J. R., & Tyack, P. L. (2006). Information entropy of humpback whale songs. The Journal of the Acoustical Society of America, 119(3), 1849-1866. Watanabe, S., & Huber, L. (2006). Animal logics: decisions in the absence of human language. Animal Cognition, 9(4), 235-245.
ARTIFICIAL SYMBOL SYSTEMS IN DOLPHINS AND APES: ANALOGOUS COMMUNICATIVE EVOLUTION? HEIDI LYN Sea Mammal Research Unit, Gatty Marine Laboratories, School of Biology, University of St. Andrews, East Sands, St. Andrews, KY16 8LB, UK
1. Symbol Systems in Dolphins and Apes

Complex cognitive and communicative abilities have been documented in dolphins and chimpanzees in three longitudinal studies. Two used interactive keyboard systems: one a study with bottlenose dolphins (Tursiops truncatus) (Dolphin Keyboard Project - DKP) (Reiss & McCowan, 1993) and the other a series of studies with bonobos (Pan paniscus) and chimpanzees (Pan troglodytes) (Brakke & Savage-Rumbaugh, 1996; Savage-Rumbaugh, McDonald, Sevcik, Hopkins, & Rupert, 1986) (Language Research Center - LRC). In these studies a key press resulted in an automated auditory response by a computer (a computer-generated whistle in the case of the dolphins, or an English word in the case of the primates) and a response by the researchers, often the offering of the associated object or activity. The early acquisition of keyboard use in these two studies has been compared in earlier research (Lyn, Reiss, & Savage-Rumbaugh, in revision). The third study explored comprehension abilities in dolphins in a series of studies utilizing both auditory and gestural symbolic codes (Herman, Richards, & Wolz, 1984) (Kewalo Basin Marine Mammal Laboratory - KBMML). This study can also be compared to the ape studies, as the ape studies also allowed for exploration of comprehension. To further investigate the existence of parallels in the acquisition of functional or symbolic associations in dolphins, chimpanzees, and bonobos, as well as with well-known acquisition strategies in humans (e.g. Tomasello, 2003), this paper reports on analyses from all three of these studies and compares research methodologies and published results, as well as data newly compiled for the purposes of this comparison.
Because the methodologies of these projects are so distinct, some direct comparisons are not possible. We have endeavored to present data that speak to the same underlying concept (e.g. imitation, behavioral concordance). In many cases this means that new data are presented from one project to compare to published findings from another.
2. Methods

(See Table 1 for a methodological comparison.)

2.1. Dolphin Keyboard Project (DKP)

Two captive-born male dolphins, Delphi and Pan, served as participants; both were 11 months old at the onset of the study. The study was conducted at Marine World Africa USA in California for 2 non-consecutive years, from 1984-85 (56 sessions) and 1987-88 (32 sessions). The keyboard consisted of a 3 x 3 key matrix that displayed 1-5 distinctive white, three-dimensional visual elements. A key press resulted in the dolphins' exposure to a specific chain of temporally paired events: a model sound (a computer-generated whistle) was automatically broadcast into the pool and a specific category of object or activity was presented to the dolphin that activated the key. This methodology could be considered a free choice paradigm, where the dolphins were allowed to choose a key, and the experimenter response was determined by the key selected. See Reiss and McCowan (1993) and Lyn, Reiss, and Savage-Rumbaugh (in revision) for further details. Analyses below utilized published reports as well as new data from written and video records.
2.2. Language Research Center (LRC)

Early work at the LRC explored keyboard use and comprehension in two chimpanzees who were 2 years old at the onset of the study. As in the study with dolphins (Reiss & McCowan, 1993), key use by the apes in early work at the LRC resulted in an automated acoustic response by the computer (e.g., an English word) and an offering of the associated object or activity by the researchers (Savage-Rumbaugh, 1986). This methodology is in direct contrast to the later work at the LRC, where a key press did not always result in an automated response, nor did it necessarily result in an offering of the associated object. This later work explored the symbolic abilities of three bonobos and a chimpanzee (Brakke & Savage-Rumbaugh, 1995, 1996; Savage-Rumbaugh, 1986; Savage-Rumbaugh et al., 1986; Savage-Rumbaugh et al., 1993; Savage-Rumbaugh, Shanker, & Taylor, 1998). This methodology required that key use be treated as an intentional communication, but key use did not automatically result in an offering of the associated item. Crucial factors in the apes' learning to use the keyboard were joint attention between the apes and the researchers, and the use of the keyboard as a means of prediction and control over their environment. Analyses below utilized published reports as well as new data from computerized and video records.
2.3. Kewalo Basin Marine Mammal Laboratory (KBMML)

KBMML research explored the symbol comprehension of two female bottlenose dolphins, both 2 years old at the onset of the study, who were operantly conditioned to respond to gestures or computerized whistles. Individual gestures or whistles were glossed as objects, actions, or modifiers and could be combined in sequences of 2-5 symbolic elements with 2 different combinatorial rules (Subject Verb Object (SVO) in the case of the auditory symbols and Object Subject Verb (OSV) in the case of the gestural symbols). Analyses below include published and unpublished data.

Table 1. Direct comparisons of methodologies and results of the three projects (DKP = Dolphin Keyboard Project; LRC = Language Research Center; KBMML = Kewalo Basin Marine Mammal Laboratory).

Subjects
  DKP: Pan and Delphi
  LRC: Sherman and Austin (S&A), Kanzi, Mulika, Panbanisha, Panpanzee
  KBMML: Akeakamai, Phoenix, Elele, Hiapo

Species
  DKP: Tursiops truncatus
  LRC: Pan paniscus, Pan troglodytes
  KBMML: Tursiops truncatus

Age at exposure to symbols
  DKP: 11 months
  LRC: S&A - 2 years; others - early infancy (6 months, birth, 6 weeks, 6 weeks)
  KBMML: ~12 months

Length of project
  DKP: 4 years
  LRC: 25+ years
  KBMML: 23 years

Number of symbols
  DKP: 2-5
  LRC: 6-384
  KBMML: up to 30+

Spontaneous productive vocabulary at end of study
  DKP: 5
  LRC: over 200
  KBMML: N/A

Comprehension at end of study
  DKP: N/A
  LRC: at the level of a 2-1/2-year-old child
  KBMML: 30+

Communication methodologies
  DKP: free choice
  LRC: S&A - free choice; others - joint attention
  KBMML: trained comprehension

Who could use the symbols
  DKP: subjects
  LRC: subjects and researchers
  KBMML: researchers

Keyboard response when keys are pressed
  DKP: computer-generated whistle with time and frequency parameters similar to natural dolphin whistles
  LRC: early - key lights up, lexigram appears on upper board, English word played by computer; later - none
  KBMML: N/A

Researcher response when key is pressed
  DKP: offer referent
  LRC: S&A - offer referent; others - responded as if it was a purposeful communication, but no set action response (apes may or may not get referent)
  KBMML: N/A

Visual symbols relocated during session
  DKP: yes
  LRC: yes, then no
  KBMML: N/A

Acquisition of symbol use/comprehension
  DKP: within first 5 sessions
  LRC: S&A - with patient teaching; others - over the course of infancy
  KBMML: operantly conditioned over 100s of trials

Acquisition of acoustic stimuli
  DKP: vocal mimicry within 11 sessions
  LRC: S&A - English comprehension tests were unsuccessful; others - English comprehension at the level of a 2-1/2-year-old child
  KBMML: operantly conditioned over 100s of trials

Comprehension and production
  DKP: productive use of keys; comprehension tests were inconclusive
  LRC: S&A - comprehension had to be trained separately from production; others - spontaneous and simultaneous comprehension and production
  KBMML: comprehension of novel sequences; no production possible

Further abilities
  DKP: behavioral concordance, possible functional use, use of communicative innovations
  LRC: behavioral concordance, functional use, use of communicative innovations, use of proto-syntactical word order, hierarchical categorization of errors, "fast mapping"
  KBMML: comprehension of anomalous sequences, reporting of presence/absence of objects, categorization of errors
3. Results

3.1. Acquisition and use of artificial systems

All three research projects resulted in learned associations between the symbol elements and the referents (see Table 1). Both productive projects (DKP and LRC) also showed behavioral concordance between the use of keys and subsequent behavior, and provided evidence for functional use of keys. Early keyboard use at the LRC was a mix of exploratory key presses (running the fingers over a number of keys), babbling (pressing keys to themselves, frequently after moving the keyboard away from caregivers and conspecifics
before touching keys), and imitation (touching keys directly after the use of those keys by caregivers). At the DKP, dolphins were recorded imitating the computer whistles after a key press, as an intermediate stage before whistling while interacting with the appropriate items away from the keyboard (Brakke & Savage-Rumbaugh, 1996; Reiss & McCowan, 1993; Savage-Rumbaugh et al., 1986). These acquisition processes are comparable to initial steps in language learning by human children. Neither the apes nor the dolphins showed position preferences when touching a key on the keyboard; instead, their key use corresponded directly to known preferences, with apes and dolphins choosing keys associated with preferred items or behaviors (Lyn et al., in revision). In addition, apes at the LRC were recorded behaving appropriately after touching a key (for instance, choosing an apple from an array of foods after touching the key for apple) and the dolphins at the DKP were recorded whistling the appropriate whistle while interacting with an item (for instance, whistling the ball whistle while playing with a ball) (Brakke & Savage-Rumbaugh, 1996; Reiss & McCowan, 1993; Savage-Rumbaugh et al., 1986). These findings indicate an understanding of the association between key and referent. Both the apes and the dolphins (DKP) have been shown to acquire the association between symbol and referent with very few exposures. The apes acquired new English words for novel referents in as few as one exposure session (Lyn & Savage-Rumbaugh, 2000). Similarly, the dolphins were recorded imitating a computerized whistle within the first 19 sessions of being exposed to the keyboard (Reiss & McCowan, 1993). Both apes (LRC) and dolphins (KBMML) were reported to respond correctly to complex sequences of symbols (Herman et al., 1984; Savage-Rumbaugh et al., 1993) - even reversible sequences - suggesting understanding of combinatorial rules.
This understanding of combinatorial rules has been explored in more depth in the dolphin project (Herman, Kuczaj, & Holder, 1993). Dolphins were exposed to sequences that violated the rules of the system in some way, including: too many elements, elements out of order, and too few elements in a sequence. The dolphin was shown to reject impossible sequences and to construct correct sequences from longer, incorrect sequences. For example, if the dolphin was given the sequence BALL WINDOW BASKET FETCH - she would either reject the sequence (too many subject or object elements) or perform either BALL BASKET FETCH (bring the basket to the ball) or WINDOW BASKET FETCH (bring the basket to the window). BALL WINDOW FETCH would be rejected as window is an immovable object and the dolphin did not reverse
symbol sequences to be able to perform them. This level of combinatorial ability has not been shown in the ape studies to date. However, in both apes and dolphins (KBMML), error patterns have shown that the animals spontaneously group the symbol referents into categories. The apes have displayed hierarchical categorization of objects, foods, locations, and animates (Lyn, in press), and the dolphins have displayed categorization of their smaller set of referents into two basic categories - movable and immovable (Herman et al., 1984).
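To make the combinatorial rule concrete, the anomalous-sequence behaviour described for the KBMML dolphin can be sketched as a check against a destination-object-action template. This is purely illustrative: the vocabulary sets, function names, and repair strategy below are assumptions for exposition, not the KBMML implementation.

```python
from itertools import combinations

# Illustrative vocabularies; the real KBMML lexicon was larger.
MOVABLE = {"BALL", "BASKET", "HOOP"}
IMMOVABLE = {"WINDOW", "CHANNEL"}
ACTIONS = {"FETCH"}

def valid_osv(seq):
    """True if seq is a well-formed destination-object-action triple.
    Immovable referents (e.g. WINDOW) cannot serve as the transported object."""
    if len(seq) != 3:
        return False
    dest, obj, act = seq
    return dest in MOVABLE | IMMOVABLE and obj in MOVABLE and act in ACTIONS

def repair(seq):
    """Extract the valid triples embedded in an over-long sequence,
    mirroring the dolphin's reported strategy of performing a
    well-formed subsequence rather than rejecting outright."""
    return [list(c) for c in combinations(seq, 3) if valid_osv(list(c))]
```

Under these assumptions, `repair(["BALL", "WINDOW", "BASKET", "FETCH"])` yields exactly the two performable readings reported in the text (bring the basket to the ball, or to the window), while the WINDOW-as-object reading is rejected.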
3.2. Observations of communicative innovations

Finally, observational reports of apes and dolphins in all three studies have shown innovations that suggest an understanding of the system itself: the animals circumvented its limitations to further their communicative goals. Two chimpanzees in the early LRC work utilized unassigned keys when presented with an item that had no key equivalent. Importantly, one chimpanzee would initiate the use of a key and the second chimpanzee would immediately use the same key. Thereafter, that key would be utilized to denote that referent. A dolphin in the DKP also utilized a blank key. However, in this instance, the key (associated with "fish") had been purposely removed by the experimenters, since this dolphin would choose that key to the exclusion of all else. In the absence of the fish key, the dolphin swam to the bottom of the pool, retrieved a piece of fish, and held the fish to a blank key. Finally, a dolphin at the KBMML had been trained to report on the presence or absence of objects in her pool by pressing a "YES" or "NO" paddle. When asked to bring an object in her pool to an object that was not in the pool, she spontaneously brought the object to the "NO" paddle. This same dolphin had accidentally been trained to respond to "LEFT" and "RIGHT" while the experimenters were trying to train "CHANNEL" and "WINDOW". It so happened that the channel was on one side of the pool, and the window on the other. When the task was run on the opposite side of the pool, the dolphin turned the wrong way every time. In later research, "LEFT" and "RIGHT" were used as modifiers for other objects.

4. Discussion
The apes and dolphins in these studies have shown remarkably similar communicative abilities. From imitation as an initial learning stage, to rapid acquisition of communicative associations, from comprehension of combinations to innovations, both species display a remarkable ability to acquire
communicative capacities that were once thought to be uniquely human. However, why should three species, one so far removed from the primate line, have such similar capacities? Three possible explanations exist: 1) analogous or convergent evolution (cognitive abilities that may have developed in separate lines due to similar environmental pressures); 2) more primitive neural substrates for communicative abilities that guide the development of communication (substrates present in ancient common ancestors); or 3) generalized rules of communication (rules that dictate the form of communication as it becomes more complex). The studies listed above are not sufficient to distinguish between these three explanations. However, some of these abilities have been reported in parrots (Pepperberg, 1999) and dogs (Kaminski, Call, & Fischer, 2004). If further research delineates similar abilities in these (and possibly other) species, the convergent evolution hypothesis will have to be abandoned. Convergent evolution is further endangered by the fact that this level of communication has not been found in wild studies of these species. Studies of artificial symbol systems in other species may not answer the question of how humans began to utilize language, but they can tell us what did not happen. The finding of symbolic capacities within the primate evolutionary line tells us that these capacities did not evolve after the split with our nearest relatives. The finding of similar abilities outside the primate line tells us that these abilities may have evolved separately in another line, or that they are part of a more ancient communicative capacity. Further research is required to clarify those possibilities, and to further explore the question of why these species would be biologically capable of this level of communication, yet not express that capability in the wild.
Acknowledgements

The author wishes to acknowledge Diana Reiss, Sue Savage-Rumbaugh, and Louis Herman. It is their groundbreaking research that allows these comparisons to be made.

References

Brakke, K. E., & Savage-Rumbaugh, E. S. (1995). The development of language skills in bonobo and chimpanzee - I. Comprehension. Language and Communication, 15(2), 121-148.
Brakke, K. E., & Savage-Rumbaugh, E. S. (1996). The development of language skills in Pan - II. Production. Language and Communication, 16(4), 361-380.
Herman, L. M., Kuczaj, S. A., & Holder, M. D. (1993). Responses to anomalous gestural sequences by a language-trained dolphin: evidence for processing of semantic relations and syntactic information. Journal of Experimental Psychology: General, 122(2), 184-194.
Herman, L. M., Richards, D. G., & Wolz, J. P. (1984). Comprehension of sentences by bottlenosed dolphins. Cognition, 16(2), 129-219.
Kaminski, J., Call, J., & Fischer, J. (2004). Word learning in a domestic dog: evidence for "fast mapping". Science, 304, 1682-1683.
Lyn, H. (in press). Mental representation of symbols as revealed by vocabulary errors in two bonobos (Pan paniscus). Animal Cognition.
Lyn, H., Reiss, D. L., & Savage-Rumbaugh, E. S. (in revision). Early Stages of Keyboard Use in Apes and Dolphins: Analogies in Cognition.
Lyn, H., & Savage-Rumbaugh, E. S. (2000). Observational word learning by two bonobos: ostensive and non-ostensive contexts. Language and Communication, 20(3), 255-273.
Pepperberg, I. (1999). The Alex studies: cognitive and communicative abilities of grey parrots. Cambridge, Massachusetts: Harvard University Press.
Reiss, D. L., & McCowan, B. (1993). Spontaneous vocal mimicry and production by bottlenose dolphins (Tursiops truncatus): evidence for vocal learning. Journal of Comparative Psychology, 107(3), 301-312.
Savage-Rumbaugh, E. S. (1986). Ape language: From conditioned response to symbol. New York, NY, US: Columbia University Press.
Savage-Rumbaugh, E. S., McDonald, K., Sevcik, R. A., Hopkins, W. D., & Rupert, E. (1986). Spontaneous symbol acquisition and communicative use by pygmy chimpanzees (Pan paniscus). Journal of Experimental Psychology: General, 115(3), 211-235.
Savage-Rumbaugh, E. S., Murphy, J., Sevcik, R. A., Brakke, K. E., Williams, S. L., & Rumbaugh, D. M. (1993). Language comprehension in ape and child.
Monographs of the Society for Research in Child Development, 58(3-4), v-221.
Savage-Rumbaugh, E. S., Shanker, S. G., & Taylor, T. J. (1998). Apes, language, and the human mind. New York: Oxford University Press.
Tomasello, M. (2003). Constructing a language: a usage-based theory of language acquisition. Cambridge, MA, US: Harvard University Press.
THE ADAPTIVENESS OF METACOMMUNICATIVE INTERACTION IN A FORAGING ENVIRONMENT
ZORAN MACURA & JONATHAN GINZBURG

Department of Computer Science, King's College London, The Strand, London, WC2R 2LS, United Kingdom
{zoran.macura, jonathan.ginzburg}@kcl.ac.uk

In this paper we describe an artificial life model used to provide an evolutionary grounding for metacommunicative interaction (MCI) - utterance acts in which conversationalists acknowledge understanding or request clarification. Specifically, we ran artificial life experiments on populations of foraging agents who are able to communicate about entities in a simulated environment, where the main difference between the populations is in their MCI capability. Populations which possess MCI capabilities were quantitatively compared with those that lack them with respect to their adaptability in diverse environments. These experiments reveal some clear differences between MCI-realised populations, which learn words using MCI, and MCI-non-realised populations, which learn words solely by introspection. The main finding is that as the language becomes increasingly complex, MCI has overwhelming adaptive power and importance. These results demonstrate very clearly how adaptive MCI can be in primordial settings of language use.
1. Introduction

A key feature of natural language is metacommunicative interaction (MCI): utterance acts in which conversationalists acknowledge understanding or request clarification. The need to verify that mutual understanding among interlocutors has been achieved with respect to any given utterance, and to engage in discussion of a clarification request if this is not the case, is one of the central organising principles of conversation (Schegloff, 1992; Clark, 1996). Given this, acknowledgements, clarification requests (CRs) and corrections are a key communicative component for a linguistic community. They serve as devices for allaying worries about miscommunication (acknowledgements) or for reducing mismatches in the linguistic system among agents (CRs and corrections). Communication is critical to social organisation. But it is a fragile process, and people often differ in their interpretation of utterances, resulting in miscommunication. Current approaches to investigating miscommunication, what causes it and how people try to repair it, appear in psycholinguistic research (Clark, 1996) and Conversation Analysis (Schegloff, 1992). The work conducted by Macura and Ginzburg (M&G) (Macura & Ginzburg, 2006; Ginzburg & Macura, 2007) has provided some evolutionary grounding for
MCI, which had not previously been addressed. M&G investigate the significance of MCI in a linguistic population from an evolutionary perspective, building on a formal semantic model of Ginzburg (forthcoming). The hypothesis that MCI plays a key role in the maintenance of a linguistic interaction system is tested in M&G's work through multi-agent simulation studies. Specifically, artificial life experiments are run on populations of agents who are able to communicate about entities in a simulated environment. Populations which possess MCI capabilities are quantitatively compared with those that lack them with respect to their lexical dynamics. M&G investigate the significance of MCI in both mono-generational and multi-generational population settings. In a mono-generational population, where only horizontal language transmission is modelled, both MCI-realised and MCI-non-realised (introspective) populations converge to a shared lexicon, although MCI-realised populations are faster at achieving this. In a multi-generational population, where both horizontal and vertical language transmission are modelled, the ability to use MCI leads to lexicon sharing, whereas lacking this ability leads to rapid divergence. That is, when MCI is part of a linguistic interaction system, a stable language can be maintained over generations, whereas without this MCI capacity a language effectively fails. In this paper we extend M&G's model in order to investigate whether MCI capacity provides an adaptive advantage to a population of foraging agents. A detailed description of the model can be found in Chapter 5 of Macura (2007).
2. Model of MCI with an Ecologically Functional Language

The main emphasis in M&G's model is on the role of cultural transmission of language rather than on biological evolution. Thus, language in this model has no ecological function, and there is no notion of agents' 'fitness' which can be used as a selective bias. Such cultural transmission models (e.g. Kirby (2001)) do not put much emphasis on the role of natural selection in language evolution and thus discount the ecological value of language. That is, the main concern is the role of cultural transmission and individual learning in language evolution. In human societies language does have an ecological function, where the sharing of information can be used to enhance some aspects of behaviour. This might be increasing the likelihood of locating food by indicating the whereabouts of food resources, or avoiding dangers (such as predators) by indicating their presence. A number of models have been developed in which language has an ecological effect, improving the viability of agents. Cangelosi and collaborators (Cangelosi & Parisi, 1998; Cangelosi & Harnad, 2001) developed a model in which the emergence of symbolic communication is studied in an environment containing edible and poisonous mushrooms. In this model functional communication systems have been shown to emerge as a consequence of the evolution of internal representations. Another ecological model was inspired by the Vervet monkeys'
alarm call system (Jong, 2000). This model demonstrated that agents can successfully develop a functional lexicon (to avoid predators) by developing categories that represent the agent's and predator's positions, and the appropriate action to take. In both models language has an ecological function, with the emphasis on natural selection. But the language itself is innate (thus discounting the role of cultural transmission), and only the 'fittest' agents are able to reproduce. This is quite a contrast to cultural transmission models, where generational turnover is random. In this paper we extend M&G's model into a foraging model with an ecologically functional language that is culturally transmitted, not innate. Agents in this extended model, as well as being able to communicate about plants, can also consume edible plants, ask about their location, and use deception. By consuming edible plants, an agent's vitality increases, hence increasing its fitness (i.e. likelihood of reproduction). A more detailed description of the model follows.
2.1. Foraging Environment

The environment is modelled loosely after the Sugarscape environment (Epstein & Axtell, 1996), in that it is a spatial grid containing different plants. The environment resembles the mushroom environment of (Cangelosi & Parisi, 1998; Cangelosi & Harnad, 2001). Plants can be perceived and disambiguated by the agents. Agents walk randomly in the environment and, when proximate to one another, engage in a brief conversational interaction concerning visible plants.(a) As well as being used as topics for conversation, plants in this extended model are also used as a food resource. Two types of plants exist in the environment: edible and inedible. Edible plants have an energy value, which indicates the energy an agent can gain by consuming them. When a plant is eaten by an agent its energy becomes 0. A plant grows back at the same location according to its 'growth rate' after being consumed, which is the same for every plant. Inedible plants are used only as topics for conversation.
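As a concrete, purely illustrative sketch, the grid of plants just described might be set up as follows. The class names and numeric parameters (energy value 10, growth rate 1) are assumptions; the paper does not give implementation details.

```python
import random

# Illustrative sketch of the foraging grid; names and numeric
# parameters are assumptions, not the authors' implementation.

class Plant:
    def __init__(self, kind, edible, energy, growth_rate):
        self.kind, self.edible = kind, edible
        self.max_energy, self.energy = energy, energy
        self.growth_rate = growth_rate  # regrowth per step after being eaten

    def consume(self):
        gained, self.energy = self.energy, 0  # eaten: energy drops to 0
        return gained

    def regrow(self):
        self.energy = min(self.max_energy, self.energy + self.growth_rate)

class Environment:
    """Spatial grid with randomly placed edible and inedible plants."""
    def __init__(self, size, n_plants, n_types, edible_fraction=0.1):
        self.size = size
        self.plants = {}  # (x, y) -> Plant
        for i in range(n_plants):
            kind = i % n_types
            # ~10% of plant types are edible, as in the experiments below
            edible = kind < max(1, round(n_types * edible_fraction))
            pos = (random.randrange(size), random.randrange(size))
            self.plants[pos] = Plant(kind, edible, energy=10, growth_rate=1)
```

The `regrow` method mirrors the text's 'growth rate', identical for every plant; random placement of plants and agents matches the randomly distributed setup described in Section 3.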
2.2. The Agent

Agent behaviour is extended from that in the M&G model. Agents in this model are endowed with the ability to distinguish edible from inedible plants. At every time step, throughout the simulation, each agent goes through the same process of walking, looking, communicating and, in addition, feeding. Whether an agent feeds depends on two conditions: whether the agent can see an edible plant and whether the agent is hungry. Hunger is defined by the time since an agent last ate, and it is the same for every agent.

(a) An agent's field of vision consists of a grid of fixed size originating from his location. Hence proximate agents have overlapping but not identical fields of vision.

An agent can consume a plant
only when standing on it, that is, when both the agent and plant are at the same location (i.e. in the same cell of the grid). Upon feeding, an agent gains the amount of energy of the consumed plant. Each agent has a vitality value, which indicates the energy of the agent gained by consuming edible plants. The vitality value is used exclusively as a selective bias for reproduction: the higher an agent's vitality, the likelier it is that this agent will be able to reproduce. But vitality is not used to determine agents' deaths. That is, agents die only of old age (on reaching their maximum age, which is randomly set at the beginning of the simulation), and thus foraging efficiency affects only reproduction, not the survivability of an agent.
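The reproduction bias just described, where vitality raises the chance of offspring while death comes only from old age, amounts to fitness-proportional (roulette-wheel) selection. The sketch below is an illustrative rendering of that scheme; function and field names are assumptions, not the authors' code.

```python
import random

# Illustrative sketch: vitality biases reproduction only, never survival.

def pick_parent(agents, rng=random):
    """Roulette-wheel selection: reproduction probability proportional
    to vitality, matching the selective bias described in the text."""
    total = sum(a["vitality"] for a in agents)
    if total == 0:
        return rng.choice(agents)  # no agent has eaten yet: uniform choice
    r = rng.uniform(0, total)
    acc = 0.0
    for a in agents:
        acc += a["vitality"]
        if r <= acc:
            return a
    return agents[-1]

def step_deaths(agents):
    """Agents die only on reaching their (randomly assigned) maximum age;
    foraging efficiency plays no role in survival."""
    for a in agents:
        a["age"] += 1
    return [a for a in agents if a["age"] < a["max_age"]]
```

Keeping death independent of vitality is the design point the text stresses: selection pressure enters the population solely through differential reproduction.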
2.3. Communication Protocol

The fitness of an agent in this foraging model is dependent on her vitality (i.e. the higher her vitality, the likelier she is to 'reproduce') and not on communicative success. In M&G's model, by contrast, language has no effect on fitness: agents communicate about random plants and the outcome of their conversation does not affect their subsequent behaviour. Conversation there serves only to allow the modeller to compare lexicon dynamics. In this extended model, conversational interactions are affected by agents' internal states and also affect agents' subsequent behaviour. That is, unlike in M&G's model, where a speaker always talks about a random plant in his field of vision, in this model the speaking agent's state of hunger plays a role in determining the topic of conversation. If the speaker is not hungry, the conversational interaction proceeds as 'normal': the speaker checks for plants in vision and picks a random plant as the topic. The speaker then chooses a word for the topic - the word with the highest association score in his internal lexicon - which he sends to the hearer. The hearer updates her lexicon in the same way as in M&G's model and the conversational interaction terminates.
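A minimal sketch of the association-score lexicon implied by this protocol follows. The scoring scheme and update rule are assumptions for illustration; the paper defers the actual update rule to M&G's model.

```python
from collections import defaultdict

# Illustrative word-meaning association lexicon; the update rule and
# scores are assumptions, not the M&G specification.

class Lexicon:
    def __init__(self):
        self.score = defaultdict(float)  # (word, meaning) -> association

    def produce(self, meaning, words):
        """Speaker: the word with the highest association to the topic."""
        return max(words, key=lambda w: self.score[(w, meaning)])

    def interpret(self, word, meanings):
        """Hearer: the meaning most strongly associated with the word."""
        return max(meanings, key=lambda m: self.score[(word, m)])

    def reinforce(self, word, meaning, amount=1.0):
        """Hearer update: strengthen the heard word's link to the
        plant she takes the speaker to mean."""
        self.score[(word, meaning)] += amount
```

A typical exchange: the speaker calls `produce` on the topic plant, the hearer calls `interpret` on the received word and then `reinforce` on her guess.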
2.3.1. Deception

A hungry speaker, on the other hand, chooses an inedible plant, or the one with the lowest energy value, as the topic of conversation (depending on the context - the plants in vision). This is because the speaker tries to distract the hearer from the edible plant he sees, giving himself an opportunity to eat the plant while the hearer walks away from it (possibly in the direction of the topic plant). Some motivation for this comes from the deceptive strategies found in primate societies (Waal, 1998). In the wild, chimps usually forage on their own. But sometimes, when coming across food in the presence of other chimps, a chimp tries to deceive the others, either by behaving indifferently, as if not noticing the food, and coming back to it when the other chimps are not looking, or by leading the other chimps in the opposite direction away from the food, eventually returning to consume it afterwards.
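The hungry speaker's deceptive topic choice can be sketched as a simple selection rule. The tuple representation and tie-breaking below are illustrative assumptions, not the authors' implementation.

```python
import random

# Illustrative topic selection; plants are (name, edible, energy) tuples.

def choose_topic(plants_in_vision, hungry):
    """A non-hungry speaker picks a random visible plant; a hungry one
    steers the hearer away from food by naming an inedible plant, or,
    failing that, the edible plant with the lowest energy value."""
    if not plants_in_vision:
        return None
    if not hungry:
        return random.choice(plants_in_vision)[0]  # 'normal' random topic
    inedible = [p for p in plants_in_vision if not p[1]]
    if inedible:
        return inedible[0][0]
    # no inedible plant in vision: name the least valuable edible one
    return min(plants_in_vision, key=lambda p: p[2])[0]
```

Note that the deception only pays off if the hearer correctly interprets the word and walks toward the named plant, which is the interaction the next paragraphs analyse.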
The deceptive strategy can be useful to the speaker - but only if the hearer understands the word and walks towards the 'correctly' perceived plant. In this case, the speaker benefits from the deception as the hearer steps away from the edible plant - even though the hearer might see it as well and be hungry herself - potentially giving the speaker enough time to consume the edible plant himself. But the hearer might associate the word heard with a different plant and thus move towards the edible plant, giving herself a greater chance of consuming it before the speaker. In this scenario the misunderstanding is not beneficial to the speaker, but it is to the hearer.

2.3.2. Asking for Food Location

Apart from this new deceptive capability, agents also have the capability to ask for locations of edible plants. This only happens in conversational interactions when the speaker is hungry and has no plants in vision. By asking for a food location, the speaker might receive useful information from the hearer, potentially reducing the time needed to find a food resource. The hearer's reply to the food location query is of the form [plantName, location], where plantName is the word for a specific edible plant and location is the x and y coordinates of that plant. The hearer can either give the name and location of a plant she last consumed, or of an edible plant that is currently in her field of vision. Upon getting a reply to his query, a speaker might react to this information in different ways depending on his MCI capability. If the speaker understands the word plantName and thinks it refers to an edible plant, then he starts walking towards the location in the next time step - even though he does not know the plant's current energy value. On the other hand, if the speaker does not understand plantName, or thinks that it is inedible, he can either raise a clarification request trying to clarify the edibility of plantName, or ignore the hearer's response and continue with the random walk.
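The speaker's possible reactions to a [plantName, location] reply reduce to a three-way decision, which the sketch below makes explicit. The function name, action labels, and the `known_edible` representation are illustrative assumptions, not the authors' code.

```python
# Illustrative dispatch on a [plantName, location] reply: the point of
# difference between MCI-realised and introspective agents.

def handle_reply(plant_word, location, known_edible, mci_capable):
    """Return the speaker's next action as an (action, payload) pair."""
    if plant_word in known_edible:
        return ("walk_to", location)      # understood as edible: head there
    if mci_capable:
        return ("clarify", plant_word)    # raise a CR about edibility
    return ("random_walk", None)          # introspective: ignore the reply
```

The `clarify` branch is exactly where MCI capability changes foraging behaviour: an introspective agent discards potentially useful information that an MCI agent can recover through a clarification request.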
2.4. Summary

In this extended model conversational interactions have an effect on agent behaviour. Depending on the situation, speakers and hearers might benefit from both successful and unsuccessful conversations. But because of the complex dynamics involved, it is not clear whether MCI capacity confers an adaptive advantage. In the next section we present results of this foraging model, in which the adaptiveness of MCI agents is investigated in a mixed multi-generational population consisting of both MCI-realised and introspective communities in a 1:1 ratio.
3. Experimental Results

Before running the experiments an environment is created containing 120 randomly distributed plants. A scarce environment is modelled, where 10% of the plants are edible: 12 plant instances in total. The number of different plant types that are edible depends on the plant diversity (i.e. the meaning space). For example, a meaning space of 10 indicates that there are 10 plant types with 12 instances of each plant type in the environment, making up a total of 120 plants. In this case only one plant type is edible. Increasing the meaning space to 20 does not affect the number of edible plants in the environment; rather, the number of edible plant types increases to two, but each plant type now has six plant instances, maintaining the total number of edible plant instances at 12. The foraging model is initialised with a population of 40 randomly distributed agents, 20% of which are infants. The population consists of two linguistic communities, one MCI-realised and the other introspective (MCI-non-realised), in a 1:1 ratio. Thus, initially there are 20 MCI agents and 20 introspective agents in the population. The change in the population make-up is monitored over multiple generations in order to determine whether a specific community becomes more predominant in the population, indicating that it has an adaptive advantage. Results for increasing meaning space values are collected every 5,000 time steps and the simulation is stopped when it reaches 1.5 million time steps. Figure 1(a) illustrates the change in the number of MCI and introspective agents when the meaning space is 10. Initially the MCI-realised community increases sharply in numbers, reaching a peak of 28 members. The introspective community, on the other hand, reduces in size to a total of 12 members. The reason is that MCI agents are more effective in foraging as more of their conversational interactions are successful: MCI agents are faster at converging to a shared lexicon. Thus, MCI agents have higher vitality values and are more likely to have offspring than introspective agents.
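The relationship between meaning space and plant distribution described above is simple arithmetic, sketched here for clarity (the function name and signature are ours):

```python
def plant_distribution(meaning_space, total_plants=120, edible_fraction=0.1):
    """Instances per plant type and number of edible types in the scarce environment."""
    instances_per_type = total_plants // meaning_space        # 120 plants split over all types
    edible_instances = round(total_plants * edible_fraction)  # always 12 edible instances
    edible_types = edible_instances // instances_per_type     # types needed to hold them
    return instances_per_type, edible_types

print(plant_distribution(10))  # (12, 1): one edible type with 12 instances
print(plant_distribution(20))  # (6, 2): two edible types with 6 instances each
print(plant_distribution(40))  # (3, 4): four edible types with 3 instances each
```

As the meaning space grows, the 12 edible instances are spread over more plant types, so agents must learn more edible-plant words to forage effectively.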
After the initial MCI flourishing, the introspective community starts converging to a shared lexicon, thus increasing its foraging efficiency. This results in an increase in introspective numbers and eventually both communities stabilise at 20. Because of their lower vitalities, introspective adults rarely reproduce at first. Therefore, with no infants to feed, the introspective adults accumulate vitality faster than MCI parents, increasing their likelihood of reproducing later on and eventually recuperating in numbers. Increasing the meaning space to 20, and thus the difficulty of converging to a common language, has a more significant effect on the population dynamics, as shown by Figure 1(b). The MCI-realised community increases rapidly in number, reaching a size of around 30 agents, as was the case for the smaller meaning space of 10. After the initial increase the population stabilises, with the ratio of MCI to introspective agents at roughly 3:1. The introspective community does not seem to be able to recover from this initial fall, as was observed in Figure 1(a) for the smaller meaning space. Because of the greater difficulty in converging to a common language, introspective agents become less effective in foraging and thus the community is unable to recover in number.
Figure 1. Number of MCI-realised and introspective agents in the population when meaning space equals (a) 10, (b) 20 and (c) 40.
When the meaning space is further increased to 40, the effect on the population dynamics is even more pronounced, as shown by Figure 1(c). The MCI-realised community rapidly rises in number and by 200,000 time steps makes up the whole population. Unlike in the lower meaning space experiments, where the introspective community was able to survive to the end of a simulation run, the introspective community did not survive when the meaning space was increased to 40.

4. Conclusions
The results demonstrate very clearly how adaptive MCI can be in primordial settings of language use. When the meaning space was low, and the language thus easily learnable, both communities performed similarly. Even though the MCI-realised community initially had an advantage and increased in number, the introspective community was able to recover and stabilise. Increasing the meaning space made it harder for the introspective community to recover from the initial drop in numbers. Lacking MCI capability meant that the agents could not raise clarification requests, after asking for a food location, when unsure of the edibility of the plant. Due to high language divergence, introspective agents had to rely almost exclusively on 'luck' (random walk) in finding food resources, whereas MCI-capable agents could resort to clarification requests when unsure of the edibility of a plant. This increased their competitiveness, as they were more successful than the introspective agents in finding food resources via communication. In an increasingly complex language, then, MCI is of overwhelming adaptive power and importance. This underscores the importance of integrating MCI into any potentially realistic model of the evolution of language.
ON THE IMPACT OF COMMUNITY STRUCTURE ON SELF-ORGANIZING LEXICAL NETWORKS
ALEXANDER MEHLER
Computational Linguistics, Bielefeld University, Universitätsstraße 25,
Bielefeld, D-33615, Germany
Alexander.Mehler@uni-bielefeld.de
This paper presents a simulation model of self-organizing lexical networks. Its starting point is the notion of an association game, by means of which the impact of varying community models on the emergence of lexical networks is studied. The paper reports on experiments whose results are in accordance with findings in the framework of the naming game. This is done by means of a multilevel network model in which the correlation of social and linguistic networks is studied.
1. Introduction

There is overwhelming evidence for the exceptionality of social and linguistic networks, which are known for their Small World (SW) property (Watts & Strogatz, 1998; Blanchard & Krüger, 2004): unlike random graphs, SW-networks not only have short geodesic distances, but also a high degree of cluster formation. Steyvers and Tenenbaum (2005) relate this property to the time and space complexity of linguistic networks, where it is seen to guarantee efficient memory storage and retrieval. On the other hand, Newman (2003) reports on assortativity in social networks, where agents with alike connectivity patterns tend to be linked. Simulation models of language evolution hardly make use of these findings. Rather, they rely on unrealistic community models in which, for an increasing number of iterations, all agents tend to communicate with each other with equal probability. That is, Fully Connected Graphs (FCG) are implicitly assumed as community models, where the smaller the number of agents, the fewer rounds are needed to complete their connections. Conversely, if the number of rounds is small but the population large, agents communicate only with a small number of other agents, so that random graphs emerge. In any case, FCGs are unrealistic due to their topology, while random graphs lack the clustering of social networks. Note that in this paper we refer to populations as (language) communities or agent networks. Recently, there have been efforts to utilize more realistic community models in language simulation. This has been done in the framework of the naming game (Steels, 1998; Baronchelli, Felici, Loreto, Caglioti, & Steels, 2006), in which agents collectively learn a meaning function f : V → M from a set of
words to a set of objects. As namings are taken to be independent, M is reduced to a single object. In this scenario, Baronchelli et al. (2006) start with a community model where sender and listener are always randomly chosen among all agents. Baronchelli, Loreto, Dall'Asta, and Barrat (2006) instead use the model of Barabási and Albert (1999) (i.e. the BA-model), in which agent connectivity obeys a power law. They show that under this regime, language convergence is slowed compared to FCGs. See Dall'Asta, Baronchelli, Barrat, and Loreto (2006b) for an extensive discussion of the impact of the topology of agent networks on the naming game. This includes memory complexity, which turns out to be lower in the BA-community model. Dall'Asta, Baronchelli, Barrat, and Loreto (2006a) complement this picture by starting from an agent network based on Watts & Strogatz's SW-model, and also report an acceleration of the convergence process in conjunction with a reduction of memory load. See also Lin, Ren, Yang, and Wang (2006), who use SWs with homogeneous node degree distributions to separately study the effect of agent clustering. Further, Barr (2004) considers a set of words and of objects whose mapping is learned in an FCG community in comparison to a geometric community model which corresponds to a k-regular graph (Mehler, 2007). All these approaches combine a structured community model with an unstructured meaning space. That is, the set-theoretic naming game does not consider meaning-based associations of lexical items which span lexical networks. Thus, we lack a simulation model which studies the impact of social agent networks on the emergence of linguistic lexeme networks. This paper presents such a model. Our basic hypothesis is that the topology of the agent network has an impact not only on the process of language change (e.g. by reducing its time and space complexity), but also on the topology of the lexical network being learned.
In other words: during language evolution, social network structure imprints on linguistic network structure, at least on the level of topological characteristics. The paper presents a simulation model in support of this hypothesis. In order to do this we introduce the notion of an association game, which complements the notion of a naming game from the point of view of lexical networks. The paper is organized as follows: Section 2 presents the simulation model and defines association games. Section 3 shows the impact of community structure on self-organizing lexical networks. Finally, Section 4 concludes and outlines future work.
2. A Three-Level Simulation Model of Self-Organizing Lexical Networks

The basic idea of our approach is to start from a three-level simulation model of lexical networks. In this so-called N3 model, a lexical network is learned by interacting agents subject to their neighborhood relations. More specifically, we distinguish the level of text aggregates (generated by the agents) from the underlying community network and the lexical network output by the multiagent learning. That is, agent, text and lexeme network are the three levels of the N3 model:

1. The agent network is the independent variable. By analogy with the naming game, we start from a model of intra-generational language change. Thus, we suppose that during the run of a game agents have stable neighborhoods, affected solely by the random choice of interactants.
2. The lexical network is the dependent variable. Its evolution is observed in terms of small world characteristics, where the size of the underlying lexicon is taken to be fixed during the same run.
3. Finally, the intermediary text level bridges the gap between the social and the language network and thus conveys information from the social topology to its linguistic counterpart.

A three-level network is exemplified by scientific communication, where networking occurs on the level of the scientists involved (i.e. a collaboration network), on the level of the documents being generated (spanning a citation network) and on the level of the shared ontology manifested by these documents. Evidently, networking on any of these levels correlates with structure formation within the other two. In this paper we look at linguistic networking from the point of view of social networking, thereby studying ontology formation subject to constraints of the underlying language community (as, e.g., in wiki-based systems). In order to simulate these dynamics we now present a model of social networking, of lexical networking, and of text generation & processing.
2.1. Agent Networking

Agent communities P are represented as undirected graphs G(P) = (P, E). In order to vary G(P) as an independent variable we implement three graph classes:

- Random graphs G_rand(P) are based on power law-like degree distributions of agent connectivity.
- k-regular graphs G_reg(P) are graphs in which each vertex has exactly the same number of neighbors, that is, the same degree k.
- Finally, small world graphs G_sw(P) combine a power law-like degree distribution with a high cluster value and short average geodesic distances.

Watts and Strogatz (1998) used the first two classes to introduce SWs, which are both unrealistic in terms of social networking: random graphs lack the clustering of social networks, while small average geodesic distances are absent from regular graphs. However, random graphs share the distance property with SWs, while regular graphs by definition have high cluster values. Random and regular graphs are used as baseline community models. That is, we expect that communities of the sort of G_rand(P) and G_reg(P) lead to deficient lexical networks when underlying language games. This is seen to be due to their disputable status as models of social networks, in contrast to SWs. In this paper, we generate SW agent networks based on the approach of Mehler (2007). It outputs connected graphs with high cluster values, short distances and power law-like node degrees, in accordance with results about social networks (Newman, 2003).
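As an illustration, the regular baseline and the classical Watts-Strogatz construction from which SWs derive can be generated in a few lines of plain Python. This is only a sketch: the paper's actual SW networks come from Mehler's (2007) generator, which additionally produces power law-like degrees, and the function names here are ours.

```python
import random

def ring_lattice(n, k):
    """k-regular ring: each vertex linked to the k/2 nearest on each side (G_reg)."""
    return {v: {(v + i) % n for i in range(1, k // 2 + 1)} |
               {(v - i) % n for i in range(1, k // 2 + 1)}
            for v in range(n)}

def watts_strogatz(n, k, p, seed=0):
    """Rewire each lattice edge with probability p: high clustering, short distances."""
    rnd = random.Random(seed)
    adj = ring_lattice(n, k)
    for v in range(n):
        for w in list(adj[v]):
            if w > v and rnd.random() < p:   # consider each original edge once
                u = rnd.randrange(n)
                if u != v and u not in adj[v]:
                    adj[v].discard(w); adj[w].discard(v)   # drop old edge
                    adj[v].add(u); adj[u].add(v)           # add rewired edge
    return adj
```

With p = 0 this reduces to the regular baseline; with p = 1 it approaches a random graph, reproducing the Watts-Strogatz interpolation referred to above.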
2.2. Lexical Networking

The language learned by the community P is the dependent variable. Thus, we focus on lexical networks as target languages, represented as undirected graphs. Departing from latent semantic analysis (Landauer & Dumais, 1997) as a single-agent model of learning lexical associations, we build a multiagent model (Mehler, 2007). This is done by means of an iteratively computable lexical association measure, updated per text unit: for a lexicon V and a sequence S_n = (x_1, ..., x_n) of n texts, the association a(v_i, v_j, S_n) of two lexical items v_i, v_j ∈ V is computed in terms of F_ik, the number of texts in S_k in which v_i occurs, and f_ik, the frequency of v_i in x_k. In accordance with models of human text processing, a(v_i, v_j, S_n) is sensitive to the order of texts in S_n. Next, we endow each agent a ∈ P with this learning model so that he can learn lexical associations subject to the communication situations in which he participates. After t iterations of the language game, that is, after processing sequence S_t, this leads to a distributed semantic space
in which each agent a ∈ P has his own meaning space M_t(a) = (V, E_t^a, w_t^a), with edge set E_t^a and weighting function w_t^a(v_i, v_j) = a(v_i, v_j, S_t^a). Note that the lexicon V is common to all agents, while the sequence S_t^a of the texts processed by agent a up to time t is specific to a. For a text x_t processed at time t by agent a we write

    M_{t-1}(a) --x_t^i--> M_t(a),  or  M_t(a) = x_t^i(M_{t-1}(a))    (3)

where i indicates how often a processes x_t at time t. Thus, at time t the memories M_t(a) of agents may differ depending on the text sequences S_t^a they have processed up to t. This model resembles that of Hashimoto (1997). The difference is that we concentrate on syntagmatic associations, optimize the model for iterative computability, and clarify the topological characteristics of M_t(P).
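The exact association measure is not reproduced here, but its key properties (per-text incremental update, sensitivity to text order) can be illustrated by a stand-in update rule of our own devising, in which each new text decays old associations and reinforces the pairs co-occurring in it:

```python
def update_associations(assoc, text, decay=0.99):
    """One per-text update of an agent's meaning space M_t(a).

    Illustrative stand-in, not the paper's measure: `assoc` maps word pairs
    to weights; old weights decay (so the result depends on text order)
    and co-occurring pairs in the new text are reinforced.
    """
    for pair in assoc:
        assoc[pair] *= decay               # forgetting: older texts count less
    words = sorted(set(text))
    for i, v in enumerate(words):
        for w in words[i + 1:]:
            assoc[(v, w)] = assoc.get((v, w), 0.0) + 1.0   # reinforce co-occurrence
    return assoc

space = {}
update_associations(space, ["tree", "leaf", "tree"])
update_associations(space, ["leaf", "tree"])
print(space[("leaf", "tree")])  # 1.99: first co-occurrence decayed, then reinforced
```

Thresholding such weighted associations yields the undirected lexical network whose topology is analysed below.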
2.3. Association Games

Now we define Association Games (AG), which generate distributed semantic spaces M(P) based on community models G(P). That is, AGs are mappings G(P) → M(P) from social to linguistic networks. They define an association task in which the sender produces a text x to mask the prime word he used to generate x, and the listener has to identify the prime. A round of an AG looks as follows: starting from a randomly chosen sender a_S ∈ P, all neighbors of a_S in G(P) are picked as listeners a_L, each getting a separate text (Zollman, 2005). For such a listener a_L, the sender masks the word v+ he used to prime the lexical constituents of his output text x_t, so that the listener a_L has to find out which word the sender had in mind when producing x_t. The listener processes x_t and tells the sender his guess v-, so that a_S can decide whether he was understood or not. A single round of the AG is successful if both sender and listener associate the same or related words with the same input text. This scenario resembles the children's game "I spy with my little eye, something beginning with ...". The difference is that in the association game not denotations but lexical primes are guessed, using texts as underspecified descriptions thereof, and agents learn the underlying priming relations (i.e. lexical connotations) by playing the game. More formally: starting from a sender a_S at round t and a randomly chosen prime v+, a text of length l is generated by collecting a subset of l nearest neighbors of v+ in M_{t-1}(a_S). Initially, lexical neighbors are picked at random. Note that we suppose fixed text lengths for the whole run of a game. Note further that texts are represented as multi-sets, so that types in V may recur. Next, the listener uses x_t to activate a subspace in his memory M_{t-1}(a_L) and, based thereon, to context-prime a guess v-. This is done by an inverse function of text generation which finds the "centroid" among the constituents of x_t and their neighbors in M_{t-1}(a_L). After uttering v-, the sender evaluates this guess by the geodesic distance L(v+, v-) in M_{t-1}(a_S). Here, we start from the hypothesis that any text generation/processing reinforces the associations manifested in the output/input text, so that the sender is "his first recipient", while the listener always tries to "understand" his input. Now, a successful round is rewarded by a reinforcing memory update, while otherwise this reinforcement is omitted.
A round counts as successful if L(v+, v-) ≤ τ, where τ is a further parameter of the model; τ = 0 means that v+ = v-. So what does it mean, then, to speak of terminological alignment via association games? Under a local perspective it means that if sender and listener align their lexical associations as they continually communicate, they finally play the game more and more successfully. Under a global perspective it means that if the AG is successfully played by the community P as a whole, this leads to a lexical network which, as we hypothesize, has the SW-property subject to the SW-property of the agent network G(P). This is evaluated in the next section.
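A single round can be sketched in code as follows. This is our own minimal reading of the game, with success defined as v+ = v- (the τ = 0 case) and memories simplified to association sets rather than weighted graphs; all names are illustrative.

```python
import random

def ag_round(sender_mem, listener_mem, lexicon, prime, length=5, rnd=random):
    """One association-game round between a sender and one listener.

    Memories map each word to its set of associates. The sender utters
    `length` associates of the prime; the listener guesses the word whose
    own associates best overlap the text. Success (guess == prime) is
    rewarded by reinforcing the manifested associations on both sides.
    """
    neighbours = list(sender_mem.get(prime, set())) or lexicon   # random at start
    text = [rnd.choice(neighbours) for _ in range(length)]       # multi-set: may repeat
    guess = max(lexicon, key=lambda w: len(listener_mem.get(w, set()) & set(text)))
    if guess == prime:                                           # reward: reinforce
        sender_mem.setdefault(prime, set()).update(text)
        listener_mem.setdefault(guess, set()).update(text)
    return text, guess

sender = {"fire": {"smoke", "ash"}}
listener = {"fire": {"smoke"}, "water": {"rain"}}
text, guess = ag_round(sender, listener, ["fire", "water"], prime="fire")
print(guess)  # fire
```

In a full run, one such round is played between a randomly chosen sender and each of its neighbours in G(P), which is how the community topology enters the dynamics.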
3. Experimentation

We test our hypothesis about the imprint of social on linguistic structure by varying the community model over random, small world and 4-regular graphs using 100 agents. We consider a lexicon of 500 words and set the threshold of the summary language to 0.375. That is, an association between two words is taken to belong to the target language if at least 37.5% of the agents share it. Further, we set the size of texts to 5 tokens and τ = 2. Finally, we compute 500,000 iterations of the AG per community model and average over 50 runs. Figure 1 exemplifies a run based on a SW-like agent network. For growing iteration numbers we see a connected graph gradually evolving which (as explained below) results in a SW-like lexical network, starting from a completely disconnected graph. So what happens to the topology of lexical networks if the community model is varied? This is answered in Figure 2. We start from the fraction of words in the largest connected component (lcc) (2.a) and observe that an lcc comprising all words evolves in the lexicon of the SW- and of the random community. However, in the former this happens faster, whereas the regular graph community lacks such an lcc. Figure 2.b shows the cluster coefficient (Watts & Strogatz, 1998). We observe that the lexicon of the SW-community has a much larger degree of clustering, comparable to the values of wikis (Mehler, 2008). In fact, in the random community-based lexicon clustering is much lower, not to mention the regular graph community. Figure 2.c completes this picture: the average geodesic distance is smaller and emerges faster in the SW-community based lexicon compared to its random counterpart. However, the regular community-based lexicon seems to have the smallest distance value. This is due to the fact that in 2.c L is computed for the lcc only. Thus, in 2.d we normalize L by assuming that unconnected vertices are separated by |V| - 1 edges. Now, the random and regular graph-based agent networks are both outperformed by their SW-counterpart. In summary, we observe an imprint of social on linguistic topology: SW-communities result in SW-like lexical networks; random and regular graphs do not. Moreover, in the latter case the lexical networks do not even share properties with their social counterparts: the regular agent network has, per definitionem, a high cluster value, but its linguistic counterpart does not.
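The two connectivity statistics behind Figures 2.a and 2.d can be computed by plain breadth-first search. A sketch (helper names are ours):

```python
from collections import deque

def components(adj):
    """Connected components of an undirected graph given as an adjacency dict."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            for w in adj[queue.popleft()]:
                if w not in comp:
                    comp.add(w); queue.append(w)
        seen |= comp; comps.append(comp)
    return comps

def lcc_fraction(adj):
    """Fraction of vertices in the largest connected component (Fig. 2.a)."""
    return max(len(c) for c in components(adj)) / len(adj)

def normalized_avg_distance(adj):
    """Average geodesic distance over all ordered pairs, counting unreachable
    pairs as |V| - 1 edges apart (the normalization of Fig. 2.d)."""
    n, total = len(adj), 0
    for s in adj:
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1; queue.append(w)
        total += sum(dist.get(v, n - 1) for v in adj if v != s)
    return total / (n * (n - 1))

g = {"a": {"b"}, "b": {"a"}, "c": set()}   # tiny disconnected example
print(lcc_fraction(g))             # 0.6666666666666666
print(normalized_avg_distance(g))  # ~1.67: unreachable pairs count as |V| - 1 = 2
```

The |V| - 1 penalty is what reverses the apparent advantage of the regular community in Figure 2.c, since its small average distance is computed over a small lcc only.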
These observations confirm a strong impact of social on linguistic networking. We might conclude that semantic structure is a byproduct of social structure. Why? The answer might be that social relationships are organized in a way which retains efficient information processing within the agent community: they are not too sparse, so that agents of the same community have a high chance of successful communication even if they did not communicate before. In other words, SW-like communities allow the efficient emergence of a linguistic common ground, in a way far removed from completely connected and thus much too complex agent networks. To the best of our knowledge this has not previously been evaluated by a multiagent simulation of lexical networks.

4. Conclusion
We introduced association games to study self-organizing lexical networks. We have shown that the topology of the agent community has a strong impact on these networks in intra-generational language change. The role of community structure in inter-generational language evolution is the subject of future work. This includes
Figure 1. Having a look at the dynamics of gradually evolving lexical networks: snapshots after 1,000, 25,000, 50,000 and 300,000 rounds of an association game.
integrating the naming and the association game.ᵃ

References

Barabási, A.-L., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286, 509-512.
Baronchelli, A., Felici, M., Loreto, V., Caglioti, E., & Steels, L. (2006). Sharp transition towards shared vocabularies in multi-agent systems. Journal of Statistical Mechanics: Theory and Experiment, P06014.
Baronchelli, A., Loreto, V., Dall'Asta, L., & Barrat, A. (2006). Bootstrapping communication in language games. In Proc. of Evolang6 (pp. 11-18).
Barr, D. J. (2004). Establishing conventional communication systems: Is common knowledge necessary? Cognitive Science, 28(6), 937-962.
Blanchard, P., & Krüger, T. (2004). The cameo principle and the origin of scale free graphs. Journal of Statistical Physics, 114(5-6), 399-416.
Dall'Asta, L., Baronchelli, A., Barrat, A., & Loreto, V. (2006a). Agreement dynamics on small-world networks. Europhysics Letters, 73, 969.
Dall'Asta, L., Baronchelli, A., Barrat, A., & Loreto, V. (2006b). Non-equilibrium dynamics of language games on complex networks. Physical Review E, 74.
Hashimoto, T. (1997). Usage-based structuralization of relationships between words. In ECAL97 (pp. 483-492).

ᵃ Financial support of the German Research Foundation (DFG) through the SFB 673 Alignment in Communication (www.sfb673.org) at Bielefeld University is gratefully acknowledged.
Figure 2. Dynamics of SW-characteristics in the association game: 500,000 iterations are performed per class of agent network: Small World graphs (SW), Random Graphs (RG) and Regular graphs (Reg). Values are averaged over 50 runs of the game for the respective type of agent network. (a) The fraction of words belonging to the largest connected component. (b) The cluster coefficient C of the lexical network. (c) The average geodesic distance of the largest connected component of the lexical network. (d) The normalized average geodesic distance regarding the entire lexicon.
Landauer, T. K., & Dumais, S. T. (1997). A solution to Plato's problem. Psychological Review, 104(2), 211-240.
Lin, B.-Y., Ren, J., Yang, H.-J., & Wang, B.-H. (2006). Naming game on small-world networks: the role of clustering structure.
Mehler, A. (2007). Evolving lexical networks. In Proc. of Language, Games, and Evolution. Workshop at ESSLLI 2007 (pp. 57-67).
Mehler, A. (2008). Large text networks as an object of corpus linguistic studies. In A. Lüdeling & M. Kytö (Eds.), Corpus linguistics. Berlin: De Gruyter.
Newman, M. E. J. (2003). The structure and function of complex networks. SIAM Review, 45, 167-256.
Steels, L. (1998). The origins of ontologies and communication conventions in multi-agent systems. Autonomous Agents & Multiagent Sys., 1(2), 169-194.
Steyvers, M., & Tenenbaum, J. (2005). The large-scale structure of semantic networks. Cognitive Science, 29(1), 41-78.
Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of 'small-world' networks. Nature, 393, 440-442.
Zollman, K. J. S. (2005). Talking to neighbors: The evolution of regional meaning. Philosophy of Science, 72, 69-85.
A CRUCIAL STEP IN THE EVOLUTION OF SYNTACTIC COMPLEXITY

JUAN C. MORENO CABRERA
Departamento de Lingüística, Universidad Autónoma de Madrid
28049 Madrid, Spain

In this paper I propose that the crucial step in the evolution of complex subordinate syntax out of simple paratactic expressions consists in the use of certain grammatical elements (notably, deictic pronouns) to refer to events and other abstract entities. I argue that the development of complex sentences is made possible by event reference together with the predicate-argument structure of simple sentences. As a consequence, the evolution towards syntactic complexity can be explained without assuming an alleged syntactic transformation of adjoined sentences into complex subordinate sentential structures.
1. Introduction
It is usually claimed that in linguistic evolution paratactic expressions precede hypotactic constructions (Givón 1979: 223, Givón 1989: 248, Bickerton 1990: 125-126, Jackendoff 1999: 275, Hurford 2003: 53, Gil 2006: 91, Johansson 2006: 163, among others). In addition, it has also been maintained that embedded syntactic structures develop out of adjoined syntactic structures; this has been called the Parataxis Hypothesis (Harris & Campbell 1995: 282). It is important to note that the two assumptions are independent of each other and that, in spite of this, the former is usually interpreted as implying the latter in many accounts. For example, in his brief description of the historical development of concessive clauses in French, von Wartburg notes: "We have only to examine individual forms in order to see how a paratactical association can gradually and by imperceptible degrees become hypotactical. The complicated constructions of a later period can usually be traced back to the simple juxtaposition practised in the preceding period." (von Wartburg 1969: 98). Johansson (2006: 163) proposes five stages in the development of complex syntax:
(1) The development of syntactic complexity
1. One-word stage: Basic semantics with no syntax
2. Two-word stage: Structured, but with none of the properties in 3-5 below
3. Hierarchical structure without recursivity. No subordinate clauses
4. Recursive syntax and flexible syntax
5. Full modern syntax

In this paper I propose to identify and describe the mechanism that creates hypotactic constructions out of paratactic expressions; that is, the putative way in which human language progressed from stage 3 to stage 4. I argue that this process is semantic-cognitive in nature; therefore, it cannot be treated in purely syntactic terms. I also show that exactly the same process is responsible for the property of sentential recursivity, considered essential to human language (Hauser, Chomsky & Fitch 2002) and characteristic of the last stages in the development of syntactic complexity, as summarized in the evolutionary sequence proposed by Johansson.
2. From parataxis to hypotaxis

It can be argued that the transformation of paratactic constructions into hypotactic structures is one of the mechanisms operating in the transition between stages 3 and 4 in the above sequence. However, an exclusive focus on the formal syntactic relationships between paratactic and hypotactic constructions can only yield a description of the structural output of the process, not a true explanation for it. As Harris & Campbell (1995) note, the Parataxis Hypothesis "does not tell us how hypotaxis, true subordination, developed [...] the beginning of subordination is not explained merely through parataxis." (Harris & Campbell 1995: 286). The supposed transformation of sentence coordination into sentence subordination is usually given as a typical example of the transition from parataxis to hypotaxis. A standard instance of this development is obtained by comparing the following two sentences (Givón 1979: 219):
(2) English
a. I know that, (i.e.,) it is true
b. I know that it is true

The first expression is a complex sentence consisting of two juxtaposed simple sentences. Its syntactic structure is shown in the following diagram:
[S [S I know that] [S it is true]]
Figure 1. Syntactic structure of (2a)
Sentence (2b) has a main transitive verb (know) and a subordinate clause as its direct object (that it is true). In its syntactic structure, this subordinate clause is analyzed as a complementizer phrase (Cook & Newson 2007: 102-104):
Figure 2. Syntactic structure of (2b)
The development of hypotaxis out of parataxis can, therefore, be exemplified by a transition from the first syntactic structure to the second. This type of development has been described in detail by historical linguists. The mechanism by which the second structure develops out of the first is usually called reanalysis (Givón 1979: 219, Harris & Campbell 1995: 287). In this particular case, a deictic element (that) is reinterpreted as a complementizer by means of a grammaticalization process. But this strictly formal description provides no indication of what triggers this grammatical development or else makes it possible. To tackle this issue we must turn to semantics. Thus, in order to make sense of (2a), that must be able to refer to an event or state of affairs. Once this pronoun is used in this way, it can also be interpreted cataphorically, referring to the event conveyed by the following adjoined sentence. The crucial cognitive step made here consists in viewing an event as if it were an individual, to which reference can be made. In (2b), that has been reanalyzed as a complementizer (as argued for and described in detail in Roberts & Roussou 2003: 116-121). In this function, that anticipates a following expression denoting a fact or an event and, therefore, it is also cataphoric in a purely grammatical sense. This reanalysis of a demonstrative as a grammatical element is not a novelty of intersentential syntax; a comparable reanalysis is responsible for the use of pronouns as determiners in determiner phrases, as in that boy. In the current literature on syntax, phrases such as that boy are analyzed as determiner phrases in which the determiner, not the noun, is the head of the nominal phrase (Cook & Newson 2007: 108). In addition, it can be said that the referential qualities of determiner phrases come from the determiner, not from the noun; this makes sense, since determiners are usually a clitic version of demonstrative pronouns, from a diachronic point of view. Once demonstrative pronouns extend their referential range to events, they can be reinterpreted as sentence determiners, that is, as complementizers.
As a consequence, the crucial step towards the development of sentential subordination has already been taken in the semantic interpretation of sentence (2a). The development of a complementizer out of a pronoun referring to an event is made possible by this important cognitive advance. I am thus claiming that the syntactic structure of I know that it is true is formally identical to the one for I know that boy, the only difference being the categorial labelling of their syntactic constituents:
Figure 3. I know that boy
As we can see in the above tree diagrams, the only changes in the second tree with respect to the first are in the syntactic categories forming the constituent which has the patient role in the VP. Such changes show a clear correspondence between determiner phrases and complementizer phrases:

(3) DP/CP correspondence
a. Determiner phrase (DP) ↔ Complementizer phrase (CP)
b. Determiner (D) ↔ Complementizer (C)
c. Noun (N) ↔ Sentence (S)
Both (2a) and (2b) are well-formed expressions in present-day English. Sentence (2a) can in no way be seen as a remnant of a primitive stage of the evolution of the English language; it is, transparently, not a linguistic fossil. In fact, it can be viewed as an expression of ordinary spoken English. In many languages (French, Latin, German, Swedish, English, Russian, Finnish, Hungarian and Mandarin Chinese among them) it is easy to find cataphoric elements in subordinate clauses with a function very similar to this use of English that. Let me introduce a couple of illustrative examples (Moreno 1987: 4):
(4) Russian
Ja zlilas’ ot togo, chto nie ponimala jego
I got angry (fem.) from that, that not understood (fem.) him
‘I got angry because I did not understand him’

(5) Hungarian
Az-ért ült le, mert elfáradt
That-for sat down, because felt tired
‘He sat down, because he felt tired’

In both cases, the subordinate clause is anticipated by a pronominal expression. It is worth pointing out that the Russian expression ja zlilas’ ot togó ‘I became angry for that’ can be used as a complete sentence. This is possible because of a deictic use of togó (genitive form of tot ‘that’), a demonstrative pronoun. It is precisely this deictic use that makes possible the proposed cataphoric use in (4). The same could be said of the Hungarian expression azért ült le ‘He sat down for that’. This type of semantic subordination has been called cataphoric subordination (Moreno 1987), and is no different from what we have in the English sentence I know that, it is true. English contrasts with Russian or Hungarian in this respect only in the degree of grammaticalization of the construction: it is lower in English - since (2a) is not usually recognized as a regular hypotactic structure - and higher in Russian, German or Hungarian, since (4) and (5) are typical hypotactic constructions in these languages.
3. Event individuation and the evolution of syntactic complexity

The mental operation of event individuation, by which states of affairs are seen as individuals, makes it possible for ordinary demonstrative pronouns to refer to events. In addition, this cognitive evolutionary step is responsible for the development of complex syntax and for its recursive character. It is important to note that the referential interpretation of demonstrative pronouns is not the only way of expressing this mental operation; morphological nominalization is widely used in many languages (including English) for the same purpose (Comrie & Thompson 1985, Koptjevskaja-Tamm 1993). Once they are seen as individuals, events can be arguments (agents, patients, goals, instruments) of certain predications, and can also be used to denote properties of other events (in adverbial clauses) or to denote properties of
individuals (in relative clauses). In the first case, an event-denoting expression can be the subject of the predication (flying planes can be dangerous) or its object (I know that John died). In the second case, events can be used for the temporal or spatial location of other events (I cried when John died). In the third case, events are used to characterize individuals (the man I saw). An individual entity can be identified by the semantic function it has in an event (agent, patient, goal, instrument). Thus the individual referred to by the man can be characterized by his patient role in the event: I saw him. This is the cognitive origin of relative clauses and of other subordinate structures with a similar function in different languages (Comrie & Kuteva 2005: 191-203).
4. Conclusions
In this paper I have tried to show that event reference and abstract object reference, in conjunction with predicate-argument structure, open up the possibility for a complex syntax of the subordinating type to develop. As a consequence, predicate-argument structure plus the individuation of events should be seen as the two key points in the evolution of complex syntax. From a formal point of view, the individuation of events contributes substantially to the recursive character of syntax, since it makes it possible to embed a predicate-argument structure into an argument position of a main predicate.

Acknowledgements

This paper reports on the ongoing research project “Typological and Evolutive Aspects of Linguistic Complexity” (HUM2006-05118), financed by the Ministerio de Educación y Ciencia, Spain and the FEDER funds. Thanks are due to professor Carlos Piera and to three anonymous reviewers for their useful remarks on a preliminary version of this paper.

References

Bickerton, D. (1990). Language and Species. Chicago: University of Chicago Press.
Cangelosi, A., A. D. M. Smith & K. Smith (eds.) (2006). The Evolution of Language. New Jersey: World Scientific.
Comrie, B. & S. A. Thompson (1985). Lexical nominalization. In T. Shopen (ed.) Language Typology and Syntactic Description. III. Grammatical Categories and the Lexicon (pp. 349-398). Cambridge: Cambridge University Press.
Comrie, B. & T. Kuteva (2005). The evolution of grammatical structures and ‘functional need’ explanations. In M. Tallerman (ed.) Language Origins. Perspectives on Evolution (pp. 185-207). Oxford: Oxford University Press.
Cook, V. J. & M. Newson (2007). Chomsky’s Universal Grammar. An Introduction. Oxford: Blackwell.
Gil, D. (2006). Early human language was isolating-monocategorial-associational. In A. Cangelosi, A. D. M. Smith & K. Smith (eds.) The Evolution of Language (pp. 91-98). New Jersey: World Scientific.
Givón, T. (1979). On Understanding Grammar. New York: Academic Press.
Givón, T. (1989). Modes of knowledge and modes of processing: the routinization of behavior and information. In T. Givón, Mind, Code and Context. Essays in Pragmatics (pp. 237-267). London: Lawrence Erlbaum.
Harris, A. & L. Campbell (1995). Historical Syntax in Cross-Linguistic Perspective. Cambridge: Cambridge University Press.
Hauser, M. D., N. Chomsky & W. T. Fitch (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569-1579.
Hurford, J. R. (2003). The Language Mosaic and its Evolution. In M. H. Christiansen & S. Kirby (eds.) Language Evolution (pp. 38-57). Oxford: Oxford University Press.
Jackendoff, R. (1999). Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences, 3(7), 272-279.
Johansson, S. (2006). Working backwards from modern language to proto-grammar. In A. Cangelosi, A. D. M. Smith & K. Smith (eds.) The Evolution of Language (pp. 160-167). New Jersey: World Scientific.
Koptjevskaja-Tamm, M. (1993). Nominalizations. London: Routledge.
Moreno, J. C. (1987). Towards a typology of subordination. FUNCIÓN, II.
Roberts, I. & A. Roussou (2003). Syntactic Change. A Minimalist Approach to Grammaticalization. Cambridge: Cambridge University Press.
Wartburg, W. von (1969). Problems and Methods in Linguistics. Oxford: Basil Blackwell.
EVOLUTION OF THE GLOBAL ORGANIZATION OF THE LEXICON

MIEKO OGURA
Linguistics Laboratory, Tsurumi University, Yokohama 230-8501, Japan
WILLIAM S-Y. WANG
Department of Electronic Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong, China

We demonstrate that polysemous links have a profound impact on the organization of the semantic graph, conforming it as a small-world network, based on the data from WordNet and A Thesaurus of Old English. We show that the words with higher frequency, and therefore with a higher number of meanings, construct the higher levels of the hypernymy tree, and that this architecture is robust through the times. We then set our argumentation in an evolutionary perspective. We also suggest that the small-world topology of the brain has enhanced the small-world semantic configuration.
1. Introduction
In this study we further Sigman and Cecchi (2002) and demonstrate that polysemous links have a profound impact on the organization of the semantic graph, conforming it as a small-world network, based on the data from WordNet (version 2.0, Cognitive Science Laboratory, Princeton University, 2003) and J. Roberts et al., A Thesaurus of Old English (TOE) (Centre for Late Antique and Medieval Studies, King's College London, 1995). We examine the effects of word frequency on the small-world semantic network. Furthermore, we set our argumentation in an evolutionary perspective, and suggest that the small-world topology of the brain has enhanced the small-world semantic configuration. Watts (1999) presents a lattice substrate and a tree substrate for the models of graphs of the small-world networks. Dictionaries make implicit use of the hypernym relationship in defining a word by its hypernym and its specific attributes. Thus we use the hypernymy tree as a base graph. The lexicon then defines a graph, where the nodes are the semantic categories or meanings composed of a set of vertices, i.e., synonyms, and semantic relationships are the
links. Graph theory provides a number of measurements that characterize the structure of a graph: the characteristic length, which is the median of the minimal distance between pairs of vertices; the distribution of links, i.e., first-neighbor connections; and the clusters, which define regions of very high internal connectivity.

The methods of computation are as follows. We assume that if the lexicon is composed of monosemous words alone, the length from word i to the synonymous words within the semantic category to which i belongs is 0. The distance from i to the synonymous words in different semantic categories is calculated by climbing up and down the hypernymy tree. In this way the semantic categories connected by the hypernymy tree form vertical networks. When polysemous words are included in the lexicon, the length from word i to the synonymous words within the semantic category A to which i belongs is 0, and that from i to the semantic category B to which the semantic category A is linked through a polysemous word j is 1. If word k in the semantic category B is a polysemous word which is connected to the semantic category C, the distance from i to the synonymous words in the semantic category C is 2. This process continues until there is no polysemous word in the semantic category. In this way semantic categories connected through polysemous words form horizontal networks. For a polysemous word, and for a monosemous word that joins horizontal networks through polysemous words, the distance via the horizontal networks and that via the vertical networks are compared and the shorter value is adopted. For other monosemous words the distance is calculated via vertical networks.

As for links, we assume that word i is linked to the synonymous words in the semantic category (categories) to which i belongs, and to the synonymous words in the hypernym immediately above the semantic category (categories) to which i belongs.
As for clusters, we assume that if semantic categories are connected through a polysemous word, synonymous words in the semantic categories to which a polysemous word belongs form the maximal possible number of connected neighbors. If the semantic category is not connected to the semantic categories, synonymous words in that semantic category form the cluster.
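The distance computation just described can be sketched as a breadth-first search over a toy graph of semantic categories. This is a minimal illustrative sketch, not the authors' code: the mini-lexicon, the category names, and the function names below are all hypothetical, and the sketch simplifies the procedure by giving every vertical (hypernymy) step and every horizontal (polysemy) step a cost of 1, so that the search automatically adopts the shorter of the two routes, as the method requires.

```python
# Hypothetical toy sketch of vertical (hypernymy) vs. horizontal (polysemy)
# distances between semantic categories; not the authors' actual procedure.
from collections import deque
from itertools import combinations
from statistics import median

# Hypothetical mini-lexicon: child category -> hypernym (parent) category
hypernym = {"B1": "A", "B2": "A", "C1": "B1", "C2": "B2"}
# Category pairs joined because they share a polysemous word
polysemy_links = [("C1", "B2")]

def neighbors(cat, use_polysemy):
    """Categories one step away: up/down the hypernymy tree, plus
    horizontal jumps through polysemous words when enabled."""
    nbrs = [p for c, p in hypernym.items() if c == cat]       # climb up
    nbrs += [c for c, p in hypernym.items() if p == cat]      # climb down
    if use_polysemy:
        for a, b in polysemy_links:
            if a == cat:
                nbrs.append(b)
            if b == cat:
                nbrs.append(a)
    return nbrs

def distance(src, dst, use_polysemy=True):
    """Breadth-first search; every vertical or horizontal step costs 1,
    so the shorter of the two routes is found automatically."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        cat, d = queue.popleft()
        if cat == dst:
            return d
        for n in neighbors(cat, use_polysemy):
            if n not in seen:
                seen.add(n)
                queue.append((n, d + 1))
    return None  # unreachable

def characteristic_length(use_polysemy=True):
    """Median of the minimal distances over all category pairs."""
    cats = sorted(set(hypernym) | set(hypernym.values()))
    return median(distance(a, b, use_polysemy)
                  for a, b in combinations(cats, 2))
```

Toggling `use_polysemy` reproduces the qualitative effect reported below: the horizontal link shortens the C1-C2 route from four tree steps to two, and the characteristic length of the toy graph drops accordingly.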
2. Small-World Networks of Nouns and Verbs in WordNet and TOE

Figure 1 shows the average minimal length as a function of the number of links for 6110 monosemous verbs (left) and 6110 monosemous + 5196 polysemous verbs (right) in WordNet. The figure for the monosemous + polysemous verbs shows power-law distributions of both average minimal lengths and the number
of links. The verbs with the greatest number of links form hubs (marked with arrows), which correspond to the most polysemous and frequent verbs: break, make, and get (from the left). 6468 out of 11306 verbs form horizontal networks. The average number of verbs whose distances are calculated via horizontal networks for the average minimal length of each verb is 4854. The inclusion of polysemy changes the average minimal length, the number of neighbors and the number of connected neighbors, and has a profound impact on the organization of the lexicon. On average, the inclusion of polysemy reduces the characteristic length from 8.97 to 6.98, and increases the number of neighbors and connected neighbors from 4.2 to 8.2, and from 8.3 to 27.53, respectively. Thus the inclusion of polysemy creates a clustered, short-range, i.e., small-world semantic network. Sigman and Cecchi (2002) analyze 66025 nouns in WordNet (version 1.6). We are now analyzing 114648 nouns in WordNet (version 2.0) by our methods of calculation.
Figure 1. Small-world networks of monosemous verbs (left) and monosemous + polysemous verbs (right) in WordNet.
We have analyzed 18265 (14792 monosemous + 3473 polysemous) nouns and 7161 (5019 monosemous + 2142 polysemous) verbs in TOE. 16765 out of 18265 nouns and 6825 out of 7161 verbs form horizontal networks. The average numbers of nouns and verbs whose distances are calculated via horizontal networks for the average minimal length of each noun and verb are 16477 and 6660 respectively. On average, the inclusion of polysemy reduces the characteristic length from 9.88 to 4.2, and increases the number of neighbors and connected neighbors from 43.31 to 52.94, and from 1301 to 1549.66 respectively in nouns; and it reduces the characteristic length from 6.77 to 2.99, and increases the number of neighbors and connected neighbors from 32.33 to 52.37, and from 316.59 to 1109.45 respectively in verbs. Figure 2 shows the average minimal length as a function of the number of links for 5019 monosemous verbs (left) and 5019 monosemous + 2142 polysemous verbs in
TOE. The verbs with the greatest number of links form hubs (marked with arrows), which correspond to the most polysemous and frequent verbs: (ge)healdan ‘hold’, (ge)niman ‘take’, begān ‘bego’ and āwendan ‘awend’ (from the left).
Figure 2. Small-world networks of monosemous verbs (left) and monosemous + polysemous verbs (right) in TOE.
Comparing the results for verbs in TOE with those in WordNet, we find that the changes in characteristic length and in the average number of neighbors and connected neighbors are greater in TOE than in WordNet. That is, the degree of small world is greater in TOE, though the percentage of polysemous verbs is lower. We assume that the interaction between synonymy and polysemy in the horizontal networks is crucial for the degree of the small world. The average number of synonyms in a given semantic category of TOE is 8.59 (max. 84), and that of WordNet is 1.82 (max. 24). The higher number of synonyms forms the larger horizontal networks.

3. The Effects of Word Frequency
Table 1 classifies 11306 verbs in WordNet according to the dates of origin, which are based on the Oxford English Dictionary, version 2.0 on CD-ROM (OED2). It shows the number of words (monosemous words in parentheses), number of meanings, word frequency and number of hypernyms. We find that the older the date of origin, the greater the number of meanings and the word frequency, and the smaller the number of hypernyms. Table 2 classifies 7161 verbs in TOE according to the dates of retention, which are based on OED2. It shows the number of words (monosemous words in parentheses), number of meanings, word frequency and number of hypernyms.
The frequency counts are based on the Old English Corpus on the World-Wide Web (University of Toronto, 1997). We find that the more frequent the words, the later the date of retention and the greater the number of meanings.

Table 1. The effects of word frequency on verbs in WordNet

date of origin   number of words   number of meanings   word frequency   number of hypernyms
OE               671(167)          4.85                 102.6            2.16
12th c.          115(36)           3.96                 48.1             2.1
13th c.          612(167)          3.64                 20.1             2.24
14th c.          1284(421)         2.95                 10.6             2.38
15th c.          763(278)          2.58                 7.6              2.51
16th c.          1822(830)         2.15                 5.1              2.65
17th c.          1495(837)         1.8                  1.99             2.57
18th c.          732(449)          1.64                 1.27             2.69
19th c.          1886(1269)        1.48                 0.87             2.57
20th c.          1144(938)         1.23                 0.35             2.72
Table 2. The effects of word frequency on verbs in TOE

date of retention   number of words   number of meanings   word frequency   number of hypernyms
OE                  4735(3768)        1.33                 5.61             1.66
12th c.             60(35)            2.2                  8.68             1.65
13th c.             247(126)          2.09                 14.43            1.44
14th c.             263(144)          2.13                 11.18            1.44
15th c.             157(67)           2.52                 27.31            1.33
16th c.             126(61)           2.24                 12.81            1.67
17th c.             98(57)            1.98                 16.58            1.72
18th c.             45(27)            1.64                 7.49             1.73
19th c.             661(377)          2.05                 23.36            1.72
20th c.             769(357)          2.55                 53.61            1.53
We may state that the words with higher frequency, and therefore with a higher number of meanings, construct the higher levels of the hypernymy tree, and that this architecture is robust through the times. Trees have definite roots and branches that distinguish some vertices as more central than others and some links as more significant, in that their deletion would result in larger subgraphs becoming
disconnected. The obsolescence of more specific words that entered in more recent times would only affect the peripheral semantic structure.
4. An Evolutionary Perspective

Hurford (2007) considers that the ability to form complex conceptual structures is crucial to the emergence of human language. He asserts that no such complex communication system could have evolved without reliable cooperativeness, and suggests Tomasello et al.’s (2005) concept of shared intentionality as a key ingredient of humans’ striking willingness to play complex language games with each other. The crucial last biological step towards the modern human language capacity was the development of a brain capable of acquiring a much more complex mapping between signals and conceptual representations, giving rise to the possibility of the signals and the conceptual representations themselves growing in complexity (Hurford 2003). Modern complex linguistic systems must have arisen from simpler origins. Computer modelers of emerging language assume that vast quantities of variable and random utterances could gradually converge on fixed forms with fixed meanings by a process of self-organization (Bickerton 2003). We assume that this process involves the shortening or elimination of the distance between meanings and, as a consequence, the distance between utterances, through linking, or cooperation, in the interactions between the speaker and the listener; vast quantities of variable and random utterances are converted into a small world of fixed forms with fixed meanings. Fixed forms result in high-frequency forms. Small-world networks are known to optimize information transfer, increase the rate of learning, and support both segregated and distributed information processing. We assume that human cooperation evolved to maximize efficiency in information transfer and cultural learning.
In keeping with ideas from grammaticalization theory about meaning, the earliest languages would have had, in their semantics: no metaphors; no polysemy; no abstract nouns; fewer subjective meanings; less lexical differentiation; fewer hyponyms and superordinate terms (Hurford 2003). We have shown how lexicons have evolved and been organized from Old English to Present-day English. TOE, our database for OE, uses as its main source material the word senses from the OED and standard Anglo-Saxon dictionaries. These dictionaries are based on some 2000 surviving Old English texts. In spite of the limitations of the Old English data, we may assume that the
numbers of words and meanings have increased from Old English to Present-day English. The lexicon itself has grown in complexity. The lexicon has grown from the higher levels of the hypernymy tree to the lower. The words with higher frequency, and therefore with a higher number of meanings, construct the higher levels of the hypernymy tree, and this architecture is robust through the times. Furthermore, the inclusion of polysemy produces a drastic global reorganization of the hypernymy tree, i.e., it is converted into a small world, where all meanings are closer to each other. Bassett & Bullmore (2006) state that brain network architecture has likely evolved to maximize the complexity or adaptivity of function it can support while minimizing costs. Several aspects of brain structure are compatible with a selection pressure to minimize wiring costs. However, it is evident that the complete minimization of wiring would allow only local connections, leading to delayed information transfer and metabolic energy depletion. To counteract this effect, the brain also minimizes energy costs by adding several long-distance connections, creating a small-world network. We may assume that the small-world topology of the brain would have led to a small-world semantic configuration. Ogura (1996) shows that metaphoric transfer of a lexeme from one sensory modality to another, one of the most common types of metaphoric transfer in languages, begins in frequent words first among synonymous words, and forms a polysemous word. Synonyms may be realized cortically by functional webs largely overlapping in their semantic, mainly extra-perisylvian, part. The best activated word web, whose internal connection strength is likely influenced by word frequency, would ignite first (Pulvermüller 2002). Nerve cells of the cerebral cortex are arranged in clusters, each cluster corresponding to a column of the cerebral cortex.
An excited cluster projects onto other columns in the cerebral cortex by association fibers, so that there is sequential activation from cluster to cluster (Eccles 1977). We may assume that the best activated word web may converge with a cluster of cells from another modality, forming a small-world neural network. Brain function depends on adaptive self-organization of large-scale neural assemblies, but little is known about the quantitative network parameters governing these processes in humans. Our quantitative analysis of structural connection patterns in semantic networks provides insights into the functioning of neural architectures in the human brain.
5. Conclusion
We have demonstrated that polysemous links have a profound impact on the organization of the semantic graph, conforming it as a small-world network, based on the data from WordNet and TOE. The higher-frequency words construct the higher levels of the hypernymy tree. This architecture is robust through the times, forming the basis of the small-world network. We have then set our argumentation in an evolutionary perspective. We have also suggested that brain network architectures have evolved to maximize complexity while minimizing costs, and have enhanced the evolution of the configuration of semantic structure.

Acknowledgements
This work is supported by grants from the Human Frontier Science Program and the Ministry of Education, Culture, Sports, Science and Technology of Japan.

References
Bassett, D. S. and Bullmore, E. (2006). Small-World Brain Networks. The Neuroscientist, 12, 512-523.
Bickerton, D. (2003). Symbol and Structure: A Comprehensive Framework for Language Evolution. In M. H. Christiansen and S. Kirby (Eds.), Language Evolution (pp. 77-93). Oxford: Oxford University Press.
Eccles, J. C. (1977). The Understanding of the Brain (second edition). New York: McGraw-Hill.
Hurford, J. R. (2003). The Language Mosaic and its Evolution. In M. H. Christiansen and S. Kirby (Eds.), Language Evolution (pp. 38-57). Oxford: Oxford University Press.
Hurford, J. R. (2007). The Origins of Meaning. Oxford: Oxford University Press.
Ogura, M. (1996). Lexical Diffusion in Semantic Change: With Special Reference to Universal Changes. Folia Linguistica Historica, XVI, 29-73.
Pulvermüller, F. (2002). The Neuroscience of Language. Cambridge: Cambridge University Press.
Sigman, M. and Cecchi, G. A. (2002). Global Organization of the Wordnet Lexicon. PNAS, 99, 1742-1747.
Tomasello, M. et al. (2005). Understanding and Sharing Intentions: The Origins of Cultural Cognition. Behavioral and Brain Sciences, 28, 675-735.
Watts, D. J. (1999). Small Worlds. Princeton/Oxford: Princeton University Press.
FROM MOUTH TO EYE

DENNIS PHILPS
Department of English (IRPALLKAS), University of Toulouse-Le Mirail, 5, allées Antonio-Machado, 31058 Toulouse cedex 9, France

Within a semiogenetic theory of the language sign (SGT), I claim that human speech emerged and evolved as a consequence of the implementation of an unconscious, somatotopically mapped, self-referential body-naming strategy. This strategy would notably have involved recruiting the cyclical, open-close mandibular movements of non-linguistic, goal-orientated oral activities such as biting and chewing, and of pre-linguistic oral activities such as primitive calling and shouting, for purposes of articulated communication. Specifically, the occlusive sounds produced by these open-close movements may have come to function metonymically (the sound for the movement) as initial ‘building blocks’ on which to construct, by syllabification and consonant accretion, fully linguistic signs ‘naming’ the speech organs concerned, their movements and positions relative to one another, and their immediate physiological environment. The resulting signs, vectoring submorphemic iconicity, may then have been extended by conceptual transfer to denote other parts of the body exhibiting cyclical, goal-orientated, open-close movements, including the hands, characterized by their prehensile function, and the eyes, characterized by the opening and closing of their lids.
1. Introduction

As pointed out by Oudeyer (2006: 6) and others, one of the most problematic issues in science is the way in which human beings came to talk, an issue which concerns the language faculty and language evolution in general. With respect to those scholars working on the origins of the human language faculty, Oudeyer states that in this field, “Linguists, even though they may continue to provide crucial data on the history of languages, are no longer the main actors.” (2006: 8). As a linguist, I would suggest that one of the reasons for this is the very nature of Saussurian linguistics, based as it is on the principle of the arbitrary (i.e. externally unmotivated) nature of the sign. By this account, a sign acquires its value not as a result of language-external conditions, but solely on account of the differential, language-internal relations it maintains simultaneously with other, like signs in the system, whether these relations be (op)positional, distributional or functional. In an attempt to address this issue, I have conducted research within a field located at the crossroads of cognitive linguistics and the origins of the human language faculty, namely that of semiogenetics, which addresses the possible conditions of emergence and evolution of the language sign. The latter expression refers to the sign conceptualized as having become arbitrary over time, rather than as being arbitrary by definition. The underlying assumption,
that the sign was originally ‘natural’ to some extent, is based on converging evidence not only from structural linguistics, cognitive linguistics, and neurophysiology, but also from cultural, historical and philosophical sources. A general claim such as this does, of course, need to be backed up by wide cross-linguistic research, and other scholars’ work evokes this requirement (e.g. Paget 1930, Jóhannesson 1949, Allott 2001). However, in this paper, most of the language data I adduce involve certain Proto-Indo-European (PIE) root patterns. The main reason for this limitation is that my claim has necessitated adapting cognitive tools of analysis such as conceptual transfer (Heine 1997), self-referentiality (Searle 1983), body-naming strategies (Matisoff 1978) and topological invariance (Lakoff 1990) to the submorphemic structure of these roots (see Hinton, Nichols, & Ohala 1994: 5) and their derivatives in various branches of Indo-European, particularly Germanic. Although these tools are employed by other scholars in their work (Svorou 1994, Enfield, Majid, & van Staden 2006, etc.), they are not often applied to submorphemic data (but cf. Mannheim 1991: 187-188), or to proto-languages, which in some cases have not even been reconstructed. In short, the body-naming strategy I postulate below is, with a few morphologically marked or unmarked exceptions, e.g. Fr. doigt (de la main, de pied), Middle Eng. knop ‘rounded protuberance formed by the front of the knee or the elbow-joint’, and Mod. Eng. nail (finger-, toe-), no longer visible linguistically in modern Indo-European languages other than at the submorphemic level. An example of the latter is provided by Eng. kn-, phonologically /n/, in knee, knop (obs.), and knuckle (n.), all of which denote protrusive, bony, angular or rounded parts of the body associated with articulated movements, and having locative potential.
2. Towards a semiogenetic theory of the language sign

Within a semiogenetic theory of the conditions of emergence and evolution of the sign (SGT) sketched out in Philps 2006, I have claimed that if open-close articulatory gestures are neurophysiologically coordinated with goal-orientated, open-close hand gestures, as Gentilucci et al. (2001) have demonstrated experimentally, then the occlusive sounds produced by such articulatory gestures could themselves have become goal-orientated, by which I mean referential. These gestures would first have acquired the capacity, at an early stage in the emergence of speech, to refer back to the parts of the human vocal apparatus involved in their production metonymically (the sound for the movement), a capacity I call ‘autophonic’. The occlusive sounds produced by these open-close mandibular movements may then have served, if coordinated with open-close hand movements, to refer to the hands and to manual gestures deictically, and to stand for them iconically (Peirce 1991: 251-252) through the
implementation by the mind of an unconscious, somatotopically mapped, body-naming strategy. The transformation of occlusive sounds from pre-linguistic to linguistic status would have been effected morphogenetically, I propose, by syllabification and subsequent consonant accretion, i.e. C- > CV- > CVC- (see Southern 1999: 152 and Oudeyer 2006: 28). In some language families, this process would ultimately have resulted in homonymic roots such as PIE *gher- ‘to cry out’, whose derivatives in various Indo-European languages denote mouth-related activities, and *gher- ‘to grasp, scrape, scratch’, whose derivatives denote hand-related activities (*gh- > *ghe- > *gher-). I have also suggested that this self-imitative, articulatory simulation strategy may also have served to ‘name’ symmetrical, perceptually salient parts of the body other than the hands, notably the knees, by means of a neurocognitively grounded process variously known as ‘conceptual projection’ (Fauconnier & Turner 2002: 305), ‘conceptual mapping’ (Lakoff & Johnson 2003: 256), or ‘conceptual transfer’ (Heine 1997: 7). Heine, for instance, states that on the basis of the conceptual transfer patterns they attest in many languages, basic body-parts “may serve as structural templates to denote other body-parts” (1997: 134), while Fónagy suggests that “Speech organs may represent other organs of the human body.” (1983: 18, my translation). Here, I claim that this body-naming strategy may have served to ‘name’ another symmetrical, perceptually salient part of the body exhibiting goal-orientated, open-close movements, namely the eyes, characterized by the opening and closing of their lids, otherwise known as ‘blinking’, a process which would have given rise to words denoting eye-related phenomena and their expressive connotations.

3. Mouth-eye coordination
Before presenting the PIE language data adduced to substantiate the above body-naming hypothesis, I shall examine whether evolutionists and other specialists of the human language faculty provide any clues as to its plausibility. Let us begin with Darwin, who states, in his The Expression of the Emotions ... (1998), that mouth-eye coordination characterizes certain facial gestures. Speaking of the movement of the eyebrows when a state of attention changes into one of surprise, he asserts that the eyebrows, after first being slightly raised, are then raised to a much greater extent, with the eyes and mouth widely open. He continues: “The degree to which the eyes and mouth are opened corresponds with the degree of surprise felt; but these movements must be coordinated; for a widely open mouth with eyebrows only slightly raised, results in a meaningless grimace.” (1998: 278, my underlining). Studdert-Kennedy & Goldstein, who also remark on the close link between facial expression and vocal tract configuration noted by Darwin, build on recent work on mirror neuron systems
(see Rizzolatti & Craighero 2004), which leads them to hypothesize that “vocal imitation evolved by coopting and extending the facial mirror system with its characteristic somatotopic organization.” (2003: 247). As reported by Gibbs (2006: 222), Piaget (1952) evokes the actions of infants learning to imitate acts that they cannot see themselves perform, notably blinking. Before accomplishing the correct action, they may open and close their mouths and hands, or cover and uncover their eyes with a pillow. Elsewhere, Piaget notes that “on seeing someone else’s eyes close and open again, he [the baby] will open and close his mouth, thus wrongly assimilating the visual schema of the model’s eyes to the tactilo-kinaesthetic schema of his own mouth.” (1951: 201). Also, in MacNeilage’s “Frame/Content Theory”, the claim is made that the lipsmack could be a precursor to speech, one of the reasons being that the lipsmack is “an accompaniment of one-on-one social interactions involving eye contact, and sometimes what appears to be turn-taking. This is the most likely context for the origin of true language.” (1998: 504). MacNeilage sees the open-close mandibular cycle as the main articulatory building block of speech production, and the evolution of the mouth open-close alternation for speech as “the tinkering of an already available motor cyclicity into use as a general purpose carrier wave for time-extended message production.” (1998: 506).

4. Submorphemic evidence for mouth-to-eye transfer from PIE
While ascribing to the view that an etymological approach cannot enable us to trace linguistic descent further back than Proto-Indo-European, I renew my claim that reanalysing root-initial occlusives (an older term for ‘plosives’) which function as core invariants in PIE as articulatory gestures of occlusion (Browman & Goldstein 1992) can allow us to trace the static, manner feature [occlusive] in its dynamic, gestural guise as [occlusion] as far back as theories of speech evolution will permit. It will be noted that a core invariant is the smallest structural unit within a given subset of words to which a common notion may be attributed on the basis of submorphemic invariance. In Philps 2006, I backed up my hypothesis with evidence furnished by a submorphemic analysis of the relation between certain identical PIE root forms, e.g. *gher- ‘to call out’/*gher- (oldest form *gher-) ‘to grasp, scrape, scratch’ (Rix et al. 2001: 177). I further showed that there is evidence, in the form of root-final *-l-/*-r- alternation that does not correlate with a change in basic meaning (e.g. *ghel- ‘to call’/*gher- ‘to call out’), even though consonant alternation in root-final position normally entails semantic differentiation (Benveniste 1964: 35), that the consonant occupying the C1 slot in the canonical PIE root structure C1eC2- (here, the voiced, aspirated tectal occlusive *gh-) functions as a core invariant and C2,
consequently, as a variable. Recall that the medial vowel in the PIE root plays a role that is essentially morphological (aspectotemporality). Furthermore, if one examines homonymic PIE roots such as *gher- ‘to call out’/*gher- ‘to grasp, scrape, scratch’ from a notional point of view, there emerges a fairly consistent, though statistically limited, pattern of reference to the conceptual domain of ‘orality’ on the one hand (here, ‘to call out’), and that of ‘manuality’ on the other (here, ‘to grasp, scrape, scratch’). While it may be argued that this root homonymy is due to originally arbitrary connections, the systematic patterns of conceptual transfer between certain bodily domains attested to by the PIE roots discussed here, and others (e.g. *genu- ‘jawbone, chin’/*genu- ‘knee’, Philps 2007), can scarcely be denied. Hence the possibility that this homonymy is externally motivated cannot safely be ruled out a priori. In the same way as *gh- in *ghel-/*gher- ‘to call (out)’, root-initial *bh- and *gh- appear to function as core invariants in *bhel- ‘to shine’ and *ghel- ‘to shine’, the presumed source (*bhel-) or possible source (*ghel-) of verbs in the Germanic languages denoting eye-related phenomena, e.g. Eng. blink, blind, Germ. blinzeln, blenden, Dutch blinken, blikken (< *bhel-), and Eng. glare, glance, glimpse (< *ghel-), some of which may however be late, analogically-driven innovations. Again, this analysis is based on the postulate that in the ‘doublets’ *bhel- ‘to shine’/*bherh₁g- ‘to shine’ (cf. *bher- ‘brown’) and *ghel- ‘to shine’/*gher- ‘to shine’ (the latter in Baltic, Slavic and Germanic only), *-l- and *-r- function as variables. If this is so, then the above body-naming hypothesis may legitimately be extended to mouth-to-eye projection, since the eyes are also characterized by the cyclical, open-close movement of their lids.
In other words, if open-close movements and positions of the jaws and mouth are neurophysiologically coordinated with voluntary or involuntary open-close movements and positions of the eyes and eyelids for expressive purposes, as Darwin suggests, then the occlusive sounds produced by these open-close mouth movements could have served, at an early stage in the evolution of speech, as an articulatory ‘building block’ around which to construct iconically motivated signs denoting eye-related phenomena. The transformation of these sounds into linguistic units would also have been effected, conceivably, by syllabification and consonant accretion, ultimately producing roots such as PIE *bhel- (*bh- > *bhe- > *bhel-) and *ghel- (*gh- > *ghe- > *ghel-) with the potential to denote eye-related phenomena such as blinking, glaring, shining, and colour. It may also be noted in this respect that Eng. cry (v.), though not traceable beyond its probable onomatopoeic origins in Classical Latin (quiritare), has denoted both a mouth-related action (‘to shout’) and an eye-related action (‘to weep’) since the 16th century. In language families other than PIE, Andersen, for instance, has identified a pattern of conceptual transfer between the eyes and the face in Mayan, by
analysing polysemous lexical items, e.g. lê (Sango) ‘eye, face’, as well as items that have both these meanings in Tarascan, Huastec, and over thirty other Mayan languages. In the Semitic languages, Bohas, one of the few scholars to tackle submorphemic phenomena in this light, has noted similar patterns in Arabic involving the eyes, the chin, and the notional invariant ‘(concave) curvature’, at matrix level (2000: 111).

5. Conclusions
By reanalysing the root-initial occlusive occupying the C1 (core invariant) slot in, e.g., PIE *ghel-/*gher- ‘to call (out)’ as a gesture of occlusion, and hypothesizing that the sound thus produced originally served to refer autophonically to the goal-orientated, open-close mandibular movements involved in pre-linguistic oral activities such as primitive calling, we may project this gesture back to a possible scenario of the emergence of human speech. According to this scenario, which partly echoes that of MacNeilage (1998), speech, seen as a motor function, would have emerged when the cyclical, open-close mandibular alternations characteristic of the non-linguistic and pre-linguistic oral activities detailed above underwent a series of sequenced adaptations resulting in their being employed in content-modulated syllabic frames for purposes of visuofacial and phonatory communication. My own semiogenetic theory of the sign does however go a step further than MacNeilage’s, in that it proposes a possible route by which what is effectively oral self-referentiality may have expanded into bodily self-referentiality (for nasal self-referentiality, see Philps 2006: 248). As described above, the route in question involves the putative open-close, occlusive sounds produced self-referentially by the movements of the vocal organs being recruited, albeit unconsciously, to simulate the homologous, open-close movements of other symmetrical parts of the body, particularly the hands and, as I have claimed here, the eyes, with which the movements and positions of the jaws and mouth are apparently coordinated. This embodied process of self-simulation would then be empathetic in nature (Gibbs 2006: 35-36), and would imply the existence of dedicated “as-if” mechanisms, as well as a neurally based mirror neuron system allowing what Gallese et al. (2007) refer to as (mutual) “intentional attunement”.
This anthropocentric body-naming strategy would have had the evolutionary advantage of providing continuity and stability of reference in space-time for all the members of a given speech community, since it is based not only on knowledge of self and others, but also on a body-schema which is presumably common to each member of the community in question. As Reed points out, “The body schema includes the invariant properties of the human body. For example, it stores information about the spatial relations among body
parts, the degrees of freedom for movements at joints, and knowledge of body function. ... Since it contains information relevant to all bodies, the body schema is used to represent others as well as the self.” (2002: 233). Finally, my claim that the open-close movements of the eyelids, and eye-related phenomena in general, may originally have been simulated iconically by the occlusive sounds produced by coordinated, open-close movements of the jaws and mouth, implies that these sounds could have been employed almost interchangeably (PIE *bh- in *bhel- involves aspirated bilabial occlusion, and *gh- in *ghel- aspirated tectal occlusion) during the initial learning curve, until discretization and conventionalization became fully operational. As reported by Studdert-Kennedy & Goldstein (2003: 251), Ferguson & Farwell (1975) suggest that during the initial stages of language learning, infants’ vocal gestures are poorly controlled and uncoordinated, resulting in the almost interchangeable appearance of certain realizations, including stops and glides. It will be recalled in this respect that MacNeilage’s “Frame/Content Theory” is itself based on empirical studies of early consonant articulation and syllable-formation in infants’ babbling. Although the process of language acquisition by modern-day infants is surely very different from that experienced by the earliest humans, both necessarily involve a learning curve, as speakers move on from a wobbly command of emerging, goal-orientated vocal gestures to the mastery of a stable, discretized, self-organizing system of speech in which possible traces of original submorphemic iconicity would gradually have become arbitrary.

References

Allott, R. (2001). The natural origin of language. The structural inter-relation of language, visual perception and action. Knebworth: Able Publishing.
Andersen, E. S. (1978). Lexical universals of body-part terminology. In J. H. Greenberg (Ed.), Universals of human language, vol.
3: Word structure (pp. 335-368). Stanford: Stanford University Press.
Benveniste, E. (1964 [1939]). Répartition des consonnes et phonologie du mot. In Études phonologiques dédiées à la mémoire de M. le Prince N. S. Trubetzkoy (pp. 27-35). Alabama: University of Alabama Press.
Bohas, G. (2000). Matrices et étymons. Développements de la théorie. Lausanne: Éditions du Zèbre.
Browman, C. P., & Goldstein, L. (1992). Articulatory phonology: an overview. Phonetica, 49, 155-180.
Darwin, Ch. (1998 [1872]). The expression of the emotions in man and animals. Introduction, afterword and commentaries by P. Ekman. London: HarperCollins.
Enfield, N. J., Majid, A., & van Staden, M. (2006). Cross-linguistic categorisation of the body: introduction. Language Sciences, 28, 137-147.
Fauconnier, G., & Turner, M. (2002). The way we think. Conceptual blending and the mind’s hidden complexities. New York: Basic Books.
Ferguson, C. A., & Farwell, C. B. (1975). Words and sounds in early language acquisition. Language, 51, 419-439.
Fónagy, I. (1983). La vive voix. Paris: Payot.
Gallese, V., Eagle, M. E., & Migone, P. (2007). Intentional attunement: mirror neurons and the neural underpinnings of interpersonal relations. Journal of the American Psychoanalytic Association, 55, 131-176.
Gentilucci, M., Benuzzi, F., Gangitano, M., & Grimaldi, S. (2001). Grasp with hand and mouth: a kinematic study on healthy subjects. Journal of Neurophysiology, 86, 1685-1699.
Gibbs, R. W., Jr. (2006). Embodiment and cognitive science. Cambridge: Cambridge University Press.
Heine, B. (1997). Cognitive foundations of grammar. New York: Oxford University Press.
Hinton, L., Nichols, J., & Ohala, J. J. (1994). Sound symbolism. Cambridge: Cambridge University Press.
Johanesson, A. (1949). Origin of language: four essays. Reykjavík: H. F. Leiftur.
Lakoff, G. (1990). The invariance hypothesis: is abstract reasoning based on image-schemas? Cognitive Linguistics, 1(1), 39-74.
Lakoff, G., & Johnson, M. (2003 [1980]). Metaphors we live by. Chicago & London: University of Chicago Press.
MacNeilage, P. F. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 499-546.
Mannheim, B. (1991). The language of the Inka since the European invasion. Austin: University of Texas Press.
Matisoff, J. A. (1978). Variational semantics in Tibeto-Burman: the “organic” approach to linguistic comparison. Philadelphia: Institute for the Study of Human Issues.
Oudeyer, P.-Y. (2006). Self-organization in the evolution of speech. Oxford: Oxford University Press.
Paget, R. (1930). Human speech. London: Kegan Paul, Trench, Trubner & Co.
Peirce, Ch. (1991 [1906]). Prolegomena to an apology for pragmaticism. In J. Hoopes (Ed.), Peirce on signs (pp. 249-252). Chapel Hill: The University of North Carolina Press.
Philps, D. (2006). From mouth to hand. In A. Cangelosi, A. Smith & K. Smith (Eds.), The evolution of language (pp. 247-254). Singapore: World Scientific Publishing.
Philps, D. (2007).
Conceptual transfer and the emergence of the sign: a semiogenetic approach to PIE *genu- ‘jawbone, chin’ and *genu- ‘knee’. CogniTextes, 1.2. http://aflico.asso.univ-lille3.fr/cognitextes/journal.htm.
Piaget, J. (1951). Play, dreams and imitation in childhood. Trans. by C. Gattegno & F. M. Hodgson. New York: W. W. Norton.
Piaget, J. (1952). The origins of intelligence in children. Trans. by M. Cook. Madison: International Universities Press.
Reed, C. L. (2002). What is the body schema? In A. N. Meltzoff & W. Prinz (Eds.), The imitative mind. Development, evolution, and brain bases (pp. 233-243). Cambridge: Cambridge University Press.
Rix, H. (Ed.), Kümmel, M., Zehnder, Th., Lipp, R., & Schirmer, B. (2001). Lexikon der indogermanischen Verben. Wiesbaden: Reichert.
Rizzolatti, G., & Craighero, L. (2004). The mirror-neuron system. Annual Review of Neuroscience, 27, 169-192.
Searle, J. R. (1983). Intentionality. An essay in the philosophy of mind. Cambridge: Cambridge University Press.
Southern, M. R. V. (1999). Sub-grammatical survival: Indo-European s-mobile and its regeneration in Germanic. Washington: Journal of Indo-European Studies 34.
Studdert-Kennedy, M., & Goldstein, L. (2003). Launching language: the gestural origin of discrete infinity. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 235-254). Oxford: Oxford University Press.
Svorou, S. (1994). The grammar of space. Amsterdam: John Benjamins.
WHAT USE IS HALF A CLAUSE?

LJILJANA PROGOVAC
English Department, Wayne State University, 5057 Woodward, Detroit, MI 48230, USA

The erroneous notion ... has been that the intermediate stages in the evolution of structures must be useless - the old saw of ‘What use is half a leg or half an eye?’ (Carroll, 2005, 170-171).
1. How to Halve Syntax
Focusing on the evolution of syntactic structures, the first goal of this paper is to show that ‘half-clauses’ do exist, and that they are indeed used and useful, even in present-day languages. Moreover, there is reason to believe that comparable (proto-syntactic) creations constituted an evolutionary stepping stone into more complex syntax (see e.g. Pinker & Bloom (1990) and Jackendoff (1999, 2002) for the proposal that syntax evolved gradually).a Intriguingly, in modern-day languages, such half-clauses actually serve as the foundation upon which finite clauses/sentences are built, leading to quirks and complexities that best befit a gradual evolutionary scenario. The following (half-)clauses will be discussed, in comparison to their full counterparts: ‘incredulity’ clauses (e.g. Sheila sad?!) (Section 2), perfective clauses (e.g. Problem solved.) (Section 3), and Serbian perfective unaccusative clauses (Section 4).b Relying on the theoretical framework of Minimalism (e.g. Chomsky, 1995), I will show that the full counterparts of each of these clauses involve at least one additional layer of syntactic structure, and are thus at least double in syntactic size.c Moreover, even though half-clauses and their full counterparts partly overlap in their function, they also exhibit a degree of specialization (with respect to e.g. mood, tense, aspect and agreement). As put in Carroll (2005, 170-171), “multifunctionality and redundancy create the opportunity for the evolution of specialization through the division of labor...”

a My goal is also to demonstrate that a case for gradual evolution of syntax can be made even using the tools of Minimalism, the mainstream syntactic theory, although Chomsky himself does not subscribe to the gradualist view (see e.g. Chomsky 2005 and references there). The challenges for subjecting syntax to evolutionary scrutiny are greater than with other domains, as originally pointed out by Bickerton (e.g. 1990, 1998) (see also Newmeyer, 2003, and references there), leading to the wide-spread view, summarized in Berwick (1998, 338-339), that “there is no possibility of an ‘intermediate’ syntax between a non-combinatorial one and full natural language - one either has Merge in all its generative glory, or one has no combinatorial syntax at all ...”

b In a similar sense, certain marginal ‘exocentric’ (i.e. headless) compounds, available cross-linguistically (e.g. dare-devil, kill-joy, scare-crow, turn-coat), can be seen as ‘half-compounds,’ in comparison to the more articulated endocentric counterparts (e.g. joy-killer, head-turner, mind-reader) (see Progovac 2007b and other papers quoted there). The former also precede the latter in child language acquisition.
2. Incredulity Half Clause: Specialization for Mood

Modern syntactic theory (including e.g. Chomsky, 1995, 2001) analyzes every clause/sentence as initially a small clause (SC), call it half-clause (examples (a) below), which gets transformed into a full/finite clause, considered to be a T(ense) P(hrase), only upon subsequent Merger of tense (examples in (b)), and subsequent Move of the subject to TP (examples in (c)) (e.g. Stowell, 1981, 1983, Burzio, 1981, Kitagawa, 1986, Koopman & Sportiche, 1991, Hale & Keyser, 2002). In other words, the layer of TP is superimposed upon the layer of small clause:d

(1) a. Small Clause: [SC/AP Sheila sad]
    b. [TP is [AP Sheila [A' sad]]] →
    c. Sentence: [TP Sheila [T' is [AP t [A' sad]]]]
(2) a. Small Clause: [SC/VP Peter retire]
    b. [TP will [VP Peter [V' retire]]] →
    c. Sentence: [TP Peter [T' will [VP t [V' retire]]]]

(“t” stands for the trace of the moved subject.)
c With some modifications, as one reviewer suggests, the insights of this paper may also be expressed in Chomsky’s later work on phases (e.g. Chomsky 2001). However, I present this paper without introducing formalisms of particular versions of the Minimalist framework, not only because of the lack of space to introduce such formalisms to interdisciplinary readers, but also because they change from year to year, and vary from researcher to researcher. Another reviewer in fact complains that there are already too many theory-internal assumptions in the paper. Instead, I base my paper on the discoveries and claims which are reasonably uncontroversial in this framework, which have withstood the test of time and empirical scrutiny, and which both predate Minimalism and survive into its later versions. These claims include the layering of sentential structure and the derivation of the sentence (TP) from the underlying small clause (half-clause), as discussed in the following section.

d In this paper, I abstract away from the assumption in Minimalism that there are two verb phrases in a clause, a vP and a VP, and for ease of exposition just represent the whole vP/VP shell as VP. However, I believe that the vP shell can also be seen as an evolutionary innovation, which was at some point superimposed over the layer of VP, introducing agency and transitivity. This paper only discusses intransitive clauses.
The (a) examples involve only one clausal projection, which is sometimes referred to uniformly as SC (Small Clause), while other times it is considered to be the projection of the predicate, thus an AP (Adjective Phrase) in (1), and a VP (Verb Phrase) in (2). The full finite clauses in (c) have at least two layers of clausal structure: the inner SC layer, and the outer TP layer. Full clauses even have two subject positions: one in which the subject is first Merged as the subject of the small clause (‘t’ in the (c) examples), and the other in which the subject actually surfaces, after Move. In fact, in certain sentences, both subject positions can be filled (see e.g. Koopman & Sportiche, 1991):

(3) [TP The jurors [T' will [VP all [V' rise]]]].

In this sense, then, a SC is indeed half a clause in comparison to the corresponding finite clause. But, what use is half a clause like that? In fact, each of these half-clauses can be shown to have some utility even in modern-day languages, as illustrated below (see Progovac, 2006, and references there).e

(4) Sheila sad?! Peter retire?! Him worry?! All rise! Everybody out!

While full tensed counterparts (TPs) specialize for indicative mood and assertion, half-clauses in (4) seem restricted to (elsewhere, non-indicative) ‘irrealis’ functions, ranging over expressions of incredulity, commands, wishes. In the evolutionary perspective, if there was a stage of proto-syntax characterized by such small clauses, then in that stage such clauses may have been able to express assertions as well, there not yet having arisen the opportunity for the division of labor.f The emergence of Tense/TP would have created such an opportunity for specialization between half-clauses and full finite clauses. A similar scenario has been reported for the grammaticalization of tense and indicative mood in more recent times, in pre-Indo-European (pre-IE). According to e.g.
Kiparsky (1968), there was a form unmarked for tense and mood, the injunctive, which, upon the grammaticalization of tense, began to specialize for non-indicative/irrealis moods.g Arguably, child language acquisition proceeds in the comparable fashion (e.g. Radford, 1988, Lebeaux, 1989, Ouhalla, 1991, Platzack, 1990), providing, at the very least, corroborating evidence for the syntactic simplicity/primacy of half-clauses (small clauses), relative to finite TPs.h

3. Perfective Half Clauses in English: Specialization for Time/Aspect

English also makes use of marginal perfective clauses such as (5), which can also be characterized as half-clauses with respect to their full counterparts (6).

(5) Problem solved. Case closed.
(6) The problem is solved. The case is closed.

Again, as established in the previous section, modern syntactic theory derives the full counterparts from the small clause layer, by adding a TP layer, and by moving the subject into it:i

(7) a. Small Clause: [SC/VP Problem solved]
    b. [TP is [VP the problem [V' solved]]] →
    c. Sentence: [TP The problem [T' is [VP t [V' solved]]]]

While the determiner the (instantiating the DP layer) is obligatory in the TP domain, it is not in the SC domain, suggesting that the subject of the half-clause does not check/assign structural nominative case (see Progovac, 2006). This surprising property is more readily observable with pronoun subjects in half-clauses, which surface in the (default) accusative form, rather than nominative form (e.g. Him retire?!, Me first!). It is as if half-clauses do not have enough functional power to give their subjects a structural case. In Minimalism, structural nominative case is typically associated with the projection of TP. In contrast to the incredulity clauses of the previous section, the perfective half-clauses in (5) can and do express statements/assertions - their anchoring in time and reality is most probably facilitated by the perfective (completed) aspect of the participle form. Even though of a different nature, specialization with respect to full clauses/TPs is evident here as well: while their full counterparts can range over different times, half-clauses specialize only for reporting on events which have just manifested themselves, in the here-and-now, disallowing modification by adverbs denoting remote past:

(8) ??Problem solved three years ago. ??Case closed three years ago.

Serbian unaccusative perfective clauses share this property with English perfective clauses, as will be shown in the following section.

e The syntactic analysis of this kind of ‘nonsentential’ speech is based on Barton (1990), Barton & Progovac (2005), and Progovac (2006) (see also Tang 2005 for some discussion). Fortin (2007), who embeds her analysis in the phase framework of Minimalism (e.g. Chomsky 2001), also argues for the nonsentential analysis of certain syntactic phrases, such as adverbials, vocatives, and bare unergative verbs, but she specifically argues against such an analysis of any propositional constructs, such as small clauses in (4), which are the sole focus of this paper.

f Progovac (2007a, b) argues that this small clause grammar represents a ‘living fossil’ of an early stage of grammar (according to Ridley (1993, 525), living fossils are species that have changed little from their fossil ancestors in the distant past, such as e.g. lungfish). The notion of language ‘fossils’ was introduced in Bickerton (1990, 1998), and adopted for syntax in Jackendoff (1999, 2002).

g In this injunctive stage of pre-IE, according to Kiparsky (1968), it was possible to express time by temporal adverbials, which, unlike grammaticalized tense, were neither obligatory nor associated with a specific functional position, and which can best be described as adjuncts. In fact, in Greek and Sanskrit, verbs are commonly put into (what looks like) present tense when modified by adverbs denoting past time (Kiparsky, p. 47), and this is considered to be a vestige of the Proto-IE injunctive. To respond to a reviewer’s question, it is probable that the use of temporal adverbs preceded the grammaticalization of tense in the evolution of syntax. In this view, the availability of relevant words (in this case temporal adverbs) does not imply the existence of a corresponding functional projection (in this case TP), but it can potentially lead to its grammaticalization.

h For the opposing views on L1 acquisition, see e.g. Guasti (2002) and references therein. For some old and some recent views on the relationship between ontogeny/DEVO (development in children) and phylogeny/EVO (development in species), the reader is referred to e.g. Ridley (1993), Rolfe (1996), Fitch (1997), Carroll (2005), Locke & Bogin (2006).

i Here and elsewhere in this paper, I abstract away from the possibility that there may be intermediate functional projections involved in the derivation of these clauses, such as perhaps Asp(ect)P or Agr(eement)P, or that the theme subjects in (5) could be Moved from complement positions (see Progovac 2006 for some discussion). Even if these projections and derivations turn out to be necessary, they would not take away from the basic argument here that half-clauses lack at least one layer of functional structure found in full finite clauses.

4. Unaccusative Half Clauses in Serbian: Time, Aspect, Agreement, Word Order
Consider the following examples of full/finite (perfective) unaccusative clauses in Serbian:

(9) Pošta je stigla. / Vlada je pala.
    mail.3SG AUX.3SG arrived.F.SG / government.3SG AUX.3SG fallen.F.SG
    'The mail has arrived.' / 'The government has fallen.'

Unaccusative verbs (e.g. arrive, fall, come, appear) are analyzed cross-linguistically as starting/Merging their subjects as complements/objects of the small clause, rather than as its subjects (e.g. Burzio, 1981). Given this widely accepted analysis, full/finite unaccusative clauses are derived as follows:

(10) a. Small clause: [SC pala [NP vlada]] →
     b. [TP je [VP pala [NP vlada]]] →
     c. Sentence: [TP vlada [T′ je [VP pala t]]]

Again, there is a half-clause layer involved in the derivation of the full clause, but this time, following the logic of unaccusative syntax, the subject is Merged after the verb. Again, what use is half a clause like that? As it turns out, such half-clauses (11), and necessarily with that (unaccusative) word order, are used productively in Serbian, alongside the full finite counterparts illustrated in (9) (Progovac, 2007a):

(11) Stigla pošta. (cf. ???Pošta stigla.) Pala vlada. (cf. ?*Vlada pala.)

As is the case with English perfective clauses discussed in the previous section, Serbian unaccusative half-clauses specialize for the here-and-now, reporting on an event that has just manifested itself. Consequently, these clauses cannot be modified by adverbs denoting remote past, such as 'three years ago' (?*Stigla pošta pre tri godine.), leading again to a division of labor. Moreover, some
Serbian otherwise has flexible word order, but typically SVO. The closest English equivalents occur in fossilized expressions such as Come winter (she will travel to Rome), which are also necessarily found in the unaccusative word order (cf. *Winter come, she will be in Rome).
formulaic unaccusative clauses (12) are only possible as half-clauses, and not as full clauses, when used to perform a speech-act in the context of a card game:

(12) Pala kartu. (cf. ?*Kartu pala. / ?*Karta je pala.)
     fallen card
     'The card is already on the table - you cannot take it back now.'

These clauses first of all provide a forceful argument that half-clause syntax is real: their word order can only be explained if the widely-adopted unaccusative hypothesis is coupled with the half-clause analysis. The awkwardness of the (otherwise default) SV order (11-12) makes it clear that they are not just abbreviated/elliptical versions of some finite counterparts. Rather, these half-clauses, as well as the ones illustrated for English in the previous sections, demonstrate consistent and systematic properties of a different, simpler clausal syntax: a syntax that involves one less layer of clausal structure, the basic (underived) word order, non-finite verb forms, and default case (for details, see Progovac 2006, 2007b). From the evolutionary point of view, it is significant that half-clauses (11) to some extent overlap in function with their full equivalents (9), even though they show a degree of specialization as well. While the participles in half-clauses contribute to the perfective aspect (but have no tense or TP), the full counterparts mark both perfective aspect (with the participle) and (past) tense (with the auxiliary). This expression of time/aspect must be redundant at least to some extent (especially for the here-and-now situations), given that only past tense auxiliaries are compatible with these participle forms. In any event, these unaccusative half-clauses demonstrate that it is possible to have simpler (non-TP) syntax and still express statements/assertions. Agreement properties of these clauses exhibit redundancy and overlap even more obviously.
As indicated in the glosses in (9), the participle form agrees with the subject in number and gender, but not in person, the type of agreement that also characterizes adjectives in Serbian. On the other hand, the auxiliary verb agrees with the subject in person and number (but not in gender). It is as though both layers of the clause have their own subject position (see Section 2), their own separate agreement properties, which partly overlap, and their own ways of encoding time/aspect, which again partly overlap.
5. Retracing the Steps

The quirky (rather than optimal) properties of modern-day clauses established above, attested cross-linguistically, begin to make sense if they are seen as by-products of evolutionary tinkering. My proposal in this respect is that a layer of TP (or a comparable functional projection) was at one point in evolution
See Calvin & Bickerton (2000), especially the Appendix, for the idea that one should use evolutionary considerations in constraining syntax, rather than only theory-internal constraints.
superimposed upon the layer of a small clause (half-clause), the proto-syntactic construct which already was able to express some basic clausal properties: predication, subjecthood, and even some temporal/aspectual properties. If so, then half-clauses would have been useful to our ancestors. A half-clause is still useful, even in expressing propositional content - much more useful than having no clausal syntax at all, and much less useful than articulated finite syntax. This is exactly the scenario upon which evolution/selection can operate. Even finite clauses/sentences in modern-day languages are constructed upon the foundation of half-clauses - as if the building of the sentence retraces evolutionary steps (Progovac, 2007b). Stratification accounts have been proposed for brain development in general: according to e.g. Vygotsky (1979/1960, 155-156), "brain development proceeds in accordance with the laws of stratification of construction of new levels on old ones... Instinct is not destroyed, but 'copied' in conditioned reflexes as a function of the ancient brain, which is now to be found in the new one." In this perspective, half-clauses can be seen as the older/lower structures, which are retained in, and subordinated to, the newer/higher sentential/TP structures. As put in Bickerton (1998, 353), "the creation of a new neural pathway in no way entails the extinction of the previous one. The fact that we remain capable of functioning in the protolinguistic mode ... indicates the persistence of the older link."
Acknowledgements For many good comments and discussions, I am grateful to the three anonymous reviewers, as well as to: Martha Ratliff, Eugenia Casielles, David Gil, Tecumseh Fitch, John Locke, Ana Progovac, and the (other) audiences at 2006 MLS, 2007 GURT, 2007 ILA, 2007 Max Planck Workshop on Complexity, 2007 ISU Conference on Recursion, and 2007 FASL. All errors are mine.
References

Berwick, R. C. (1998). Language evolution and the Minimalist Program: The origins of syntax. In J. R. Hurford et al. (Eds.), (pp. 320-340).
Barton, E. (1990). Nonsentential Constituents. Amsterdam: John Benjamins.
Barton, E. & Progovac, L. (2005). Nonsententials in Minimalism. In Elugardo, R. & Stainton, R. (Eds.), Ellipsis and Nonsentential Speech (pp. 71-93). New York: Springer.
Bickerton, D. (1990). Language and Species. Chicago: University of Chicago Press.
Bickerton, D. (1998). Catastrophic evolution: The case for a single step from protolanguage to full human language. In J. R. Hurford et al. (Eds.), (pp. 341-358).
Burzio, L. (1981). Intransitive Verbs and Italian Auxiliaries. Ph.D. Dissertation, MIT.
Calvin, W. H. & Bickerton, D. (2000). Lingua ex Machina: Reconciling Darwin and Chomsky with the Human Brain. Cambridge, MA: MIT Press.
Carroll, S. B. (2005). Endless Forms Most Beautiful: The New Science of Evo Devo and the Making of the Animal Kingdom. New York: W. W. Norton & Company.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Chomsky, N. (2001). Derivation by phase. In M. Kenstowicz (Ed.), Ken Hale: A Life in Language (pp. 1-52). Cambridge, MA: MIT Press.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36, 1-22.
Fitch, W. T. (1997). Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. Journal of the Acoustical Society of America, 102, 1213-22.
Fortin, C. (2007). Some (not all) nonsententials are only a phase. Lingua, 117, 67-94.
Guasti, M. T. (2002). Language Acquisition: The Growth of Grammar. Cambridge, MA: MIT Press.
Hale, K. & Keyser, S. J. (2002). Prolegomena to a Theory of Argument Structure [Linguistic Inquiry Monograph 39]. Cambridge, MA: MIT Press.
Hurford, J. R., Studdert-Kennedy, M., and Knight, C. (Eds). (1998). Approaches to the Evolution of Language: Social and Cognitive Bases. Cambridge: Cambridge University Press.
Jackendoff, R. (1999). Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences, 3(7), 272-279.
Jackendoff, R. (2002). Foundations of Language: Brain, Meaning, Grammar, Evolution. Oxford: Oxford University Press.
Kiparsky, P. (1968). Tense and mood in Indo-European syntax. Foundations of Language, 4, 30-57.
Kitagawa, Y. (1986). Subjects in English and Japanese. Ph.D. Dissertation, University of Massachusetts, Amherst.
Koopman, H. and D. Sportiche. (1991). The position of subjects. Lingua, 85, 211-258.
Lebeaux, D. (1989). Language Acquisition and the Form of the Grammar. Ph.D. Dissertation, University of Massachusetts, Amherst.
Locke, J. L., & Bogin, B. (2006). Language and life history: A new perspective on the evolution and development of linguistic communication. Behavioral and Brain Sciences, 29, 259-325.
Newmeyer, F. J. (2003). What can the field of linguistics tell us about the origins of language? In Christiansen, M. H., & Kirby, S. (Eds.), Language Evolution (pp. 58-76). Oxford: Oxford University Press.
Ouhalla, J. (1991). Functional Categories and Parametric Variation. London: Routledge and Kegan Paul.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.
Platzack, C. (1990).
A grammar without functional categories: A syntactic study of early child language. Nordic Journal of Linguistics, 13, 107-126.
Progovac, L. (2006). The syntax of nonsententials: Small clauses and phrases at the root. In L. Progovac, K. Paesani, E. Casielles & E. Barton (Eds.), The Syntax of Nonsententials: Multidisciplinary Perspectives (pp. 33-71). Amsterdam: John Benjamins.
Progovac, L. (2007a). Root small clauses with unaccusative verbs. Presented at FASL 16 (Formal Approaches to Slavic Linguistics), Stony Brook, May 2007. Submitted to the Proceedings.
Progovac, L. (2007b). Layering of grammar: Vestiges of evolutionary development of syntax in modern-day languages. Presented at the Workshop on Language Complexity, Max Planck, Leipzig, Germany, 2007. Submitted to the volume to be published by Oxford University Press.
Radford, A. (1988). Small children's small clauses. Transactions of the Philological Society, 86, 1-43.
Ridley, M. (1993). Evolution. Oxford: Blackwell Scientific Publications.
Rolfe, L. (1996). Theoretical stages in the prehistory of grammar. In A. Lock and C. R. Peters (Eds.), Handbook of Human Symbolic Evolution (pp. 776-792). Oxford: Clarendon Press.
Stowell, T. (1981). Origins of Phrase Structure. Ph.D. Dissertation, MIT.
Stowell, T. (1983). Subjects across categories. The Linguistic Review, 2/3, 285-312.
Tang, S-W. (2005). A theory of licensing in English syntax and its applications. Korean Journal of English Language and Linguistics, 5, 1-25.
Vygotsky, L. S. (1979/1960). The genesis of higher mental functions. In J. V. Wertsch (Ed.), The Concept of Activity in Soviet Psychology (pp. 144-188). New York: M.E. Sharpe.
THE FORMATION, GENERATIVE POWER, AND EVOLUTION OF TOPONYMS: GROUNDING A SPATIAL VOCABULARY IN A COGNITIVE MAP RUTH SCHULZ, DAVID PRASSER, PAUL STOCKWELL, GORDON WYETH, AND JANET WILES School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, 4072, Australia We present a series of studies investigating the formation, generative power, and evolution of toponyms (i.e. topographic names). The domain chosen for this project is the spatial concepts related to places in an environment, one of the key sets of concepts to be grounded in autonomous agents. Concepts for places cannot be directly perceived as they require knowledge of relationships between locations in space, with representations inferred from ambiguous sensory data acquired through exploration. A generative toponymic language game has been developed to allow the agents to interact, forming concepts for locations and spatial relations. The studies demonstrate how a grounded generative toponymic language can form and evolve in a population of agents interacting through language games. Initially, terms are grounded in simple spatial concepts directly experienced by the agents. A generative process then enables the agents to learn about and refer to locations beyond their direct experience, enabling concepts and toponyms to co-evolve. The significance of this research is the demonstration of grounding for both experienced and novel concepts, using a generative process, applied to spatial locations.
1. Introduction For autonomous agents to interact effectively with humans, they require the ability to connect their internal representations of the world to human language. Grounding refers to the relationship between things in the world, internal categories, and their symbols (Harnad, 1990). While researchers have emphasised different aspects of the grounding problem, the central role of grounding is to provide meaning for primary concepts and to associate language terms with those concepts. Our approach emphasises interaction between concepts and language, rather than the primacy of one or the other. Human language is generative rather than being a one-to-one labelling of symbols to concepts. Hence a complete theory requires the grounding of concepts that
cannot be directly experienced. Appropriate representations are a way of bridging between symbols and the world. In particular, a cognitive map provides an internal representation of places and their relations in the world (O'Keefe & Nadel, 1978). The most basic spatial concepts correspond to areas in space and are referred to by labels for places, such as city or suburb names. Areas within an environment or along a path can also often be described by single words, such as corner or corridor, or larger regions such as kitchen or office. We call names for places in an environment toponyms (i.e. topographic names), and a set of such terms to comprehensively describe an environment a toponymic language. In this study, we have drawn on insights from behavioural studies of spatial language, related mathematical and computational models, and agent-based language games. In English, spatial relations are generally referred to by spatial prepositions, with directions and distances combined to form spatial terms such as 'in front of', 'near', and 'at'. Human experiments (Logan & Sadler, 1996) and theoretical investigations (O'Keefe, 1996; Zwarts, 1997) have described spatial templates for terms defining areas in the world. Models of spatial language have been developed, including language game studies where agents formed a vocabulary for predefined concepts of agents and spatial relations (Steels, 1995), and where a shared spatial language emerged to describe directions, distances, and object names (Bodik & Takac, 2003). Studies to date that have demonstrated grounding in a spatial domain have used location concepts that were unambiguous and known by all agents, and an absolute direction system, where all agents know the reference direction. The challenge for this project is to combine grounding and generative languages by forming a generative language in embodied agents.
As spatial locations cannot be directly perceived, the representations must abstract from direct sensory inputs to allow knowledge about locations relative to other locations in the world. RatSLAM (Milford, Schulz, Prasser, Wyeth, & Wiles, 2007) is a robotic platform that meets these requirements. The objective is for two or more agents, each with unique representations of the world based on their own experiences, to learn to communicate with each other, and to be able to direct each other to locations. Language games can be played to form concepts from these representations through interactions with the world and other agents. The overall goal of the project is to explore issues in the relationship between language, concepts, and grounding in autonomous agents with respect to spatial locations. The specific aims are to show that autonomous agents can form toponymic concepts and vocabulary, that both concepts and labels can be
formed indirectly through a generative process, and can be learned and used by successive generations. Three studies were designed to investigate the formation, generative power, and evolution of toponyms. In the first study, autonomous agents (simulated robots) played a toponymic language game. In the second study, the toponymic language game was extended to include a generative task. The third study investigated the evolution of the language over generations.
2. Study 1. Formation of Toponyms
The basic spatial concepts of areas in space require an understanding of locations. For the first study, we designed a spatial naming game to investigate the formation of toponyms and scaling effects in a simulation world (see Figure 1a,b) with two agents. In toponymic language games, agents interact whenever they are within hearing distance of each other. The speaker agent chooses the best word for its current location, and the hearer agent updates its lexicon. In the RatSLAM system, each robot learns a unique representation of the world as a topological map of experiences, constructed during an exploration phase (see Figure 1). An experience map is an approximate x-y representation of the world that each robot constructs from its visual information and odometry. At any point in time one experience in the map is active, encoding the robot's best estimate of its position (for more information, see Milford et al., 2007).
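As a rough, hypothetical sketch (not the authors' implementation), the interaction loop of a toponymic language game might look like the following. The names `Agent`, `best_word`, and `play_game` are illustrative, and word choice here uses raw association strength rather than the paper's information value measure:

```python
import random

class Agent:
    """Toy agent whose lexicon maps (location, word) pairs to association strengths."""
    def __init__(self):
        self.lexicon = {}  # (location, word) -> strength

    def best_word(self, location):
        # Choose the word most strongly associated with this location, if any.
        candidates = {w: s for (loc, w), s in self.lexicon.items() if loc == location}
        if not candidates:
            return None
        return max(candidates, key=candidates.get)

    def update(self, location, word):
        # Strengthen the association between the location and the heard word.
        key = (location, word)
        self.lexicon[key] = self.lexicon.get(key, 0.0) + 1.0

def play_game(speaker, hearer, location):
    """One toponymic language game: the speaker names the location, the hearer learns."""
    word = speaker.best_word(location)
    if word is None:
        word = f"word{random.randrange(10_000)}"  # invent a new word
        speaker.update(location, word)
    hearer.update(location, word)
    return word
```

Repeated games on shared locations drive the two lexicons toward a common toponym per location, which is the convergence effect the study measures.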
Figure 1 a) Simulation robot view, b) World map, and c) Experience map. The world is an open plan office. In the map, the black hexagons are desks, and the path of the robot is shown. In the experience map, each dot shows the location of an experience in the robot's internal map.
A lexicon table stores the associations between the experiences of the robot and distinct words. The association between an experience and a word is strengthened when they are used together. For each location the word with the highest information value is chosen. The information value, I_wp, for the word, w, in location, p, is the relative information of the word within a neighbourhood of size D compared to the total usage of the word, calculated as follows:
where N is the number of experiences within D of the location, p; A_wn is the association between the word, w, and an experience, n; d_np is the distance between an experience, n, and the location, p; and M is the total number of experiences in the robot's experience map. In each interaction, words are invented with probability, p, as follows:
p = e^(-1/((1-S)T))

where S is the success of the interaction, equal to the information value of the location-word combination, and T is the temperature, which sets the level of success accepted by an agent. Using a word invention rate corresponding to the success of the interaction allows agents to use words where they provide significant information about the current location, and to invent words otherwise. Varying the temperature alters the rate of word invention, where a higher temperature increases the probability of inventing a new word. Our study used simulated agents rather than physical robots, with a hearing distance of 3 m and a neighbourhood size, D, of 5 m. Within a trial, the temperature for word invention was set at a fixed value, T, and agents evolved a set of words. Three conditions were tested, based on low, medium, and high temperatures, with each condition run for 2000 interactions. In all three conditions, the agents developed a shared set of toponyms (see Figure 2), showing that toponyms can be formed at different levels of scale by using different rates of word invention. Each location is referred to by a toponym in its vocabulary, interpreted as the most informative point on the experience map. A higher temperature resulted in a more specific toponymic language. The study demonstrated how toponyms could be formed for all places in the world visited by both agents, by playing toponymic language games when within hearing distance.
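The invention probability above can be checked numerically. A small sketch follows; the grouping of the exponent as -1/((1-S)T) is an assumption, chosen because it reproduces the behaviour the text describes (low success and high temperature both increase the chance of inventing a new word), and the helper name is illustrative:

```python
import math

def invention_probability(success, temperature):
    """Word-invention probability, read as p = exp(-1 / ((1 - S) * T)).

    Low success S or high temperature T pushes the exponent toward zero,
    so p approaches 1; high success pushes p toward 0."""
    if success >= 1.0:
        return 0.0  # a fully successful interaction never triggers invention
    return math.exp(-1.0 / ((1.0 - success) * temperature))
```

Under this reading, an unsuccessful interaction at high temperature is the most likely to produce a new word, which matches the reported scaling effects across the three temperature conditions.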
Figure 2 Toponym meanings shown as toponym usage templates. Each set (a-c) shows four of the words for one agent from a trial. Each cell shows the locations in the experience map of the agent where the word is used. (a) The lowest temperature, T=0.25, resulted in the smallest number of words, with four of the five words covering large areas; (b) The medium temperature, T=0.5, resulted in 18 words, with 11 covering large or medium areas; (c) the highest temperature, T=0.75, resulted in the greatest number of words, 28, with 21 covering small areas.
3. Study 2. Generative Power of Toponyms
To go beyond simple concepts requires a generative process. In the second study, relations are formed between toponyms, and used to generate concepts and
labels for places that cannot or have not been visited by the agents. A key challenge for embodied language games is to take into account the different perspectives of the agents. The generative toponymic language game, adapted from previous language games (Bodik & Takac, 2003; Steels, 1995), is based on naming three locations: Both agents are located within hearing distance at the first (current) location, they are facing the second (orientation) location, hence aligning their perspectives, and then they talk about a third (target) location (see Figure 3a). Given the three locations, agents can describe the target location with spatial words of distance and direction. For computational tractability, the second study used a simple grid world (see Figure 3b,c). Each agent's experience map is simulated by a corresponding grid of experiences, with each location in the grid equivalent to an experience used in Study 1.
Figure 3 a) The elements involved in a generative language game: The agent is at Current facing Orientation and talking about Target; toponyms are selected for the current, orientation, and target locations, and spatial words are selected for the direction, θ, and the distance, d. b) Empty grid world map of size 15×15. c) Grid world map of size 15×15 with desks, similar to the world of Study 1.
Each toponym has a corresponding template, which is calculated from the association between the toponym and all nodes in the experience map. The experience with the strongest association has a value of 1.0. The success of the toponym for an interaction is the value of the toponym template for the experience being used by the agent for the interaction. Toponyms are selected and invented as in Study 1, with the neighbourhood for calculating information being the four nearest neighbour locations. The probability of inventing new words is calculated as in Study 1. The direction and distance lexicon tables of the agents are vectors of 50 values with which words are associated, corresponding to a range of directions and distances. Each combination of distance and direction words has a corresponding template, which is calculated from the associations between the spatial words and the vectors of values. The spatial words forming the template that best matches the target toponym template are selected by the speaker. The success of the generative interaction is calculated by comparing the templates for the target toponym and the spatial words. The probability of
inventing spatial words is calculated as for the toponyms, using the success of the generative interaction. Every time the agents interact, the lexicon tables of the hearer are updated. The speaker's lexicon is updated when a new word is invented. The templates of the target location and the spatial words are used to update the lexicon tables for the target toponym and spatial words, increasing the lexicon associations across the experiences and vectors of values. In this study, two conditions were tested, based on the empty world and the world with desks. The hearing distance for the agents was the four nearest neighbour locations. The temperature, T, was 0.25, which allowed a level of specificity for toponyms of 5-10 experiences. The study consisted of five trials of 10,000 interactions for each condition. In both the empty world and the world with desks, the rate of word invention was highest for the first 100 interactions, and agents continued to invent words throughout each trial. The toponyms invented and used by the agents in the empty world were all specific, and some of the toponyms used by agents in the world with desks were general (see Figure 4). The average final lexicon in the empty world had 27.8 toponyms, and in the world with desks had 31.4 toponyms. There were more toponyms in the world with desks because they include the general toponyms, which cover similar areas.
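The template-matching selection described in this section can be sketched roughly as follows. The exact comparison between a toponym template and a spatial-word template is not spelled out in the paper, so the normalised-overlap measure and the function names below are assumptions:

```python
def template_similarity(t1, t2):
    """Compare two templates defined over the same set of experiences.

    Assumed measure (not specified in the source): normalised overlap of the
    per-experience template values; 1.0 means the templates are identical."""
    overlap = sum(min(a, b) for a, b in zip(t1, t2))
    union = sum(max(a, b) for a, b in zip(t1, t2))
    return overlap / union if union else 0.0

def select_spatial_words(target_template, spatial_templates):
    """Pick the (distance, direction) word pair whose combined template
    best matches the target toponym's template."""
    return max(spatial_templates,
               key=lambda pair: template_similarity(target_template,
                                                    spatial_templates[pair]))
```

The similarity score doubles as the success of the generative interaction, which in turn feeds the word-invention probability, so poorly matching spatial vocabularies get refined over time.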
Figure 4 Toponym templates. Non-white regions show that the word is one of the top five words providing information about a location, with black indicating that the word will be used at a location. Each set (a-b) shows templates for 10 of the words for one agent from a trial. a) In the empty world all templates were specific; b) In the world with desks, most templates were specific, but some were general, formed by referring to a location through the generative process.
4. Study 3. Evolution of Toponyms

Languages are not just created within a single agent's lifetime. They evolve and are refined over generations of agents. The third study investigated the evolution of a generative toponymic language. The words, concepts, selection of words, comprehension, and measures of success were the same as in Study 2. The world was a 15 by 15 grid with desks (see Figure 3c). Generations consisted of a set number of interactions, g. In the initial population two agents play negotiation games. In subsequent
generations, the older agent was replaced by a new agent, initially acting only as a hearer. After g/2 interactions, the new agent could interact as a speaker or a hearer. In this study, two conditions were tested, based on g = 1000 and g = 2000, each consisting of five trials of 20,000 interactions. The first generation for each trial formed their language through negotiation, in which the success of the toponymic and generative games increased as the languages were formed (see Figure 5e). Over generations, specific toponyms tended to remain stable, as did the concepts for directions and distances, while the more general toponyms shifted to become more specific (see Figure 5a-d). The results presented are for the first ten generations of the condition where g = 1000. Similar results were obtained for the remainder of the generations and for the condition where g = 2000.
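The generational turnover scheme just described (each generation lasts g interactions, and the newcomer is hearer-only for its first g/2 interactions) can be sketched as follows. All names here are illustrative, not the authors' code:

```python
import random

def run_generations(make_agent, play_game, n_generations, g):
    """Iterated-learning loop: every g interactions the oldest agent leaves
    and a naive agent enters, listening only for its first g//2 games."""
    old, new = make_agent(), make_agent()
    new_agent_age = g  # first generation: both agents are full participants
    for _ in range(n_generations):
        for _ in range(g):
            if new_agent_age < g // 2:
                speaker, hearer = old, new  # newcomer is hearer-only
            else:
                speaker, hearer = random.sample([old, new], 2)
            play_game(speaker, hearer)
            new_agent_age += 1
        old, new = new, make_agent()  # oldest agent leaves; a naive agent enters
        new_agent_age = 0
    return old, new
```

The hearer-only phase gives each newcomer time to absorb the existing language before it can influence it, which is what keeps specific toponyms stable across generations.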
Figure 5 Language games over generations. (a-d) Toponym templates over generations. Each row shows how a toponym is used throughout the trial, with each cell being the toponym’s template for the agent leaving the population at that generation. Each row (a-d) is an example of different types of toponyms: a) shows a specific toponym that does not alter much throughout the generations; b) shows a toponym that initially refers to multiple specific locations, but only refers to one of these after several generations; c) shows a specific word that becomes more general; d) shows a general word that became more specific. e) Success of language games over generations. The success of a toponym language game is the information value of the word used for the current location. The success of a generative language game is how well the toponym template matches the spatial words template for the words used. The peak average success was just over 0.6 for the generative language game, and just over 0.7 for the toponym language game. As a new agent entered the population, they began by learning from the older agent, which caused a drop in success that quickly returned to a high level as the new agents learned the language.
5. General Discussion and Conclusion The studies in this paper have shown how a generative toponymic language may form and evolve in a population of agents. Agents were able to form concepts for locations, directions, and distances as they interacted with each other and associated words with underlying values. Relations between existing concepts
were used to expand the concept space to new locations. Evolution allowed the general toponyms referring to new locations to become more specific. The key contribution of the research is the demonstration of grounding for both experienced and novel concepts using a generative process, applied to spatial locations. We have shown that generative grounding can be achieved with an appropriate representation of the concept space (in this case, an approximate x-y representation of the world), a way to form and label intrinsic concepts (in this case, toponyms), and a generative process that creates both the concepts and the labels. We are currently extending this study into the simulation world, and investigating other concepts, including verbs describing the robot's motion through the world. Acknowledgements
RS and PS were supported by Australian Postgraduate Awards. This research is funded in part by a grant from the Australian Research Council.

References

Bodik, P., & Takac, M. (2003). Formation of a common spatial lexicon and its change in a community of moving agents. In B. Tessem, P. Ala-Siuru, P. Doherty & B. Mayoh (Eds.), Eighth Scandinavian Conference on AI. Amsterdam: IOS Press.
Harnad, S. (1990). The symbol grounding problem. Physica D: Nonlinear Phenomena, 42, 335-346.
Logan, G. D., & Sadler, D. D. (1996). A computational analysis of the apprehension of spatial relations. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space. Cambridge, Massachusetts: The MIT Press.
Milford, M., Schulz, R., Prasser, D., Wyeth, G., & Wiles, J. (2007). Learning spatial concepts from RatSLAM representations. Robotics and Autonomous Systems - From Sensors to Human Spatial Concepts, 55(5), 403-410.
O'Keefe, J. (1996). The spatial prepositions in English, vector grammar, and the cognitive map theory. In P. Bloom, M. A. Peterson, L. Nadel & M. F. Garrett (Eds.), Language and Space. Cambridge, Massachusetts: The MIT Press.
O'Keefe, J., & Nadel, L. (1978). The Hippocampus as a Cognitive Map. New York: Oxford University Press.
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life, 2(3), 319-332.
Zwarts, J. (1997). Vectors as relative positions: a compositional semantics of modified PPs. Journal of Semantics, 14, 57-86.
ON THE CORRECT APPLICATION OF ANIMAL SIGNALLING THEORY TO HUMAN COMMUNICATION

THOMAS C. SCOTT-PHILLIPS
Language Evolution and Computation Research Unit, University of Edinburgh
[email protected]

The defining problem of animal signalling theory is how reliable communication systems remain stable. The problem comes into sharp focus when signals take an arbitrary form, as human words do. Many researchers, including many in evolutionary linguistics, assume that the Handicap Principle is the only recognised solution to this paradox, and hence conclude that the process that underpins reliability in humans must be exceptional. However, this assumption is false: there are many examples of cheap yet reliable signals in nature, and corresponding evolutionary processes that might explain such examples have been identified. This paper briefly reviews the various processes that may stabilise communication and hence suggests a three-way classification: signals may be kept honest either by (i) being an index, where meaning is tied to form; (ii) handicaps, in which costs are paid by the honest; or (iii) deterrents, in which costs are paid by the dishonest. Of these, the latter seems by far the most likely: humans are able to assess individual reputation, and hence hold the threat of social exclusion against those who signal unreliably.
1. The Problem of Reliability

The ethological question of what keeps signals reliable in the face of the evolutionary pressure to do otherwise is generally regarded as the defining problem in animal communication theory (Maynard Smith & Harper, 2003). It is typically cast in the following terms. If one can gain through the use of an unreliable signal then we should expect natural selection to favour such behaviour. Consequently, signals will cease to be of value, since receivers have no guarantee of their reliability. This will, in turn, produce listeners who do not attend to signals, and the system will thus collapse in an evolutionary retelling of Aesop's fable of the boy who cried wolf. What processes keep communication systems stable, and which might apply to human communication? This problem has, somewhat surprisingly, received only limited attention from language evolution researchers, and too often only the most well-known solution - the Handicap Principle (Grafen, 1990; Zahavi, 1975) - or its variants (e.g. Zahavi &
Zahavi, 1997) have been considered. However, contrary to a popular belief both within and outwith evolutionary linguistics, several alternatives to the Handicap Principle are recognised by animal signalling theorists (Maynard Smith & Harper, 2003); there are a number of other well-recognised processes by which signals may be arbitrary yet cheap. This paper's purpose is therefore to briefly consider these alternatives and hence show that we can explain the stability of human communication systems within a traditional behavioural ecology framework and without recourse to post-hoc evolutionary stories. A brief terminological aside is merited at the outset. In its everyday use, honesty makes reference to the relationship between a proposition and its truth value. Although this is roughly the meaning used in animal signalling theory, an obvious but very important caveat is required; namely, that the term honesty is necessarily metaphorical. That is, no assumption is made that an animal has 'meanings' that are either true or false. The term is instead used simply as a convenient shorthand to describe animal communicative behaviour. We assign an 'intended' 'meaning' to the behaviour and this allows us to subject it to evolutionary analysis, but this does not at all suppose that the animal necessarily has 'intentions' or 'meanings' in any psychologically real sense. Such shorthand is mostly harmless in the case of animal behaviour (Dennett, 1995; Grafen, 1999), but risks confusion when applied to humans. For that reason, I suggest that the term reliability be preferred, and I use it hereafter.

2. The Handicap Principle
The logic of the Handicap Principle is that costs are paid by the signaller as a guarantee of their honesty (Zahavi, 1975). The paradigmatic example is the peacock's tail. Bigger tails leave the peacock less dexterous and less agile, and hence appear to be evolutionarily costly. However, peahens choose to mate with the peacocks with the biggest tails. Why? Because only those peacocks who are of very high quality can afford the cost - the 'handicap' - of big tails. A distinction should be drawn between efficacy costs and strategic costs (Maynard Smith & Harper, 1995). Efficacy costs are costs that are necessary for the physical production of the signal. These may be minimal but they are never entirely cost-free; if nothing else there is the opportunity cost of the time spent in production. Strategic costs, on the other hand, are those additional costs that the Handicap Principle imposes on an organism as a guarantee of reliability.
3. Alternatives to the Handicap Principle
Although undeniably important, the Handicap Principle cannot explain all instances of animal signalling: there are many signalling systems that impose no strategic costs on signallers. Many male passerines, for example sparrows, typically display dominance badges on their plumage; the larger the badge, the greater the bird's Resource Holding Potential (an index of all factors that influence fighting ability (Parker, 1974)). However, there appears to be no cost associated with the badge, and no obvious barrier to falsification (Rohwer, 1975; Whitfield, 1987). What alternatives to the Handicap Principle might explain this and other examples? Broadly speaking, four possibilities have been identified by animal signalling theorists.

3.1. Indices
An index is a signal in which meaning is fundamentally tied to form, thus preventing even the possibility of unreliability. The classic example is the roar of Red Deer, in which formant dispersion is reliably (negatively) correlated with the deer's size (Reby & McComb, 2003).

3.2. Coordination games
In a coordination game each party has a different preference for the outcome of the interaction, but some overriding common interest is shared (Maynard Smith, 1994). An example is the female fruit fly, which mates only once in its lifetime. If a male attempts to court her after this mating she will display her ovipositor towards him, at which point the male immediately ceases courtship (Maynard Smith, 1956). And so although both parties may have conflicting interests (over their desire to mate with one another) both share an overriding common interest: not to waste time if the female has already mated.

3.3. Repeated interactions
If individuals meet each other repeatedly over time it may be in both parties' longer-term interests to communicate reliably rather than take whatever short-term payoff may be available through dishonesty (Silk, Kaldor, & Boyd, 2000). This is the essential logic behind reciprocal altruism. Depending upon the specifics of the relationship, the optimal strategy may be to be generally honest with occasional deception (Axelrod, 1995; Axelrod & Hamilton, 1981).
3.4. Punishment of false signals
If dishonesty is punished then that will obviously reduce or nullify any possible benefit of unreliability (Clutton-Brock & Parker, 1995). Many examples exist; one is the interaction between chicks of the blue-footed booby, in which older chicks will aggressively peck and jostle any younger chicks that signal any attempt to challenge them (Drummond & Osorno, 1992). This does of course raise the second-order problem of why punishing behaviour will evolve if it is itself costly.
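The deterrent logic described in this section can be made concrete with a toy expected-payoff comparison. This is a sketch with illustrative numbers only (the gain, detection probability, and exclusion cost are assumptions, not values from the paper): deception pays only when its one-off gain exceeds the expected cost of being caught.

```python
def expected_payoff(honest, gain_from_deception=1.0,
                    detection_prob=0.8, exclusion_cost=5.0):
    """Per-interaction payoff of a signaller (illustrative numbers only).

    An unreliable signal brings a short-term gain, but if the deception is
    detected the signaller loses future cooperation (the exclusion cost).
    Honest signallers forgo the gain but risk nothing.
    """
    if honest:
        return 0.0
    return gain_from_deception - detection_prob * exclusion_cost
```

With these numbers deception nets 1.0 - 0.8 x 5.0 = -3.0 per interaction, so honesty is the better strategy; the cost is paid only by those who deviate, not as part of honest signal production.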
4. Three Routes to Stability

Although these processes are often treated as distinct in the animal communication literature, the last three share a common framework: all describe scenarios in which unreliable signals incur costs. With regard to coordination games, this will prevent the shared interest from overriding other considerations: the female fruit fly would not display her ovipositor and hence the male would continue to court her, which is a waste of his time and a distraction for her. In repeated interactions unreliability would result in non-cooperation in the future. This would remove the expected future benefits of the relationship; or, put another way, would incur costs relative to the expected payoff over time. Finally, the imposition of costs as a consequence of unreliability is precisely what punishment is. In general, then, all of these processes describe deterrents. We may hence define a three-way classification of the different ways in which signals are kept reliable:

- Indices, in which meaning is causally related to form
- Handicaps, in which costs are incurred by reliable signallers
- Deterrents, in which costs are incurred by unreliable signallers
5. Reputation as Deterrent

Which of the above most likely applies to human communication, and especially language? Indices are clearly not appropriate: the forms of linguistic symbols - words - are famously unrelated to their meanings. Some scholars have suggested ways in which the Handicap Principle might apply to human language. For example, handicaps have been used to explore politeness phenomena (van Rooij, 2003), but even if this is correct, it is only concerned with one (rather small) aspect of language. Another suggestion is that ritualised performance acts as a costly signal of commitment to the group, and thus helps to build trust and ultimately ensure reliable communication (Knight, 1998; Power, 2000). But this is a
hypothesis about the reliability of the ritualised behaviour, not about words themselves. In general, it is hard to argue that there are any strategic costs associated with utterance production, a point recognised by the inventor of the Handicap Principle: "Language does not contain any component that ensures reliability. It is easy to lie with words" (Zahavi & Zahavi, 1997, p.223). That leaves us with deterrents. The idea of a deterrent has been formalised in a paper (Lachmann, Szamadó, & Bergstrom, 2001) that, given that it explicitly addresses human language as an application of its ideas, has received bafflingly little attention from evolutionary linguists. It has not, for example, received a single citation in any of the collections of work that have arisen from the Evolang conferences that have taken place since the article's publication (Cangelosi, Smith, & Smith, 2006; Tallerman, 2005; Wray, 2002). The basic logic is that although it is cheap and easy to deceive, there are costs to be paid for doing so. In game-theoretic terms, the costs are paid away from the equilibrium; they are paid by those who deviate from the evolutionarily stable strategy (ESS). This contrasts with costly signalling, in which the costs are paid as part of the ESS. (See also Gintis, Alden Smith, & Bowles, 2001, who show that signalling can be a Nash equilibrium if unreliability is costly.) Under what circumstances will this logic of deterrents be preferred over the logic of handicaps? Sufficient conditions for cost-free signalling in which reliability is ensured through deterrents are that signals be verified with relative ease (if they are not verifiable then individuals will not know who is and who is not worthy of future attention) and that costs be incurred when unreliable signalling is revealed.
These conditions are fulfilled in the human case: individuals are able to remember the past behaviour of others in sufficient detail to make informed judgements about whether or not to engage in future interactions; and refusal to engage in such interactions produces costs for the excluded individual. At the extreme, social isolation is a very undesirable outcome for a species like humans, in which interactions with others are crucial for our day-to-day survival. This is not, of course, punishment in the conventional sense, but the functional logic is the same: individuals who do not conform will incur prohibitive costs, in this case social exclusion. Moreover, this process would snowball once off the ground, as individuals would be able to exchange information - gossip - about whether others were reliable communication partners (Enquist & Leimar, 1993); and that exchange would itself be kept reliable by the very same mechanisms. Importantly, the imposition of these costs - the refusal to engage with unreliable individuals - is not costly, and hence the second-order problem does
not arise. Indeed, such refusal is the most adaptive response if there is good reason to believe that the individual will be unreliable. It should be explicitly noted that this process allows signals to take an arbitrary form (Lachmann, Szamadó, & Bergstrom, 2001). The fact that utterances are cheap yet arbitrary is too often taken to be paradoxical: "resistance to deception has always selected against conventional [arbitrary - TSP] signals - with the one puzzling exception of humans" (Knight, 1998, p.72, italics added). This is, as the passerine example and the analysis above both show, simply not true. Instead, once we remove the requirement that costs be causally associated with signal form, as we do if we place the onus of payment on the dishonest individual, then the signal is free to take whatever form the signaller wishes. This paves the way for an explosion of symbol use.

6. Concluding Remarks
This necessarily brief survey suggests that there is a single most likely explanation for the stability of human communication: that individuals are deterred from the production of unreliable signals because of the social consequences of doing so. This explanation places a heavy load on the mechanism of reputation, a conclusion that chimes nicely with the emerging consensus from the literature on the evolution of cooperation that reputation is crucial to the stability of human sociality (e.g. Fehr, 2004; Milinski, Semmann, & Krambeck, 2002). More generally, we should recognise that this process allows us to explain the stability of human communication with the existing tools of animal signalling theory. Evolutionary linguistics has too often resorted to intellectual white flags: the willing abandonment of traditional Darwinian thinking when faced with the heady puissance of natural language. A chronic example of this trend is the suggestion that a capacity for grammar could only have come about via some macro-mutational event (Bickerton, 1990).¹ The assumption that cheap yet arbitrary signals can only be stabilised by the Handicap Principle is not of the same magnitude, but it is the same type of error. A more learned survey of the animal signalling literature offers a number of alternatives, one of which fits tightly with our intuitive ideas of how social contracts work. Future research should therefore focus on the empirical testing of such ideas rather than the generation of additional post-hoc hypotheses in which language is treated as a special case.

¹ To his credit, Bickerton has since (2003) recognised the implausibility of this suggestion.
Acknowledgement
TSP is funded by a grant from the Arts and Humanities Council of Great Britain.

References
Axelrod, R. (1995). The evolution of cooperation. New York: Basic Books.
Axelrod, R., & Hamilton, W. D. (1981). The evolution of cooperation. Science, 211, 1390-1396.
Bickerton, D. (1990). Language and species. Chicago: University of Chicago Press.
Bickerton, D. (2003). Symbol and structure. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 77-93). Oxford: Oxford University Press.
Cangelosi, A., Smith, K., & Smith, A. D. M. (Eds.). (2006). The evolution of language. Singapore: World Scientific Publishing Company.
Clutton-Brock, T. H., & Parker, G. A. (1995). Punishment in animal societies. Nature, 373, 209-216.
Dennett, D. C. (1995). Darwin's dangerous idea. London: Penguin.
Drummond, H., & Osorno, J. L. (1992). Training siblings to be submissive losers: dominance between booby nestlings. Animal Behaviour, 44, 881-893.
Enquist, M., & Leimar, O. (1993). The evolution of cooperation in mobile organisms. Animal Behaviour, 45(4), 747-757.
Fehr, E. (2004). Don't lose your reputation. Nature, 432, 449-450.
Gintis, H., Alden Smith, E., & Bowles, S. (2001). Costly signaling and cooperation. Journal of Theoretical Biology, 213, 103-119.
Grafen, A. (1990). Biological signals as handicaps. Journal of Theoretical Biology, 144, 517-546.
Grafen, A. (1999). Formal Darwinism, the individual-as-maximizing-agent analogy and bet-hedging. Proceedings of the Royal Society of London, Series B, 266, 799-803.
Knight, C. (1998). Ritual/speech coevolution: a solution to the problem of deception. In J. R. Hurford, M. Studdert-Kennedy & C. Knight (Eds.), Approaches to the evolution of language (pp. 68-91). Cambridge: Cambridge University Press.
Lachmann, M., Szamadó, S., & Bergstrom, C. T. (2001). Cost and conflict in animal signals and human language. Proceedings of the National Academy of Sciences, 98(23), 13189-13194.
Maynard Smith, J. (1956). Fertility, mating behaviour and sexual selection in Drosophila subobscura. Journal of Genetics, 54, 261-279.
Maynard Smith, J. (1994). Must reliable signals always be costly? Animal Behaviour, 47, 1115-1120.
Maynard Smith, J., & Harper, D. G. C. (1995). Animal signals: Models and terminology. Journal of Theoretical Biology, 177, 305-311.
Maynard Smith, J., & Harper, D. G. C. (2003). Animal signals. Oxford: Oxford University Press.
Milinski, M., Semmann, D., & Krambeck, H.-J. (2002). Reputation helps solve the 'tragedy of the commons'. Nature, 415, 424-426.
Parker, G. A. (1974). Assessment strategy and the evolution of animal conflicts. Journal of Theoretical Biology, 47, 223-243.
Power, C. (2000). Secret language use at female initiation. In C. Knight, M. Studdert-Kennedy & J. R. Hurford (Eds.), The evolutionary emergence of language (pp. 81-98). Cambridge: Cambridge University Press.
Reby, D., & McComb, K. (2003). Anatomical constraints generate honesty: Acoustic cues to age and weight in the roars of Red Deer stags. Animal Behaviour, 65, 317-329.
Rohwer, S. (1975). The social significance of avian winter plumage variability. Evolution, 29, 593-610.
Silk, J. B., Kaldor, E., & Boyd, R. (2000). Cheap talk when interests conflict. Animal Behaviour, 59, 423-432.
Tallerman, M. (Ed.). (2005). Language origins: Perspectives on evolution. Oxford: Oxford University Press.
van Rooij, R. (2003). Being polite is a handicap: Towards a game theoretical analysis of polite linguistic behaviour. Paper presented at the 9th Conference on the Theoretical Aspects of Rationality and Knowledge.
Whitfield, D. P. (1987). Plumage variability, status signalling and individual recognition in avian flocks. Trends in Ecology and Evolution, 2, 13-18.
Wray, A. (Ed.). (2002). The transition to language. Oxford: Oxford University Press.
Zahavi, A. (1975). Mate selection: A selection for a handicap. Journal of Theoretical Biology, 53, 205-214.
Zahavi, A., & Zahavi, A. (1997). The handicap principle: A missing piece of Darwin's puzzle. Oxford: Oxford University Press.
NATURAL SELECTION FOR COMMUNICATION FAVOURS THE CULTURAL EVOLUTION OF LINGUISTIC STRUCTURE
KENNY SMITH
Division of Psychology, Northumbria University, Northumberland Road, Newcastle-upon-Tyne, NE1 8ST, UK
[email protected]
SIMON KIRBY
Language Evolution and Computation Research Unit, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK

There are two possible sources of structure in language: biological evolution of the language faculty, or cultural evolution of language itself. Two recent models (Griffiths & Kalish, 2005; Kirby, Dowman, & Griffiths, 2007) make alternative claims about the relationship between innate bias and linguistic structure: either linguistic structure is largely determined by cultural factors (Kirby et al., 2007), with strength of innate bias being relatively unimportant, or the nature and strength of innate machinery is key (Griffiths & Kalish, 2005). These two competing possibilities rest on different assumptions about the learning process. We extend these models here to include a treatment of biological evolution, and show that natural selection for communication favours those conditions where the structure of language is primarily determined by cultural transmission.
1. Introduction

Language is a consequence of two systems of transmission: biological and cultural. The human capacity for language uncontroversially has some grounding in specifically human biology - no other species uses a similar system in the wild. Language is also, again uncontroversially, socially learned - we learn the language of our speech community. To what extent is the detailed structure of language determined by biology or culture, and how have cultural and biological evolution acted to shape language? The position here is less clear. The standard account attributes the structure of language to the biological evolution of an innate language faculty (Pinker & Bloom, 1990). An alternative account, grounded in the computational modelling of cultural transmission, allows a significant role for cultural evolution (e.g. Kirby & Hurford, 2002; Kirby, Smith, & Brighton, 2004): under this account, the structure of language is explained primarily as a consequence of the adaptation of language to the cultural transmission medium (e.g. partial, noisy, or frequency-skewed data:
Kirby, 2001). Two recent studies have sought to explicitly address the link between language structure, biological predispositions, and constraints on cultural transmission (Griffiths & Kalish, 2005; Kirby et al., 2007). Both assume that learners apply the principles of Bayesian inference to language acquisition: a learner's confidence that a particular grammar h accounts for the linguistic data d that they have encountered is given by

P(h|d) = \frac{P(d|h)\,P(h)}{\sum_{h'} P(d|h')\,P(h')}
and allows a contribution both from a prior (presumably innate) belief in each grammar, P(h), and the probability that that grammar could have generated the observed data, P(d|h). Based on the posterior probability of the various grammars, P(h|d), the learner then selects a grammar and produces utterances which will form the basis, through social learning, of language acquisition in others. Within this framework, Griffiths and Kalish (2005) show that cultural transmission factors (such as noise or the transmission bottleneck imposed by partial data) have no effect on the distribution of languages delivered by cultural evolution: the outcome of cultural evolution is solely determined by the prior biases of learners, given by P(h).ᵃ Kirby et al. (2007) demonstrate that this result is a consequence of the assumption that learners select a grammar with probability proportional to P(h|d) - if learners instead select the grammar which maximises P(h|d), then cultural transmission factors play an important role in determining the distribution of languages delivered by cultural evolution: for example, different transmission bottlenecks lead to different distributions. Furthermore, for maximising learners, the strength of the prior bias of learners is irrelevant over a wide range of the parameter space.ᵇ
These models suggest two candidate components of the innate language faculty: firstly, the prior bias, P(h), and secondly, the strategy for selecting a grammar based on P(h|d) - sampling proportional to P(h|d), or selecting the grammar which maximises P(h|d). We can therefore straightforwardly extend models of this sort to ask how we might expect the evolution of the language faculty to unfold: does biological evolution favour sampling or maximising learners, strong or weak priors? Specifically, we are interested in asking which selection strategies and priors are evolutionarily stable (Maynard Smith & Price, 1973; Smith, 2004): which strategies and priors are such that a population adopting that strategy or prior will not be invaded by some other strategy or prior under the influence of natural selection? This breaks down into two sub-questions: (1) what language will a population consisting entirely of individuals with a particular strategy and prior have?; (2) what level of communicative accuracy will some individual inserted into such a population have? The first question is answered by the work of Griffiths and Kalish (2005) and Kirby et al. (2007), which shows the relationship between prior, selection strategy, cultural transmission factors and distribution of languages in a population. Answering the second requires some additional machinery, described in Section 3.

ᵃ Griffiths and Kalish (2005) point out that the prior need not necessarily take the form of a language-specific innate bias in the traditional sense.
ᵇ For a treatment of both sampling and maximising learners, see Griffiths and Kalish (2007), who provide similar results to those of Griffiths and Kalish (2005) and Kirby et al. (2007).
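The Bayesian grammar-selection step described above can be sketched in a few lines. This is a minimal illustration (the prior and likelihood values below are placeholders, not values from the model):

```python
def posterior(priors, likelihoods):
    # P(h|d) is proportional to P(d|h) * P(h), normalised over all
    # candidate grammars.
    joint = [p * l for p, l in zip(priors, likelihoods)]
    z = sum(joint)
    return [j / z for j in joint]

def maximise(post):
    # A maximising learner puts all its probability on the single most
    # probable grammar; a sampling learner selects in proportion to
    # the posterior itself.
    best = max(range(len(post)), key=lambda i: post[i])
    return [1.0 if i == best else 0.0 for i in range(len(post))]
```

With a flat prior over two grammars and likelihoods 0.2 and 0.8, the posterior is (0.2, 0.8): a sampler picks grammar 2 with probability 0.8, while a maximiser picks it with certainty.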
2. The model of learning and cultural transmission

We adopt Kirby et al.'s (2007) model of language and language learning. A language consists of a system for expressing m meanings, where each meaning can be expressed using one of k means of expression, called classes (e.g., meanings might be verbs, signal classes might be alternative inflectional paradigms for those verbs). We will assume two types of prior bias. For unbiased learners, all grammars have the same prior probability: P(h) = 1/k^m. Biased learners have a preference for languages which use a consistent means of expression, such that each meaning is expressed using the same class. Following Kirby et al. (2007), this prior is given by the expression
P(h) = \frac{\Gamma(k\alpha)}{\Gamma(m + k\alpha)} \prod_{j=1}^{k} \frac{\Gamma(n_j + \alpha)}{\Gamma(\alpha)}

where Γ(x) = (x - 1)!, n_j is the number of meanings expressed using class j and α determines the strength of the preference for consistency: low α gives a strong preference for consistent languages, higher α leads to a weaker preference for such languages. The probability of a particular data set d (consisting of b meaning-form pairs) being produced by an individual with grammar h is:

P(d|h) = \prod_{\langle x, y \rangle \in d} \frac{1}{m}\, P(y|x, h)
where all meanings are equiprobable, x is a meaning, y is the signal class associated with that meaning in the data, and P(y|x, h) gives the probability of y being produced to convey x given grammar h and noise ε:
P(y|x, h) = \begin{cases} 1 - \epsilon & \text{if } y \text{ is the class corresponding to } x \text{ in } h \\ \epsilon/(k-1) & \text{otherwise} \end{cases}
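The prior and likelihood just defined can be rendered directly in code. The sketch below assumes a grammar is represented as a tuple h where h[x] is the class used for meaning x, and that noise ε is spread evenly over the k-1 incorrect classes:

```python
from math import gamma
from itertools import product

def prior(h, k, alpha):
    # Gamma-function prior favouring consistent languages when alpha is small.
    m = len(h)
    p = gamma(k * alpha) / gamma(m + k * alpha)
    for j in range(k):
        n_j = h.count(j)  # number of meanings expressed using class j
        p *= gamma(n_j + alpha) / gamma(alpha)
    return p

def likelihood(d, h, k, eps):
    # d is a sequence of (meaning, class) pairs; meanings are equiprobable,
    # and production is perturbed by noise eps.
    m = len(h)
    p = 1.0
    for x, y in d:
        p *= (1.0 / m) * ((1 - eps) if h[x] == y else eps / (k - 1))
    return p
```

For m = k = 3 and a strong bias (α = 1) this reproduces the strong-bias prior column of Table 1: the fully consistent grammar gets 0.1, a grammar using one class twice gets 0.0333, and a fully inconsistent grammar gets 0.0167.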
Bayes' rule can then be applied to give a posterior distribution over hypotheses given a particular set of utterances. This posterior distribution is used by a learner
to select a grammar, according to one of two strategies. Sampling learners simply select a grammar with probability proportional to its posterior probability: P_L(h|d) = P(h|d). Maximising learners select the grammar with the highest posterior probability:

P_L(h|d) = \begin{cases} 1 & \text{if } P(h|d) > P(h'|d) \text{ for all } h' \neq h \\ 0 & \text{otherwise} \end{cases}
A model of cultural transmission follows straightforwardly from this model of learning: the probability of a learner at generation n arriving at grammar h, given exposure to data produced by grammar h,-l is simply
P ( h , = iIh,-l
=j
)=
c
PL(h,
= ild)P(dlh,-l
=j
)
d
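Putting the pieces together, the transition matrix for a sampling learner can be computed by brute-force enumeration of all data sets. This is a sketch under the definitions above, using the paper's parameters m = k = b = 3 and ε = 0.1; for samplers the stationary distribution equals the prior (Griffiths & Kalish's result), which provides a built-in check:

```python
from math import gamma
from itertools import product

M, K, B, EPS, ALPHA = 3, 3, 3, 0.1, 1.0  # parameters used in the text

grammars = list(product(range(K), repeat=M))

def prior(h):
    p = gamma(K * ALPHA) / gamma(M + K * ALPHA)
    for j in range(K):
        p *= gamma(h.count(j) + ALPHA) / gamma(ALPHA)
    return p

def p_data(d, h):
    # Probability of a data set d of (meaning, class) pairs under grammar h.
    p = 1.0
    for x, y in d:
        p *= (1.0 / M) * ((1 - EPS) if h[x] == y else EPS / (K - 1))
    return p

priors = [prior(h) for h in grammars]
datasets = list(product(product(range(M), range(K)), repeat=B))

# Q[j][i] = P(h_n = i | h_{n-1} = j) for a sampling learner.
n = len(grammars)
Q = [[0.0] * n for _ in range(n)]
for d in datasets:
    joint = [p_data(d, grammars[i]) * priors[i] for i in range(n)]
    z = sum(joint)
    for j in range(n):
        pd = joint[j] / priors[j]  # = p_data(d, grammars[j])
        for i in range(n):
            Q[j][i] += (joint[i] / z) * pd
```

Each column of Q sums to one, and the prior is an exact fixed point of the chain (Σⱼ Q_ij P(h_j) = P(h_i)), so iterating this transition matrix recovers the prior as the stationary distribution of sampling learners.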
The matrix of all such transition probabilities is known as the Q matrix (Nowak, Komarova, & Niyogi, 2001): entry Q_ij gives the transition probability from grammar j to grammar i. As discussed in Griffiths and Kalish (2005) and Kirby et al. (2007), the stable outcome of cultural evolution (the stationary distribution of languages) can be calculated given this Q matrix, and is proportional to its first eigenvector. We will denote the probability of grammar i in the stationary distribution as Q∞_i. Table 1 gives some example prior probabilities and stationary distributions, for various strengths of prior and both selection strategies.ᶜ As shown in Table 1, strength of prior determines the outcome of cultural evolution for sampling learners, but is unimportant for maximising learners as long as some bias exists.

Table 1. P(h) for three grammars given various types of bias (unbiased, weak bias [α = 40], strong bias [α = 1], denoted by u, bw and bs respectively), and the frequency of those grammars in the stationary distribution for sampling and maximising learners. Grammars are given as strings of characters, with the first character giving the class used to express the first meaning and so on.

         P(h)                       Q∞, sampler                Q∞, maximiser
  h      u       bw      bs        u       bw      bs        u       bw      bs
  aaa    0.0370  0.0389  0.1000    0.0370  0.0389  0.1000    0.0370  0.2499  0.2499
  aab    0.0370  0.0370  0.0333    0.0370  0.0370  0.0333    0.0370  0.0135  0.0135
  abc    0.0370  0.0361  0.0167    0.0370  0.0361  0.0167    0.0370  0.0014  0.0014
3. Evaluating evolutionary stability

In order to calculate which selection strategies and priors are evolutionarily stable we need to define a measure which determines reproductive success. We make the following assumptions: (1) a population consists of several subpopulations;

ᶜ All results here are for m = 3, k = 3, b = 3, ε = 0.1. Qualitatively similar results are obtainable for a wide range of the parameter space.
(2) each subpopulation has converged on a single grammar through social learning, with the probability of each grammar being used by a subpopulation given by that grammar's probability in the stationary distribution; (3) natural selection favours learners who arrive at the same grammar as their peers in a particular subpopulation, where peers are other learners exposed to the language of the subpopulation. Given these assumptions, the communicative accuracy between two individuals A and B is given by:

ca(A, B) = \sum_{h} Q^{\infty B}_{h} \sum_{h'} Q^{A}_{h'h}\, Q^{B}_{h'h}

where the superscripts on Q indicate that learners A and B may have different selection strategies and priors. The relative communicative accuracy of a single learner A with respect to a large and homogeneous population of individuals of type B is therefore given by rca(A, B) = ca(A, B)/ca(B, B). Where this quantity is greater than 1 the combination of selection strategy and prior (the learning behaviour) of individual A offers some reproductive advantage relative to the population learning behaviour, and may (through natural selection acting on genetic transmission) come to dominate the population. Where relative communicative accuracy is less than 1 learning behaviour A will tend to be selected against, and when relative communicative accuracy is 1 both learning behaviours are equivalent and genetic drift will ensue. Following Maynard Smith and Price (1973), the conditions for evolutionary stability for a behaviour of interest, I, are therefore: (1) rca(J, I) < 1 for all J ≠ I; or (2) rca(J, I) = 1 for some J ≠ I, but in each such case rca(I, J) > 1. The second condition covers situations where the minority behaviour J can increase by drift to the point where encounters between type J individuals become common, at which point type I individuals are positively selected for and the dominance of behaviour I is re-established.

Table 2. Relative communicative accuracy of each strategy played off against all alternatives. s denotes sampling, m maximising; bias types are as for Table 1. Cases in which the minority learning behaviour can potentially invade the population via drift are boxed. Cases where the minority learning behaviour will be positively selected for are boxed and shaded. Values are given to two decimal places unless rounding would obscure a selection gradient.
where the superscripts on Q indicates that learners A and B may have different selection strategies and priors. The relative communicative accuracy of a single learner A with respect to a large and homogeneous population of individuals of type B is therefore given by rca(A,B ) = cu(A,B ) / c a ( B ,B ) . Where this quantity is greater than 1 the combination of selection strategy and prior (the learning behaviour) of individual A offers some reproductive advantage relative to the population learning behaviour, and may (through natural selection acting on genetic transmission) come to dominate the population. Where relative communicative accuracy is less than 1 learning behaviour A will tend to be selected against, and whcn relative communicative accuracy is 1 both learning behaviours are equivalent and genetic drift will ensue. Following Maynard Smith and Price (1973), the conditions for evolutionary stability for a behaviour of interest, I , are therefore: (1) rca(J,I ) < 1 for all J # I ; or ( 2 ) rcu(J,I ) = 1 for some J # I , but in each such case r c a ( I ,J ) > 1. The second condition covers situations where the minority behaviour J can increase by drift to the point where encounters between type J individuals become common, at which point type I individuals are positively selected for and the dominance of behaviour I is re-established. Table 2. Relative communicative accuracy of each strategy played off against all alternatives. s denotes sampling, m maximising, bias types are as for Table I . Cases in which the minority learning behaviour can potentially invade the population via drift are boxed. Cases where the minority learning behaviour will be positively selected for are boxed and shaded. Values are given to two decimal places unless rounding would obscure a selection gradient. (s.bw)
0 9997 0 99
Majority behaviour (mJ4 (s.bs)
0 81 0 82
-
0.88 0 88 0 86
(m,bw)
(m,bs)
0.38 0.38 0.60
0 38 0 38 0 60
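The two evolutionary-stability conditions stated above can be checked mechanically. The sketch below implements them directly; the rca table in the usage example is hypothetical, since it only illustrates the shape of the check:

```python
def is_ess(i, rca):
    """Maynard Smith & Price's conditions for behaviour i to be an ESS.

    rca[j][i] is the relative communicative accuracy of a rare behaviour j
    in a population dominated by behaviour i.
    """
    others = [j for j in range(len(rca)) if j != i]
    # Condition 1: every rare alternative does strictly worse.
    if all(rca[j][i] < 1 for j in others):
        return True
    # Condition 2: alternatives doing equally well must themselves be
    # invadable by i; anything doing strictly better rules i out.
    return all(rca[j][i] < 1 or (rca[j][i] == 1 and rca[i][j] > 1)
               for j in others)
```

In a hypothetical two-behaviour table where behaviour 1 is neutral in a population of behaviour 0 (rca = 1) but behaviour 0 does worse in a population of behaviour 1, only behaviour 1 comes out as an ESS, mirroring the unbiased- versus biased-maximiser case discussed below.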
Table 2 gives the relative communicative accuracies of 6 learning behaviours when played against each other: two selection strategies and three types of prior bias. Several results are apparent. Firstly, none of the sampling behaviours are evolutionarily stable: all are prone to invasion by biased maximisers, and all but the strongly biased samplers are subject to invasion by unbiased maximisers. Secondly, abstracting away from strength of prior, maximising is an ESS: samplers entering a maximising population have low relative communicative accuracy. In other words, natural selection prefers maximisers, at least under the fitness function described above. Maximisers boost the probability that the most likely grammar will be learned, and are consequently more likely to arrive at the same grammar as some other learner exposed to the same data-generating source. Thirdly, strength of prior is relatively unimportant. In sampling populations (where the stationary distribution is determined by strength of prior), it is best to have the same strength of prior as the rest of the population (at least given the large difference between strong and weak priors used here). If your prior is stronger than the norm, you will be less likely to learn the less common languages from the stationary distribution; if it is weaker you will be more likely to misconverge on those minority languages, which are themselves less likely to occur due to the stronger bias of the population. The situation regarding the evolution of priors in maximising populations is slightly more complex. Strong and weak biases for maximisers turn out to be equivalent: for the parameter settings used here (and a wide range of other parameter settings) α = 1 and α = 40 generate equivalent Q matrices (and hence equivalent stationary distributions, as shown by Kirby et al., 2007).
Strong and weak biases in maximising populations are therefore equivalent in terms of communicative accuracy, and can invade each other by drift: they form an evolutionarily stable set (Thomas, 1985). In unbiased maximising populations, all levels of bias are interchangeable: all languages are equally probable, and the preference of biased learners for consistent languages is counterbalanced by their difficulty in acquiring the equally probable inconsistent languages. Unbiased maximising populations can therefore be invaded by drift by biased maximisers. However, unbiased maximisers cannot in turn invade biased maximising populations: in such populations, as can be seen in Table 1, the distribution of languages is skewed in favour of consistent languages, and it therefore pays to be biased to acquire these languages. Unbiased maximisation is therefore not an ESS, by condition 2 of the definition. If we assume that strong prior biases have some cost, there are conditions under which only weak bias would be evolutionarily stable. There will be some high value of α, which we will call α*, for which: (1) the prior is sufficiently weak that its costs relative to the unbiased strategy are low enough to allow the (m,α*) behaviour to invade (m,u) populations by drift; (2) the prior remains sufficiently strong that the (m,α*) population is resistant to invasion by (m,u), due to the
selection asymmetry discussed above. Under such a scenario, (m,a*) becomes the sole ESS: evolution will favour maximisation and the weakest possible (but not flat) prior. The actual value of a* will depend on the cost function used. For example, if we assume that higher values of a are associated with decreasing costs, but high a (say a = 100, which yields a Q matrix identical to that for a = 40 under the parameters used here) has a cost very close to that associated with a flat prior, then (m,a = 100) becomes the sole ESS: it benefits from both low costs and a skewed stationary distribution. While a more principled cost function is desirable, the insensitivity of the stationary distribution to a for maximising learners and the factorial in the expression for P(h) mean that we have been unable to explore sufficiently large values of a under more complex treatments of cost.

4. Discussion and conclusions
The main result from this analysis of evolutionary stability is that maximising is always preferred over sampling: combining this with the findings of Griffiths and Kalish (2005) and Kirby et al. (2007), we can conclude that evolution prefers precisely those circumstances in which strength of prior bias has least effect and cultural evolution (driven by transmission factors such as the bottleneck and utterance frequency) has the greatest scope to shape the linguistic system. The second result to highlight is that the strength of the prior is relatively unimportant from the perspective of biological evolution. In the (disfavoured) sampling strategies, it is best to have the same bias as the rest of the population. In maximising populations some bias is better than no bias, but the strength of that bias is unimportant. Furthermore, if we assume that strong biases have some cost, then evolution will prefer the weakest bias possible. While this latter result runs counter to the phenomenon known as the Baldwin effect (see, e.g., Briscoe, 2000), whereby initially learned traits tend to become nativised, we note that this model is not designed to elicit the Baldwin effect: nativisation of a particular language is not allowed by our definition of prior bias, and the Baldwin effect requires that learning be costly, whereas in our model it is costless. The model described above deals with a limited range of learning behaviours. Strength of prior, given by a, is a continuous parameter and amenable to a more fine-grained analysis. Similarly, the dichotomy between sampling and maximising can be recast as a continuum by a means suggested in Kirby et al. (2007): if P_r(h|d) is proportional to [P(d|h)P(h)]^r, then a range of strategies lies between sampling (given by r = 1) and maximising (infinitely large r). 
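The r-parameterised continuum can be sketched directly: exponentiate the posterior by r and renormalise, so that r = 1 samples from the posterior while large r effectively maximises. The posterior values below are illustrative only:

```python
import random

def choose(posterior, r):
    """Pick a hypothesis from the posterior p(h|d) exponentiated by r.
    r = 1 is a sampler; as r grows the rule approaches a maximiser."""
    weights = [p ** r for p in posterior]
    total = sum(weights)
    probs = [w / total for w in weights]
    u, acc = random.random(), 0.0
    for h, p in enumerate(probs):
        acc += p
        if u < acc:
            return h
    return len(probs) - 1

posterior = [0.6, 0.3, 0.1]               # illustrative posterior over 3 grammars
picks = [choose(posterior, r=50) for _ in range(1000)]
print(picks.count(0))                      # very nearly 1000: effectively maximising
```

At r = 50 the most probable grammar is chosen almost deterministically, whereas r = 1 would reproduce the posterior frequencies.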
Preliminary analysis of this much larger space yields results broadly similar to those presented here: higher values of r are preferred, and a exhibits large-scale neutrality in populations with any maximising tendency (Smith & Kirby, in preparation). The general picture remains that natural selection for communication favours those conditions where cultural transmission factors play a significant role in shaping language,
and strength of innate predispositions is relatively unimportant.

Acknowledgements
Kenny Smith is funded by a British Academy Postdoctoral Research Fellowship. The initial stages of this research took place at the Masterclass on Language Evolution, organised by P. Vogt and B. de Boer and funded by NWO.

References
Briscoe, E. J. (2000). Grammatical acquisition: Inductive bias and coevolution of language and the language acquisition device. Language, 76, 245-296.
Griffiths, T. L., & Kalish, M. L. (2005). A Bayesian view of language evolution by iterated learning. In B. G. Bara, L. Barsalou, & M. Bucciarelli (Eds.), Proceedings of the 27th annual conference of the Cognitive Science Society (pp. 827-832). Mahwah, NJ: Erlbaum.
Griffiths, T. L., & Kalish, M. L. (2007). Language evolution by iterated learning with Bayesian agents. Cognitive Science, 31, 441-480.
Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5, 102-110.
Kirby, S., Dowman, M., & Griffiths, T. L. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104, 5241-5245.
Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic structure: An overview of the iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 121-147). Springer Verlag.
Kirby, S., Smith, K., & Brighton, H. (2004). From UG to universals: linguistic adaptation through iterated learning. Studies in Language, 28, 587-607.
Maynard Smith, J., & Price, G. R. (1973). The logic of animal conflict. Nature, 246, 15-18.
Nowak, M. A., Komarova, N. L., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114-117.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.
Smith, K. (2004). The evolution of vocabulary. Journal of Theoretical Biology, 228, 127-142.
Smith, K., & Kirby, S. (in preparation). The evolution of language learning in Bayesian agents.
Thomas, B. (1985). On evolutionarily stable sets. Journal of Mathematical Biology, 22, 105-115.
SYNTAX, A SYSTEM OF EFFICIENT GROWTH

ALONA SOSCHEN
Department of Linguistics and Philosophy, MIT, 77 Massachusetts Ave., Cambridge, MA, United States

General physical laws are evident as universal syntactic principles governing the computational system of human language. The optimal space filling condition has to be satisfied in every system of efficient growth. This principle can be attested in syntax, exemplified in the Fibonacci (Fib) patterns where each new term is the sum of the two that precede it. This rule accounts for essential features of syntactic trees: limitations imposed on the number of arguments, and phase formation in derivations. The article provides a functional explanation of binary branching, labeling, and two types of Merge. It is shown that, in contrast with other Fib-based systems of natural growth, syntactic constituents are instances of both discreteness and continuity.
1. Natural Law

1.1. Fibonacci Numbers
The Fibonacci Sequence (FS) is one of the most interesting mathematical curiosities that pervade the natural world. The Fib-numbers are evident in every living organism.¹ They appear e.g. in the arrangement of branches of trees and spiral shapes of seashells. Early approaches to FS in nature were purely descriptive, with a focus on the geometry of patterns. Later, Douady and Couder (1992) developed a theory of plant growth (phyllotaxis), which explained the observed arrangements as following from space filling. This system is based on simple dynamics that impose constraints on the arrangement of elements to satisfy optimality conditions. In humans, the Fib-sequence appears in the geometry of DNA and the physiology of the head and body. On a cellular level, the '13' (5+8) Fib-number present in the structure of microtubules (cytoskeletons and conveyer belts inside the cells) is useful in signal transmission and processing. The brain and nervous systems have the same type of cellular building units, so the response curve of the central nervous system may also
¹ The number of 'growing points' in plants corresponds to FS: X(n) = X(n-1) + X(n-2): 0, 1, 1, 2, 3, 5, 8, 13, ... The limit ratio between the terms is .618034..., the Golden Ratio (GR).
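The recurrence and limiting ratio given in the footnote can be checked directly:

```python
def fib_seq(n):
    """X(n) = X(n-1) + X(n-2), starting from 0, 1."""
    xs = [0, 1]
    while len(xs) < n:
        xs.append(xs[-1] + xs[-2])
    return xs

xs = fib_seq(30)
print(xs[:9])           # [0, 1, 1, 2, 3, 5, 8, 13, 21]
print(xs[28] / xs[29])  # ratio X(n-1)/X(n) -> 0.618034... (GR)
```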
have FS at its base. This suggests a strong possibility that a general physical law applies to the universal principles underlying the Faculty of Language. Then our task is to identify and explore the features that make this Faculty so unique.

1.2. Syntactic Trees
The idea that Fib-patterns may play a role in human language was first explicitly defended in Uriagereka's 'Rhyme and Reason' (1998). Recently, Carnie et al. (2005) and Soschen (2006) confirmed that syntactic models exhibit certain mathematical properties. Tree structures are maximized in such a way that they result in a sequence of categories that corresponds to FS. The tree is generated by merging two elements; the next operation adds a new element to the already formed pair. Each item is merged only once; every subject/specifier (Spec) and every object/complement (Comp) position is filled. In the traditional sense of Chomskyan X-bar theory, a label immediately dominated by the projection of another category is an XP(hrase). Other non-terminal nodes are annotated as X', and Xs are 'heads'. If XP(n) is the number of XPs in the nth level L, then XP(n) = Fib(n) (fig. 1).

Figure 1. [figure: XP/X'/X counts per tree level follow the Fibonacci sequence]
The optimality requirement explains why the trees are constructed out of binary units. If Merge M were allowed to optionally select e.g. three terms, then the FS of maximal categories would disappear. A branching system of this kind shows a Fib-like sequence; however, the arrangement of elements displays a ratio different from GR.* The same principle of optimization provides an external motivation for M to distinguish between syntactic labels in a particular way. Determining whether a node is XP or X follows directly from the functional pressure of cyclic derivation: the Fib-based system includes sums of terms and single terms (XP/X). Thus, the assumption that syntactic structures have an intermediate X' projection appears to be a stipulation.
* Chomsky (2006) asserts that "Merge cannot create objects in which some object W is shared by the merged elements X, Y. It has been argued that such objects exist. If so, that is a departure from SMT, hence a complication of UG."
1.3. Zero Merge
The requirement to have Spec and Comp positions filled faces a problem: it creates a 'bottomless' tree by eliminating a line with only terminal Xs. However, real sentences always have an ending point. The solution lies in redefining binarity to include zero-branching, in other words, to start FS with 0 instead of 1. This follows directly from the requirement to combine each successive element with a sum of already merged elements. For example, merging 2 with 1 yields a new element 3, while merging two elements one of which is not a sum (2+0) does not. New terms are created in the process of merging terms with sets, to ensure continuation of motion. The newly introduced type of M, zero (0)-M, distinguishes between terms {1}/X and singleton sets {1, 0}/XP, the latter indispensable for syntactic recursion. When the sum of terms is present at each step, it provides the 'bottom line' in the tree. The suggestion to regard an empty element as functional in M has serious consequences for the theory of binary branching. The minimal building block that enters into linguistic computation is re-evaluated to include 0-M, and identified as the product of 0-M. As a result, binarity is preserved, while there is no problem caused by the requirement to fill Specs and Comps. XPs and Xs are disambiguated, which eliminates the necessity to proceed with further branching below the bottom level. Furthermore, if the same element can be represented as either a singleton set or a term, it follows that labels X and XP are not syntactic primitives.³ The idea that constituent structures have labels appears to be a stipulation; this part of Merge should be abandoned in favor of a rule with a higher explanatory adequacy. As the grammar evolves toward a more generalized syntactic representation, the only necessary mechanism is the one that determines whether a node is a result of Merge or not. Thus,
A bottom node is XP iff the node undergoes 0-M; otherwise, X. A node is XP iff the node is the result of Merge; otherwise, X.

2. Types of Syntactic Merge

2.1. Argument Structure (External Merge)
Merge is the operation responsible for the construction of elementary trees and the combination of these pieces into larger structures. The Strong Minimalist Thesis entails that Merge of α, β is unconstrained. Under External Merge (EM), α and β are separate objects; under Internal Merge (IM), one is part of the other, and Merge yields the property of displacement (Chomsky 2001). The pressure for the tree to be maximized justifies the basic principle of organization in both
³ Heads can behave like phrases and vice versa (Carnie 2000, Collins 2002, Chomsky 2004, 2005).
types of M. Move is just one of its forms: EM induces IM by virtue of the fact that already conjoined elements have to be linearized at the level relevant for pronunciation. The argument structure is the product of EM. The Fib-rule application makes interesting predictions about the constraints on EM: it accounts for a fixed number of nodes in thematic domains. Assume that 0-M, the operation that takes place prior to lexical selection, is responsible for constructing elementary argument-centered representations.⁴ This kind of Merge is relevant at the point where a distinction between terms {1}/X and singleton sets {1, 0}/XP is made, which follows directly from the functional pressure of cyclic derivation to merge elements of different types only. This type-shift (lowering) from sets to entities occurs at each level in the tree. For example, at the point where 2 is merged with 1, 2 is the sum of 1 and 1, but 1 is a single term. As is shown in (fig. 2), a1/1 is type-shifted from singleton set {a1, 0} (XP) to entity a1 (X) and merged with a2 (XP). The type of a2/1 is shifted from singleton set {a2, 0} (XP) to entity a2 (X) and merged with (XP).
Figure 2.
Recall that the argument structure is built upon hierarchical relations automatic for recursive operations (Chomsky 2005). In the present system, the recursively applied rule adjoins each new element to the one that has a higher ranking, starting with the term that is '0-merged first'. There is a limited array of possibilities depending on the number of positions available to a term adjoining the Fib-like argument structure. This operation either returns the same value as its input (0-M), or the cycle results in a new element (N-M).
1. Term a1 is 0-merged ad infinitum. The result is zero-branching structures.
⁴ Chomsky (2006) specifies that there exist other argument-based constructs, such as e.g. Pritchett's (1992) theta-driven model of perception, 'relevant to the use of language'. In such and similar models, a verb is a theta-role assigner. The (Fib-based) model of EM offered in this paper is argument-centered.
⁵ Conventions adopted in Fig. 2 are as follows: a is an entity/term, a1 (XP) and a2 (XP) are singleton sets, β and γ are non-empty (non-singleton) sets.
2. 0-merged a1 is type-shifted from set (XP) to entity (X) and N-merged with a2. The result is a single argument position, e.g. in Eve1 laughs, The cup1 broke.
3. Both terms a1 and a2 are type-shifted; the result is two argument positions, e.g. in Eve1 loves Adam2.
4. There are exactly three positions to accommodate a term (i, ii, and iii). This may explain why in double object constructions the number of arguments is limited to three (Eve1 gave Adam2 an apple3) (fig. 3).
Figure 3.
2.2. Phase Formation (Internal Merge)

The explanation of IM is very straightforward if we assume that derivations proceed by phases and movement depends on the qualification of phrases as phases.⁶ In this paper, phases are primarily characterized by their ability to induce a cycle by projecting extra Spec positions, to ensure continuation of movement in derivations. Research on phases has resulted in a complex representation that consists of two levels: one involves two individuals, and another expresses an individual-event relation (Pylkkanen 2003, among others). Sentences John baked/gave [Mary]individual [a cake]individual are of the first type, and [John baked a cake]event [for Mary]individual / [John gave a cake]event [to Mary]individual are of the second. It was suggested that a relation between individuals is established by means of the Individual Applicative (Appl) Head in I-Appl Phrase, and by means of the Event Appl Head in E-Appl Phrase (fig. 4).

Figure 4.
‘
For the discussion of phase formation see BoskoviE (2002), Epstein and Seely (2002), Legate (2003), Suranyi (2004), and Wexler (2004).
Are phases propositional? According to Chomsky, the answer is most probably yes. In the above-cited linguistic literature, it was maintained that only the relation between individuals and events constitutes a (propositional) phase, to provide an account of passive formation in the Applicative and Double Object constructions. It was concluded that the absence of a 'landing site' crucial for restructuring, namely an extra Spec position in I-Appl Phrase, disqualifies it from phases by blocking Direct Object (DO) movement. As a result, sentences of the kind A cake was baked t_cake for Mary and A cake was given t_cake to Mary are grammatical (DO movement of NP a cake to Spec, E-ApplP), while A cake was baked Mary t_cake and A cake was given Mary t_cake are not. However, I-Applicatives behave like phases in other languages, by allowing DO-movement in passives (Soschen 2006). In synthetic (inflectional) languages such as Russian, Italian, and Hebrew, I-ApplPs exhibit the properties of minimal, internal phases. The absence of these (min-)phases is characteristic of languages with fixed word order, where subject and object have to be ordered with respect to the verb (e.g. the analytical languages English and Icelandic), while both groups are characterized by maximal (propositional) phases (i.e. E-ApplP). Thus, syntactic phase formation can be regarded as language-specific if phases are redefined as maximal/propositional and minimal/non-propositional, or internal sub-phases. It follows then that any X can in principle head a phase.

2.3. Strict Cycle Condition
Chomsky (1973) states that 'no rule can apply to a domain dominated by a cyclic node A in such a way as to affect solely a proper sub-domain of A dominated by a node B which is also a cyclic node'. This condition is borne out in languages with min-phases that allow DO-movement, while Indirect Object movement is blocked: sentences such as Mary_IO was baked a cake_DO are ungrammatical in these languages. Once an object is moved through an existing Spec position, any other movement is blocked. From a more general perspective, in a system where X(n) = X(n-1) + X(n-2), GR between the terms is preserved only when each term is combined with the one that immediately precedes it. Once a phase is complete, it is impossible to extract yet another element from its domain. For example, 5 is a sum of 3 and 2. If the sum were formed by adding 1 (instead of 2) to 3, etc., the sequence would yield (1, 1, 2, 3, 4, 6, 9, ...), violating GR.

3. Natural Law and Syntactic Recursion

A species-specific mechanism of infinity makes Syntactic Recursion (SR)
crucially different from other discrete systems found in nature: there is no limit to the length of a meaningful string of words. Language is also discrete: there are neither half-words nor half-sentences. Syntactic units are also continuous:
once a constituent is formed, it cannot be broken up into separate elements. As an example, the sentence The dog chased the cat is the basic representation; in the passive construction The cat was chased t_the-cat by the dog, NP the cat moves to the beginning of the sentence only as a constituent, which is the reason why Cat was chased t_the-cat by the dog is ungrammatical. In the present work, the impenetrability (or continuity vs. discreteness) of already formed constituents, as a sub-case of the more basic operation type-shift, is viewed as the key requirement of syntactic recursion. In contrast, segments comprising other GR-based systems of growth can in principle be separated from one another. We have shown that a general physical law that appears in every living organism applies to the universal principles of grammar. Consequently, SR as a sub-system of optimal space filling can be represented graphically. Depending on whether a phase (stage of growth) is complete or not, each constituent appears either as part of a larger unit or a sum of two elements. In fig. 5 (left), one line that passes through the squares '3', '2', and '1' connects '3' with its parts '2' and '1'; the other line indicates that '3' as a whole is a part of '5'.
Figure 5
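The arithmetic behind the Strict Cycle argument above can be checked numerically: keeping the Fibonacci rule preserves GR, while adding an earlier term instead (yielding the deviant sequence 1, 1, 2, 3, 4, 6, 9, ...) converges to a different limit ratio. A small sketch:

```python
def limit_ratio(next_term, n=60, seed=(1, 1, 2)):
    """Iterate a recurrence from a seed and return the ratio of the
    last two terms, which approximates the limit ratio."""
    xs = list(seed)
    for _ in range(n):
        xs.append(next_term(xs))
    return xs[-1] / xs[-2]

fib = limit_ratio(lambda xs: xs[-1] + xs[-2])   # 3 = 2 + 1, 5 = 3 + 2, ...
skip = limit_ratio(lambda xs: xs[-1] + xs[-3])  # 4 = 3 + 1, 6 = 4 + 2, 9 = 6 + 3
print(round(fib, 4))    # 1.618, the Golden Ratio
print(round(skip, 4))   # about 1.4656: GR is lost
```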
The pendulum-shaped graph representing constituent dependency in SR is contrasted with a non-linguistic representation on the right, where one line connects the preceding and the following elements in a spiral configuration of a sea-shell. The distance between the 'points of growth'/segments of a sea shell can be measured according to GR, the requirement of optimization. This system does not comply with IC: for example, '5' is a sum of '3' and '2', while '2' is comprised of separate elements '1' and '1'. In sum, we have reached some conclusions concerning the underlying principles of C_HL by developing the idea that linguistic structures possess the properties of other biological systems. Syntactic Recursion is part of a larger mechanism designed for optimal distance between elements and continuation of movement. While the Language Faculty obeys the rule of optimization, the Impenetrability Condition (type-shift) is viewed as the basic rule applicable in
SR only. In contrast with other GR-based natural systems of efficient growth, each syntactic constituent can be represented as either discrete or continuous.

References
Bošković, Ž. (2002). A-movement and the EPP. Syntax 5, 167-218.
Carnie, A., Medeiros, D., & Boeckx, C. (2005). Some Consequences of Natural Law in Syntactic Structure. Ms., University of Arizona, Harvard University.
Carnie, A. (2000). On the Definition of X0 and XP. Syntax 3, 59-106.
Chomsky, N. (1973). Conditions on transformations. In S. Anderson & P. Kiparsky (Eds.), A Festschrift for Morris Halle (pp. 232-286). New York: Holt, Rinehart and Winston.
Chomsky, N. (2001). Derivation by Phase. In M. Kenstowicz (Ed.), Ken Hale: A Life in Language (pp. 1-52). Cambridge, MA: MIT Press.
Chomsky, N. (2004). Beyond Explanatory Adequacy. In A. Belletti (Ed.), Structures and Beyond: The Cartography of Syntactic Structures, Vol. 3 (pp. 104-131). Oxford: Oxford University Press.
Chomsky, N. (2005). On Phases. To appear in C. P. Otero et al. (Eds.), Foundational Issues in Linguistic Theory. MIT.
Chomsky, N. (2006). Approaching UG from Below. Ms., MIT.
Collins, C. (2002). Eliminating Labels. In S. Epstein & D. Seely (Eds.), Derivation and Explanation in the Minimalist Program. Oxford: Blackwell.
Douady, S., & Couder, Y. (1992). Phyllotaxis as a physical self-organized growth process. Physical Review Letters 68, 2098-2101.
Epstein, S. D., & Seely, T. D. (2002). Rule Applications as Cycles in a Level-Free Syntax. In S. D. Epstein & T. D. Seely (Eds.), Derivation and Explanation in the Minimalist Program (pp. 65-89). Oxford: Blackwell.
Legate, J. A. (2003). Some interface properties of the phase. Linguistic Inquiry 34.3.
Pritchett, B. L. (1992). Grammatical competence and parsing performance. Chicago and London: University of Chicago Press.
Pylkkänen, L. (2003). Introducing arguments. Doctoral dissertation, MIT.
Soschen, A. (2006). Natural Law: The dynamics of syntactic representations in MP. In H. Broekhuis & R. Vogel (Eds.), Optimality Theory and Minimalism: a Possible Convergence? Linguistics in Potsdam 25. Berlin: ZAS.
Surányi, B. (2004). The left periphery and Cyclic Spellout: the case of Hungarian. In D. Adger, C. de Cat & G. Tsoulas (Eds.), Peripheries and Their Effects (pp. 49-73). Dordrecht: Kluwer.
Wexler, K. (2004). Theory of phasal development: perfection in child grammar. MIT Working Papers in Linguistics 48, 159-209.
SIMPLE, BUT NOT TOO SIMPLE: LEARNABILITY VS. FUNCTIONALITY IN LANGUAGE EVOLUTION
SAMARTH SWARUP¹ AND LES GASSER¹,²
¹Graduate School of Library and Information Science, ²Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL 61801, USA
{swarup,gasser}@uiuc.edu

We show that artificial language evolution involves the interplay of two opposing forces: pressure towards simple representations imposed by the dynamics of collective learning, and pressure towards complex representations imposed by the requirements of agents' tasks. The push-pull of these two forces results in the emergence of a language that is balanced: "simple but not too simple." We introduce the classification game to study the emergence of these balanced languages and their properties. Our agents use artificial neural networks to learn how to solve tasks, and a simple counting algorithm to simultaneously learn a language as a form-meaning mapping. We show that task-language coupling drives the simplicity-complexity balance, and that both compositional and holistic languages can emerge.
1. Introduction

In recent years, the application of the evolutionary metaphor to language change has gained currency. A natural question this raises is, what determines the fitness of a language? Linguists often answer by attributing extrinsic sources of fitness, such as the prestige of the speaker (Croft, 2001; Mufwene, 2002). For human languages it is generally accepted that intrinsic factors such as the learnability of a language do not vary across languages. A child born into a Hindi-speaking community will learn Hindi as easily as a child born into an English-speaking community will learn English. Modern languages, however, have adapted over a long period of time. If we go far enough into the history of language, it is clear that (aspects of) early languages had differential fitness. For example, Phoenicians were the first to develop a phonetic alphabet. This innovation quickly became established in many languages, even though the Phoenician language itself died out. One explanation is that phonetic alphabets fixated because a phonetic writing system is much easier to learn. However, if learnability were the only source of fitness for a language, we would expect to see maximally simple, possibly trivial, languages prevail, since these can be learned most easily. Indeed, simulations have shown this to be the
case (Swarup & Gasser, 2006). To allow the emergence of more complex, and thus more useful, languages in simulation, a bias for increasing complexity has to be built in by the experimenter (Briscoe, 2003). Where would this bias come from, if it exists, in a natural system? It seems intuitive that the counter-pressure to make language more complex must come from the functionality of language. A language is for something. In other words, the agents gain some benefit from having a particular language/representation. If the use of a particular language gives an agent high reward (perhaps through low error on some task), then part of that reward gets transferred to the language as a boost in fitness. Languages that are too simple, however, are unlikely to be very functional, because their information-carrying capacity is low; an agent should feel a pressure to discard such a language. Thus we can imagine that languages occupy a complexity line, with complexity increasing to the right, as shown in figure 1. Learnability increases with simplicity, to the left, and expressiveness or functionality increases with complexity, to the right. Together, these two define the intrinsic fitness of languages.
Figure 1. The complexity line for languages: Learnable (left), Just Right, Expressive (right).
In terms of this complexity line, we would like the languages that evolve to be in the region that is “just right”, where the language that evolves is both easily learnable and adequately useful. Such a language would be well-adapted, by some measure, to the tasks at hand. The goal of this paper is to relate language to task in a way that allows a population of agents to jointly learn a shared representation that is well-adapted to the complexity of the task they face. We do this by setting up a classification game (below), where agents interact with each other while learning to perform some classification task. The interaction between agents results in the emergence of a shared representation, or language, that is simple but not too simple. The rest of this paper is organized as follows. First we describe the Classification Game, where we relate language to task by treating the agents’ hypothesis space as the meaning space. We then present experiments that illustrate the kinds of languages that can emerge with and without interaction between agents. We show that both holistic and compositional languages can emerge, and that the emergent languages are more efficient than representations learned without communication. Finally, we discuss related work and speculate on future work.
2. The Classification Game

Now we describe the experimental setup in which agents interact with each other and learn to solve a classification problem while also learning to communicate about it. The learning task in all the experiments below (except the first) is the XOR classification task; it is well known in classification learning and its results are easy to visualize. Inputs consist of just two bits. Thus there are four possible inputs: 00, 01, 10, and 11. The output, or label, is 1 if the input bits are different, otherwise it is 0. We choose hyperplanes (i.e. straight lines) as our hypothesis class. One crucial consequence of this is that at least two hypotheses are needed to solve the XOR task, as will be obvious from the figures later. This choice of hypothesis space leads very naturally to an artificial neural network (ANN) implementation. Each hidden layer node of an ANN, called a perceptron, can be thought of as representing a hyperplane. The number of hidden layer nodes in each agent's neural network defines the maximum number of hyperplanes it can use to classify the training examples. We also refer to the hidden layer as the encoder, since it encodes the inputs into features that are to be communicated. The second, or output, layer has just a single node. We also refer to this as the decoder, since it decodes the features extracted by the encoder to find the output. The agents also convert the outputs of the encoder layer into a public language using a learned form-meaning mapping (FMM), a matrix [f_ij] where each entry defines the likelihood of pairing a form (typically a letter of the alphabet) with a meaning (hidden layer node number).
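The claim that at least two hyperplanes are needed for XOR can be made concrete with a hand-wired sketch; the weights below are illustrative choices, not the paper's learned networks:

```python
# XOR training set: the label is 1 iff the two input bits differ.
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def perceptron(w1, w2, b):
    """One hidden unit = one hyperplane w1*x1 + w2*x2 + b = 0."""
    return lambda x: 1 if w1 * x[0] + w2 * x[1] + b > 0 else 0

# Two hand-chosen hyperplanes: "at least one bit on" and "both bits on".
h1 = perceptron(1, 1, -0.5)   # fires on 01, 10, 11
h2 = perceptron(1, 1, -1.5)   # fires on 11 only

def decoder(f1, f2):
    """Active iff h1 fires and h2 does not, i.e. exactly one bit is on."""
    return f1 * (1 - f2)

for x, label in DATA:
    assert decoder(h1(x), h2(x)) == label
print("two hyperplanes solve XOR")
```

A single hyperplane cannot separate {01, 10} from {00, 11}, which is why the hidden layer (the encoder) must expose at least two features for communication.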
Figure 2. Speaker-hearer interaction.
The game protocol is simple. At each step, we select two agents uniformly at random. We assign one agent to the role of speaker, and the other to hearer. Next, we present both with the same training example. The speaker treats its active encoder outputs as a set of meanings to be encoded into an utterance. For each active encoder (i.e. hidden layer) output, the speaker looks through the corresponding row of its FMM and chooses the column with the maximum value as the corresponding form. It then updates that FMM locus by adding a constant δ to it. At the same time, all other entries in the same row and column are decremented by a smaller constant, ε. This decrement, called lateral inhibition, discourages synonymy and polysemy, and is inspired by a mutual exclusivity bias seen in the language acquisition behavior of young children (Markman & Wachtel, 1988). Symbols corresponding to each of the active encoder units are put together to generate the speaker's utterance. The hearer tries to decode this utterance via its own form-meaning mapping, and uses its decoder to generate a label. We then give both agents the expected label; they calculate error and update their neural networks. The hearer also uses the backpropagated meaning vector, paired with the speaker's utterance, to update its FMM. Since the hearer cannot know which meaning is paired with which form, it simply does updates, including lateral inhibition, for all possible form-meaning pairs in the utterance and meaning vectors it has, with the assumption that the correct mapping will emerge from the statistics of the form-meaning vector pairs. Figure 2 shows the process.
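The speaker-side FMM update can be sketched as follows; the δ and ε values and matrix sizes are illustrative (the paper does not fix them here), and the hearer's all-pairs update is omitted:

```python
DELTA, EPSILON = 0.1, 0.01   # reinforcement and lateral-inhibition steps (illustrative)

def speak(fmm, meaning):
    """Pick the best form for a meaning (row argmax), reinforce that locus,
    and laterally inhibit the competing entries in its row and column."""
    row = fmm[meaning]
    form = row.index(max(row))
    fmm[meaning][form] += DELTA
    for m in range(len(fmm)):        # same column, other meanings
        if m != meaning:
            fmm[m][form] -= EPSILON
    for f in range(len(row)):        # same row, other forms
        if f != form:
            fmm[meaning][f] -= EPSILON
    return form

# 2 meanings (hidden units) x 3 forms, started near-uniform.
fmm = [[0.5, 0.51, 0.5],
       [0.5, 0.5, 0.5]]
for _ in range(10):
    speak(fmm, 0)
print(speak(fmm, 0))   # meaning 0 has locked onto form 1
```

The positive feedback of the argmax plus reinforcement, combined with lateral inhibition, is what drives each meaning towards a single dedicated form.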
3. Experiments

We now show the results of a series of experiments which demonstrate the effects of collective learning on the emergent representations.
3.1. Driving simplicity: communication without task learning
[Figure 3 additionally tabulates the Input, Label, and Utterance for each of the four points.]
Figure 3. The result of ungrounded language learning. The agents come to consensus on a trivially simple mapping, which assigns the same utterance, and same label, to all the points.
This experiment shows how collective language learning in the ungrounded case (without environmental feedback) leads to linguistic simplicity. At each step, we give the speaker a random 2-bit Boolean vector, and the label generated by the speaker is treated as the expected label for the hearer’s neural network. Thus
the hearer tries to update its neural network to match the speaker's. Speaker and hearer both update their form-meaning mappings as described previously. An example representation that the agents converge upon is shown in figure 3. We see that all the hyperplanes have been pushed off to one side. It so happens that hyperplanes A, B, and D are oriented in such a way that they label all the points as zero, while hyperplane C has ended up in the reverse orientation with respect to the others, and labels all the points as one. This is why the table shows C as the utterance for all the points. The decoder for all the agents, though, decodes this as the label zero for each point, as shown in the table, and also by the empty (unfilled) circles in the figure.
3.2. Driving complexity: task learning without communication

The second experiment shows the opposite effect: a pressure towards expressiveness. In this case the agents all learn individually to solve the XOR problem from training examples, and they do not communicate at all. Figure 4 shows an example of an individually learned solution to the task. Since the agents do not update their form-meaning mappings, it does not really make sense to talk about their languages in this case. However, if we assume a mapping that assigns symbol A to the first hidden layer node, B to the second, and so on, we can derive what their language would have been if this solution had been adopted by all agents.
Figure 4. A learned solution to the XOR problem, without communication between agents. Different agents learn different solutions, but they generally learn overly complex solutions.
The agents had four hidden layer nodes available to them to encode hypotheses and this agent uses them all. While it solves the problem perfectly, the learned representation, as we can see, is overly complex. Different agents learn different solutions, depending on random initialization of ANN weights. However, the minimal solution, which uses only two hyperplanes, is observed very rarely.
3.3. Finding balance: coupled task-communication learning

In the next experiment, we allow the agents to communicate while also learning to solve the task. With this task-language coupling, the agents converge to a maximally simple mapping that also solves the problem. In some (chance-determined) runs, agents develop languages with redundant symbols, and in some they do not; an example with redundancy is shown in figure 5.
Figure 5. A learned solution to the XOR problem, with communication between agents. All agents converge to the same solution. Even though they have four hidden layer nodes, they converge on a simpler solution that uses only two of the nodes.
The population consisted of only four agents, and figure 6 shows the learned form-meaning mappings of all the agents, as Hinton diagrams. The size of a box is proportional to the magnitude of the value.
Figure 6. The learned form-meaning matrices for each of the agents from experiment 4. Form-meaning pairs that have become associated are highlighted with circles.
There are a couple of interesting things to note about these matrices. First, they all map symbols and hyperplanes uniquely to each other. Each row and column has a distinct maximum in each of the matrices. Second, they are all different (except the first and third). In other words, their private interpretation of symbols is different, even though they all understand each other and have perfect performance on the task. Thus while their task representations and public language are aligned, their private languages are different.
3.4. Coupled learning: the emergence of a holistic language
The language shown to emerge in the previous experiment is compositional, in the sense that the agents make use of multiple symbols, and combine them meaningfully to communicate about the labels of various points. Though this is an interesting and desirable outcome from the point of view of language evolution, it is pertinent to ask whether this outcome is in some way built in, or whether it is truly emergent. To show that it is, in fact, emergent, we present the following result. Figure 7 shows the outcome of a run with parameters identical to those of experiment 4. This time, however, we see the emergence of a holistic language: each point has a unique symbol associated with it (one of the points has no symbol, as A is redundant).
Figure 7. A learned solution to the XOR problem, where the communication matrix is learned by counting. All agents converge to the same solution. They essentially memorize the points, assigning one hidden layer node to each point.
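The capacity limit behind this memorization strategy can be made concrete with a quick count (our illustration, not from the paper): a holistic code that dedicates one hidden node per point can only name as many points as there are nodes, whereas a compositional code over activation patterns distinguishes exponentially many.

```python
# Illustrative capacity count (not from the paper): with H hidden nodes,
# a holistic one-node-per-point code names at most H points, while a
# compositional code over binary activation patterns distinguishes 2**H.
def capacity(h_nodes):
    return h_nodes, 2 ** h_nodes   # (holistic, compositional)

for h in (2, 4, 8):
    holistic, compositional = capacity(h)
    print(f"H={h}: holistic {holistic} points, compositional {compositional}")
# Beyond H points, only the compositional strategy can still label them all.
```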
In effect, the agents have memorized the points. This is only possible when the neural network is large enough, in the sense of having enough hidden layer nodes to assign a unique one to each point. In any realistic problem, this is generally not the case. This notion of the role of cognitive capacity in the emergence of compositionality and syntax has been studied theoretically by Nowak, Plotkin, and Jansen (2000), who showed that when the number of words that agents must remember exceeds a threshold, the emergence of syntax is triggered. In our case, this threshold is defined by the number of hidden layer nodes: if the number of points that must be labeled exceeds the number of hidden layer nodes, the network must clearly resort to a compositional code to solve the task.

4. Conclusion
Two kinds of holistic languages can be seen in our system. When a single hypothesis is sufficient to solve the problem, we have a language where a single symbol is used for a single class. This would be something like an animal giving an alarm
call when any predator is detected. The second kind of holistic language we see in our experiments is described in section 3.4, where a single hypothesis is associated with each point. This corresponds to giving a unique name to each object in the domain of discourse. Thus our model has intrinsic reasons for the emergence of holistic and compositional languages, as opposed to the population-level model of Nowak et al. Kirby et al. have also given an account of the emergence of compositionality via their Iterated Learning Model (ILM) (Smith, Kirby, & Brighton, 2003; Kirby, 2007). The ILM models cultural transmission of language, for example from parents to children through successive generations. They show that since language must pass through the bottleneck of child language acquisition, the only stable languages are those that allow the construction of new valid utterances on the basis of known utterances. In other words, compositionality is favored by the need to learn quickly from a few samples. Our model is similar to theirs in the sense that the population of agents tends to converge upon a simple language, which, as we have discussed earlier, leads to better generalization. However, it is not clear whether the causal mechanisms that lead to this phenomenon are the same in both models. To investigate this question, we could extend our model in an analogous manner, by considering generations of agents that receive different training sets for the same problem, or possibly even for different, but related, problems. This presents an interesting possibility for future research.
References

Briscoe, T. (2003). Grammatical assimilation. In M. H. Christiansen & S. Kirby (Eds.), Language evolution: The states of the art. Oxford University Press. Croft, W. (2001). Explaining language change. Longman Group United Kingdom. Kirby, S. (2007). The evolution of meaning-space structure through iterated learning. In C. Lyon, C. Nehaniv, & A. Cangelosi (Eds.), Emergence of communication and language (pp. 253-268). Springer Verlag. Markman, E. M., & Wachtel, G. F. (1988). Children's use of mutual exclusivity to constrain the meanings of words. Cognitive Psychology, 20, 121-157. Mufwene, S. (2002). Competition and selection in language evolution. Selection, 3(1), 45-56. Nowak, M. A., Plotkin, J. B., & Jansen, V. A. A. (2000). The evolution of syntactic communication. Nature, 404, 495-498. Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: A framework for the emergence of language. Artificial Life, 9(4), 371-386. Swarup, S., & Gasser, L. (2006). Noisy preferential attachment and language evolution. In From animals to animats 9: Proceedings of the ninth international conference on the simulation of adaptive behavior, Rome, Italy.
KIN SELECTION AND LINGUISTIC COMPLEXITY

MAGGIE TALLERMAN

Linguistics Section, SELLL, University of Newcastle upon Tyne, Newcastle, NE1 7RU, U.K.

Language is typically argued (or assumed) to be an adaptive trait in the Homo lineage, and various specific selection pressures are offered to explain why language would have increased fitness in a population. However, it is incoherent to discuss 'language' as a monolithic entity: the set of properties that comprise the full, complex language faculty almost certainly evolved independently, and any pressure that 'buys' one of these properties does not necessarily entail the others. Some recent work on kin selection starts by discussing the evolution of speech, but then moves on to the selective value of the exchange of information without indicating how our ancestors got from vocalization to propositions. This is too large a leap, and more specific mechanisms must be proposed if the hypotheses are to be seriously considered.
1. Introduction: evolution and the components of the language faculty
Most authors agree that language is adaptive, in other words that it is a trait the possession of which enables an organism to be better adapted to its environment, and thus more likely to survive and reproduce. This seems reasonable, since complex features (such as eyes or wings) only appear if their earliest manifestations and subsequent stages all confer selective value. The complexity of language is such that it seems hardly controversial to suggest that language as a phenomenon does offer selective advantages. The problem faced by a theory of language evolution is that language is modular, and hence cannot possibly evolve as a monolithic, unified phenomenon. Therefore, selection pressures cannot operate on 'language' as an entity in the earliest stages of its evolution, but instead must target its individual components. As a faculty, language comprises many distinct features, not all of which are interdependent even in the fully-modern faculty, and which must therefore have evolved individually (cf. Jackendoff 2002). For instance, linguistic vocalization and syntax are independent modules (as attested by signed languages, infant speech, non-nativized pidgins, Wernicke's aphasia, and phonetically-based language games; and less obviously, by the fact that an
imitation of human vocalization can be made by parrots). Symbols can certainly be used without syntax, as attested in 'ape language' research, and in pidgins. But syntax does depend on a lexicon which stores the subcategorization and selectional requirements of words and stems, which in turn relies on the ability to learn arbitrary symbolic associations. However, it is not hard to envisage a protolanguage with stored symbols and a sound system but no syntax (Bickerton 1990). We might imagine that certain features of full language, such as morphology and a fully-developed pragmatics, were later to evolve (Carstairs-McCarthy, 2005; Callanan, 2006). This leaves the following as central features to be accounted for: the evolution of linguistic symbols; the evolution of a combinatorial sound system; the ability to have vocal utterances under voluntary control; the capacity to learn a large and extendable vocabulary, and for vocabulary to be culturally transmitted rather than essentially innate; and finally, the syntactic capacity, including "discrete infinity" - the ability to form an unlimited set of phrases and sentences from a finite set of words and morphemes. A complete theory of language evolution must ideally also specify how the various independent or semi-independent modules of language came to be inextricably linked in normal adult usage, so that we can now genuinely refer to a 'language faculty'. From the evolutionary perspective, we need to know what factors drove the emergence and development of the observed properties of language. In recent years, the question of what selective value language had for early hominins, and what selection pressures existed in the evolution of language, has become a topic of frequent discussion; see, for instance, Calvin & Bickerton (2000), Dunbar (1993, 1996), Falk (2004), Fitch (2004), Locke & Bogin (2006), Mithen (2005), amongst many others. Mechanisms proposed include sexual selection, group selection and kin selection.
In this paper, I discuss the interaction between selection pressures and linguistic complexity with reference to proposals in recent literature for kin selection, including parental selection. One recurring problem is that, although it is reasonable to start with the evolution of vocalization, some proposals move from speech to language without suggesting mechanisms for getting from a single module to the full language faculty. I also discuss the question of whether the selection pressures that are suggested can in fact give rise to the components of language which the authors are claiming for them.
2. Vocalization in early hominin infants and mothers
In a recent paper, Locke & Bogin (2006; henceforth L&B) focus on the evolution of vocalization, offering an account which suggests selective advantages for an expanded suite of vocal abilities throughout the 'life history' of hominins, or 'selection for speech' (p. 275). In the earliest stages of ontogeny, L&B's parental selection hypothesis suggests, infants that vocalized appropriately received more care and attention than those who did not: thus, parents effectively selected for the ability to vocalize. L&B outline an account in which hominin mothers early in the Homo lineage, approximately two million years ago, began weaning their infants at a younger age, thus enabling the mother to have further offspring sooner than is possible, for example, in chimpanzees. During this period, the argument runs, the still-dependent infants must deploy new methods 'of signaling their needs to, and appraising the reactions of, heavily burdened caregivers' (p. 277). An account remarkably similar to L&B's is also proposed by Falk (2004), who suggests that natural selection targeted parents too: 'hominin mothers that attended vigilantly to infants were strongly selected for' (2004: 491). Her specific idea is the 'putting the baby down' hypothesis, which proposes that a special, infant-directed vocalization was initially used by hominin mothers, around the same era, to soothe infants which had been temporarily put down while their mothers were foraging. Like L&B, Falk believes that selection would favour 'infants that vocalized their distress upon becoming separated' (2004: 501). The specific mechanisms for getting from a pre-linguistic motherese or from infant vocalization to anything language-like are left vague in each case.
Falk suggests (2004: 501) that 'the prosodic [infant-directed] vocalizations of hominin mothers would have taken on less emotional and more pragmatic aspects as their infants matured', and similarly '[o]ver time, words would have emerged in hominins from the prelinguistic melody [...] and become conventionalized'. But these developments cannot simply be assumed: no account is given of the stages leading from early hominin motherese even as far as a syntax-free protolanguage. Most importantly, Falk does not outline what this form of motherese might have looked like. It cannot have a special, simplified form of vocabulary, since words had yet to evolve. It cannot even consist of exaggerated vowels and a slower tempo, since this implies that there are some standard vowels and a standard tempo to begin with. L&B suggest that as infancy progresses, the better-vocalizing infants get more care, and are more likely 'to generate and learn complex phonetic patterns'
(p. 266). But this hypothesis immediately runs into a problem, for where would these complex patterns come from? As Studdert-Kennedy (2005) stresses, we cannot assume that the purely linguistic elements, vowels and consonants, are somehow already in place. Instead, we need an account which explains how these cognitive elements come into being (e.g. MacNeilage 1998). For infants to 'learn complex phonetic patterns', there would have to be something already present to be learned, and presumably it must be learned from caregivers. L&B do not suggest how this learning cycle began; nor do they suggest a path from 'speech' to compositional phonology, which is by no means an automatic development. Moreover, neither of these BBS target articles discusses why 'the capacity to produce more complex vocalizations' (L&B, p. 277) was advantageous: in other words, why would a larger and more differentiated array of non-compositional, innate calls and gestures not have sufficed for hominin infants to make their needs known under the changed environmental and developmental circumstances, and indeed for their mothers to communicate with the infant? Why would more sophisticated (protolinguistic) vocalization be more likely to elicit care than non-linguistic infant vocalization? This does suffice for non-human primate infants, for they too are still dependent even when able to eat adult food; chimpanzee mothers care for their infants until they are around five years old (Nishida et al. 1990), and infants may stay with their mothers until they are ten. Here, some major evolutionary shift in the hominin line would seem to be necessary to account for parental selection - how and why did linguistic vocalization come to be preferred by parents over ordinary primate vocalization?
It also seems odd that (proto)language or protolinguistic vocalization is offered as the primary way for infants and young children to make it clear that they need attention; surely, early hominin infants, just like infants today, would simply burst into tears and wail - using non-verbal, phylogenetically-ancient primate distress signals.

3. Teaching, learning and information exchange
Crucially, it cannot be assumed that at this stage in hominin evolution, there was any meaningful content in infant vocalization. Even if 'complex phonetic patterns' have evolved, there is no suggestion that symbolic reference (Deacon 1997) has yet emerged. Yet L&B go on to say that 'the kin group provided a context in which it was advantageous to exchange information, and [...] infancy and childhood furnished raw vocal material that would have favored any system
of spoken communication' (L&B, p. 267). Moreover, as the stage of childhood became extended in hominins, 'opportunities arose [...] for the negotiation of more structured and complex forms of vocalization, and [...] benefits would have accrued to families that were able to deploy these more complex forms meaningfully, and thus to warn, advise and inform each other' (p. 272). But this too requires a massive development, and one which is not expanded on in L&B's account: how do we get from better vocal skills to the exchange of information? Thus, selection pressures have been proposed, but there is a gulf between what they might produce and what language actually comprises. However complex vocalization becomes, it does not lead automatically to the evolution of symbolic reference, to the voluntary control of vocalization, or to cultural transmission of vocabulary, let alone to propositional syntax. In their BBS response, L&B say that 'the components of language are related, [...] stitched together by sequential patterns of selection' (p. 311). This view is not too controversial, but if claims are being made regarding the evolution of the entire language faculty, then specific mechanisms must be offered whereby selection for vocal abilities can lead to the development of other linguistic modules (cf. Carstairs-McCarthy 1999 for one attempt). Fitch (2004: 286) also proposes that 'a key selective advantage of language is the transfer of information between kin, particularly parents and their offspring'. However, there is very limited evidence of deliberate teaching of kin either among chimpanzees or bonobos (cf. Boesch 1991). This suggests that our common ancestors also did not teach their offspring intentionally, which means that the emergence of teaching itself has first to be accounted for. Moreover, there is a danger here of foundering in the 'teleological pitfall' (Johansson 2005).
Fitch (2004: 289) suggests that a kin-selected communication system provides a 'selective force that could underlie the generation of complexity: the need to communicate arbitrarily complex ideas'. Teaching infants and children, and the exchange of information, may well benefit from the evolution of language, but language cannot evolve in order to be used for these purposes: a trait does not evolve because it is needed. The more pertinent question is why such explicit teaching and information exchange became more important in the life of early hominins than they were (and are) for other primates, who would, presumably, also have found these skills highly useful. Once explicit teaching and information exchange are in place in some form, then they could create selection pressures for enhanced communication. It is notable, however, that other primates appear not to deliberately exchange much information about their environment. It would also be worth knowing how much parent-child interaction in modern human populations does concern 'the transfer
of information’, rather than (say) discussions of food, playthings or the family’s animals, or attempts to break up sibling quarrels. The extended period of childhood is also seen by L&B as an increased opportunity for learning, including language learning. While this is undoubtedly true for fully-modern infants, who live in societies in which older children and adults are already in possession of language, L&B’s account seems confused: they appear to propose several times both that early hominin infants and children are responsible for the increasing complexity of language, and simultaneously that these infants and children are learning a more fully developed language from their care-givers: ‘Young hominins also would have needed to know about plants as well as game, tools, shelter, and predators. Even a small amount of vocal-verbal behavior would have facilitated warnings and instruction’ (p. 274). It is therefore unclear who is driving the development of the language faculty: infants and children, as L&B claim to propose, or adults, as the kin selection hypothesis suggests. Logically, of course, it could be both; but we do need to know where the increased complexity comes from. The problem remains exactly how the major features of the language faculty can arise from the pressures proposed.
4. Where does syntax come from?

Moving on to other central properties of the language faculty, Falk (2004) suggests that the 'social syntax' involved in turn-taking between mothers and infants at the babbling stage 'may enhance infants' acquisition of other rules that are preliminary to learning the proper arrangements for elements within sentences (syntax)' (p. 496). This carefully-worded proposal is perhaps not intended to be a very strong claim, but it certainly underestimates what is involved in syntax, which is clearly far more complex than the mere arrangement of elements. Fitch (2004) suggests that the property of 'discrete infinity' is not in fact unique to language: 'the songs of birds or humpback whales use recombination of basic units to form larger, more complex units, and there are no obvious limits on the variety of units thus formed' (p. 283). But morphemes and words are meaningful units, whereas the phrases of birds and whales do not consist of meaning-bearing elements, nor are the complex units which are formed in any way a sum of their parts, or propositional. So far, no convincing analogues for syntax in non-human communication have been offered.
5. How did protolanguage ever leave home?
One important question for the kin selection hypothesis is how (proto)language ever got from the mother/infant dyad, or from the immediate family, into the community. Fitch (2004) counters the problem of why language is not used today predominantly for communicating with relatives by referring to the human propensity for reciprocal altruism, suggesting that 'valuable information could be exchanged at low cost' (p. 290). However, this does not address the question of how (proto)language got outside the family in the first place. Instead, we might expect each family within a community to develop and maintain its own protolanguage. L&B (2006: 278) suggest that indeed, 'vocal behaviors' stayed within the family for a long period in evolution, finally emerging with adolescence (i.e. via sexual selection). And Falk (2004: 502) suggests that 'protolinguistic utterances of early hominins would have become conventionalized across their groups'. But the details of the extension of protolanguage from family to community seem quite difficult to account for. Moreover, even if reciprocal altruism offers an explanation for why we don't talk just to our kin (Fitch 2004: 289-90), if the transmission of information is so important in increasing fitness, then it would definitely be advantageous to keep it in the immediate family.

6. Conclusion
It would be unreasonable to expect a single theory of language evolution to have all the answers, to suggest ways in which each of the crucial central features of language outlined in section 1 could have originated and evolved. However, recent work makes too large a leap from new vocal skills to information-sharing. Kin selection may well play an important role in language evolution, but we need more details about how the gulf between (proto)linguistic vocalization and language was breached.
References
Bickerton, D. (1990). Language and species. Chicago: University of Chicago Press. Boesch, C. (1991). Teaching in wild chimpanzees. Animal Behaviour, 41, 530-532. Callanan, S. (2006). The pragmatics of protolanguage. Paper presented at the Cradle of Language Conference, Stellenbosch, South Africa, 6-10 November.
Calvin, W. H. & Bickerton, D. (2000). Lingua ex machina: reconciling Darwin and Chomsky with the human brain. Cambridge, MA & London: The MIT Press. Carstairs-McCarthy, A. (1999). The origins of complex language: an inquiry into the evolutionary beginnings of sentences, syllables and truth. Oxford: Oxford University Press. Carstairs-McCarthy, A. (2005). The evolutionary origin of morphology. In M. Tallerman (Ed.), Language origins: perspectives on evolution (pp. 166-184). Oxford: Oxford University Press. Deacon, T. (1997). The symbolic species: the co-evolution of language and the human brain. London: Penguin Books. Dunbar, R. (1993). Coevolution of neocortical size, group size and language in humans. Behavioral and Brain Sciences, 16, 681-735. Dunbar, R. (1996). Grooming, gossip and the evolution of language. London: Faber & Faber. Falk, D. (2004). Prelinguistic evolution in early hominins: when motherese? Behavioral and Brain Sciences, 27, 491-541. Fitch, W. T. (2004). Kin selection and 'mother tongues': a neglected component in language evolution. In D. K. Oller & U. Griebel (Eds.), Evolution of communication systems: a comparative approach (pp. 275-296). Cambridge, MA & London: MIT Press. Jackendoff, R. (2002). Foundations of language: brain, meaning, grammar, evolution. Oxford: Oxford University Press. Johansson, S. (2005). Origins of language: constraints on hypotheses. Amsterdam: John Benjamins. Locke, J. & Bogin, B. (2006). Language and life history: a new perspective on the development and evolution of human language. Behavioral and Brain Sciences, 29, 259-325. MacNeilage, P. (1998). The frame/content theory of evolution of speech production. Behavioral and Brain Sciences, 21, 499-511. Mithen, S. (2005). The singing Neanderthals: the origins of music, language, mind and body. London: Weidenfeld & Nicolson. Nishida, T., Takasaki, H. & Takahata, Y. (1990). Demography and reproductive profiles. In T.
Nishida (Ed.), The chimpanzees of the Mahale Mountains: sexual and life history strategies (pp. 63-97). Tokyo: University of Tokyo Press. Studdert-Kennedy, M. (2005). How did language go discrete? In M. Tallerman (Ed.), Language origins: perspectives on evolution (pp. 48-67). Oxford: Oxford University Press.
REGULARITY IN MAPPINGS BETWEEN SIGNALS AND MEANINGS
MONICA TAMARIZ & ANDREW D.M. SMITH

Language Evolution and Computation Research Unit, Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL

monica@ling.ed.ac.uk / andrew@ling.ed.ac.uk

We combine information theory and cross-situational learning to develop a novel metric for quantifying the degree of regularity in the mappings between signals and meanings that can be inferred from exposure to language in context. We illustrate this metric using the results of two artificial language learning experiments, which show that learners are sensitive, with a high level of individual variation, to systematic regularities in the input. Analysing language using this measure of regularity allows us to explore in detail how language learning and language use can both generate linguistic variation, leading to language change, and potentially complexify language structure, leading to qualitative language evolution.
1. Introduction

Croft's (2000) evolutionary model of language change proposes that language is made up of multiple linguistic patterns, which can be differentially replicated across communities and over time, and thereby adapt to their environment. We investigate one potential functional source of such adaptation, namely the ease with which patterns of mapping between signals and meanings can be learnt. Recent research focuses on the inherent stochasticity of language learning (Bod, Hay, & Jannedy, 2003); children make use of statistical regularities in their linguistic input to learn phonemic contrasts (Maye, Werker, & Gerken, 2002), word boundaries (Jusczyk, Goodman, & Baumann, 1999; Saffran, Newport, & Aslin, 1996) and basic syntactic dependencies (Gómez, 2002). Regularity helps us to learn the specific mappings between meanings and signals: reliable co-occurrence with labels enhances the perceptual salience of features of referent meanings (Schyns & Murphy, 1994), and regularity assists in learning similarities between objects. Patterns of frequency of use also play a crucial role in the entrenchment of linguistic constructions and in the persistence of linguistic irregularity (Bybee & Hopper, 2001). Few efforts, however, have been made to quantify the systematicity or regularity of linguistic knowledge. Our main aim in this paper is to propose such a measure, which can be used to examine how this regularity impacts on the learnability of languages and on their use. In Section 2, we present a novel measure
of quantifying linguistic regularity, based on the confidence in the signal-meaning mappings that learners can derive from their experience over multiple episodes of language use. In Section 3, we use the measure in two artificial language learning experiments, and examine how learning is affected by regularities in the input. Finally, we briefly discuss the ramifications for language change and evolution.
2. Quantifying Linguistic Regularity

Researchers in evolutionary linguistics often make a distinction between compositional and holistic languages (Kirby, 2002; Brighton, 2002). In a compositional language, the meaning of a signal is a function of the meanings of elements of the signal and of the way those elements are arranged together. Symmetrically, the signal encoding a meaning is a function of the signals that encode elements of the meaning. In a holistic language, by contrast, there is no such relationship: the whole signal stands for the whole meaning. Human languages, however, are neither wholly compositional nor wholly holistic, but contain constructions of both types, and many with intermediate behaviour. Recent formulations of grammar (Langacker, 1987; Croft, 2001), indeed, use this insight to represent all linguistic knowledge in a large lexicon of constructions, or form-meaning pairings of varying levels of generality, ranging from very general compositional rules to idiosyncratic holistic idioms. From an evolutionary point of view, it would be beneficial to compare languages in terms of their level of compositionality, to explore the conditions under which they become more systematic and can sustain complexity. Despite this, useful measures of systematicity are not available; among the very few attempts to measure language compositionality was Smith (2003), who used the correlation of similarity between signals with similarity between meanings, but only by considering signals and meanings holistically, and thus failing to isolate the effects of meaningful elements of signals and irreducible aspects of meanings. We aim here to fill this gap, by describing a gradient measure to quantify the regularity of mapping (RegMap) between signals and meanings.
This measure is based on the cross-situational co-occurrence (Siskind, 1996; Smith, Smith, Blythe, & Vogt, 2006) of signal and meaning components in the language; it is bidirectional, and can thus be used to quantify both the regularity of the mapping from signals to meanings and vice versa; it can also be applied at many different levels of linguistic analysis, from measuring the regularity with which a particular morpheme encodes a component of meaning, to the overall regularity of the entire system. We illustrate the method by exploring the regularities in the miniature artificial language shown in Table 1. In this language, meanings are represented in a three-dimensional meaning space {COLOUR, SHAPE, INSET}, with three different values on each dimension, giving the language 27 possible meanings in total. Each meaning is paired with a signal (shown in the cells of the table), which is also made up of three dimensions, or syllables {σ1, σ2, σ3}. We can see that the signal
Table 1. A language with near-perfect compositionality. Values in syllables 1, 2 and 3 encode values on the meaning dimensions colour, shape and inset respectively, with the exception of the highlighted elements (marked * here).

                    SHAPE:  square                    hexagon                   oval
                    INSET:  cross   dot     star      cross    dot     star    cross  dot      star
  COLOUR  red               kilodi  kiloga  kilobe    kimudi   kimuga  kimube  —      *penaga  kinabe
          blue              tulodi  tuloga  tulobe    tumudi   tumuga  tumube  —      *kinaga  tunabe
          yellow            pelodi  peloga  pelobe    *tumudi  pemuga  pemube  —      *tunaga  penabe
and meaning dimensions map onto each other almost perfectly: the first syllable encodes colour, the second shape, and the third inset. Only a few elements (highlighted in the table) do not conform to this encoding, and these break the perfect compositionality of the language. On the other hand, the language is clearly far from holistic, as there remains a large degree of regularity in the signal-meaning mappings. How can we quantify this systematicity? We start by calculating how regularly a single signal dimension encodes a given meaning dimension, and then scale this up to measure RegMap for the entire language.
2.1. RegMap from a signal dimension to a meaning dimension

In developing RegMap, we make use of humans' "cognitive preference for certainty and for robust, redundant descriptions" (Pierrehumbert, 2006, p. 81), basing our metric on redundancy, namely the degree of predictability, order or certainty in a system. Redundancy is defined mathematically as the converse of entropy, as measured over a finite set of mutually independent variants (Shannon, 1948). Consider, then, the different variants of the first syllable in the language shown in Table 1, namely {ki, pe, tu}, and how they co-occur with the variants {red, blue, yellow} of the meaning dimension COLOUR depicted in the columns of the table. For each signal variant s in dimension σ1, we can calculate the relative entropy and thence its redundancy R_s across meaning variants:
R_s = 1 − H_s / log N_m,  where H_s = −Σ_m p_{s,m} log p_{s,m}    (1)

where N_m is the number of different values on the meaning dimension (here COLOUR), and p_{s,m} is the probability that signal variant s and meaning value m co-occur. R_s effectively reflects how certain we are that a signal variant in σ1 unambiguously encodes one COLOUR variant (Table 2). In calculating the regularity of mapping for the whole signal dimension σ1, we
Table 2. Co-occurrences of the signal variants of σ1 and the meaning values of COLOUR in the language shown in Table 1.

                      ki      pe      tu
  COLOUR  red          8       1       0
          blue         1       0       8
          yellow       0       7       2
          R_s         0.682   0.657   0.545
          F_s          9       8      10
          RF_s        6.142   5.256   5.445
need to consider the R values for every variant. Following usage-based models (Barlow & Kemmer, 2000), we also assume that frequency plays a crucial role in linguistic entrenchment, and hence in the level of regularity which can be attributed to a construction. We therefore multiply the redundancy of each signal variant by its frequency in the language (F), obtaining a weighted redundancy value (RF). We now define RegMap for a signal dimension S with respect to a meaning dimension M as the sum of RF for each signal variant s, divided by the sum of frequencies for each variant^a. This is further adjusted to take account of any discrepancy d between the number of variants in S and the number of variants in M, where d is the greater of these divided by the lesser:
RegMap(S→M) = (Σ_s R_s F_s / Σ_s F_s) × 1/d    (2)

Substituting the data from Table 2 into Eq. 2, therefore, yields a value for RegMap(σ1→COLOUR) of 16.843/27 × 1/(3/3) = 0.623.
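As an illustration, this calculation can be sketched in a few lines of code. This is an independent re-implementation of Eqs. 1 and 2 from their definitions, not the authors' software; note that the base of the logarithm cancels in the redundancy ratio.

```python
import math

def redundancy(counts):
    """Eq. 1: redundancy of one signal variant over the N_m meaning
    values, R = 1 - H / log(N_m), from its co-occurrence counts."""
    total, n_m = sum(counts), len(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
    return 1 - h / math.log(n_m)

def regmap(cooc):
    """Eq. 2: frequency-weighted redundancy over all signal variants of a
    co-occurrence matrix (rows = signal variants, columns = meaning
    values), adjusted by the variant-count discrepancy d."""
    n_s, n_m = len(cooc), len(cooc[0])
    d = max(n_s, n_m) / min(n_s, n_m)
    rf = sum(redundancy(row) * sum(row) for row in cooc)
    return rf / sum(sum(row) for row in cooc) / d

# Table 2: co-occurrences of {ki, pe, tu} with {red, blue, yellow}
cooc = [[8, 1, 0], [1, 0, 7], [0, 8, 2]]
print(round(redundancy(cooc[0]), 3))  # 0.682, the R value for "ki"
print(round(regmap(cooc), 3))         # ~0.624 (the text reports 0.623)
```

The per-variant redundancies and the final value match Table 2 and the worked example up to rounding.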
2.2. RegMap for the entire language

Table 3. RegMap(S→M) for all dimension pairs in the language.

            M:  COLOUR   SHAPE   INSET     R_S     F_S     RF_S
  S   σ1        0.623    0.008   0.008     0.881   0.639   0.563
      σ2        0.000    1.000   0.000     1.000   1.000   1.000
      σ3        0.008    0.007   0.890     0.910   0.905   0.825
Table 3 shows RegMap values for all combinations of signal and meaning dimensions, calculated using Eq. 2. Note that when RegMap = 1, there is an unambiguous representation of the meaning dimension by the signal dimension (e.g. RegMap(σ2→SHAPE)); when RegMap = 0, there is no information at all about the meaning dimension in the signal dimension (e.g. RegMap(σ2→COLOUR)).

^a Each word occurs once here, so the sum of frequencies is the number of words in the language.

The values in Table 3 can be used to estimate the regularity of the whole language. First, we use Eq. 1 again, substituting signal and meaning dimensions for
signal and meaning variants, to calculate the redundancy for a signal dimension R_S across all meaning dimensions. This value is again weighted by the sum of all the RegMap values for the signal dimension, yielding a modified redundancy value RF_S; this is averaged across all signal dimensions and again adjusted for any discrepancy D between the number of signal dimensions N_S and the number of meaning dimensions N_M to produce a RegMap value for the whole language:

RegMap(L_{S→M}) = (Σ_S R_S F_S / Σ_S F_S) × 1/D    (3)
It is important to re-emphasise that directionality in the mappings between signals and meanings is assumed in these calculations, and therefore that RegMap(L_{S→M}), as illustrated in the exposition above, will not necessarily yield the same value as RegMap(L_{M→S}) for the same language L. The latter measure can be calculated exactly as described above, with the co-occurrence matrices in Tables 2 and 3 transposed before application of the equations.
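This directional asymmetry can be checked numerically: transposing the Table 2 co-occurrence matrix and re-applying the same computation gives the meaning-to-signal regularity. Below is a self-contained sketch (again a re-implementation from the definitions in Section 2, not the authors' code):

```python
import math

def regmap(cooc):
    """RegMap from the row variants to the column variants of a
    co-occurrence matrix, per Eqs. 1 and 2."""
    def redundancy(counts):
        total = sum(counts)
        h = -sum((c / total) * math.log(c / total) for c in counts if c > 0)
        return 1 - h / math.log(len(counts))
    d = max(len(cooc), len(cooc[0])) / min(len(cooc), len(cooc[0]))
    rf = sum(redundancy(row) * sum(row) for row in cooc)
    return rf / sum(sum(row) for row in cooc) / d

signal_to_meaning = [[8, 1, 0], [1, 0, 7], [0, 8, 2]]               # Table 2
meaning_to_signal = [list(col) for col in zip(*signal_to_meaning)]  # transpose

# The two directions give different regularity values for the same language
print(round(regmap(signal_to_meaning), 3))  # ~0.624
print(round(regmap(meaning_to_signal), 3))  # ~0.628
```

The difference arises because the frequency weighting is taken over signal variants in one direction and meaning variants in the other.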
3. Miniature artificial language learning experiments

We hypothesise that signal and meaning components which map onto each other systematically are more likely to be learnt and replicated than those with higher levels of ambiguity or uncertainty. To investigate this, we conducted two experiments using an artificial language learning task (Gómez & Gerken, 2000) with artificial languages structured like the one in Table 1, but with different RegMap levels, as detailed in Table 4. 40 participants (14 males, 26 females; all students in their 20s) were randomly assigned to the four conditions; they were recruited through the Edinburgh University Careers website, and each paid £5 for participation.

Table 4. RegMap values for the four conditions in Experiments 1 and 2.

                      Language 1   Language 2   Language 3   Language 4
  RegMap(L_{S→M})       0.143        0.455        0.754        1.00
  RegMap(L_{M→S})       0.154        0.468        0.754        1.00
Experiment 1. RegMap from Signals to Meanings

Participants were asked to learn the meanings of words in an artificial language as well as they could. During training, object-label pairs were presented on a computer monitor one at a time, and participants proceeded to the next pair by clicking the mouse in their own time (training duration: mean 10.2 mins, range 6.8-14.5). The whole language was shown three times, with breaks in between. Participants were then tested on the same objects they had seen in the training phase, and asked to type in the corresponding word for each object in the language they had learnt. We measured how well the structure of the signals produced by the participants mapped onto the structure of the meanings provided (i.e. RegMap(S→M)).
Experiment 2. RegMap from Meanings to Signals

The experimental setup was identical to Experiment 1, except that in the testing phase participants saw screens showing one of the labels and all the objects, and were asked to click on the object that they thought corresponded to the label. In this experiment, we measured RegMap(M→S), or how well the meanings participants chose reflected the structure of the signals provided. Since the results of both experiments are comparable, they are presented and discussed together in the following sections.

3.1. Results

We examine RegMap for individual signal dimensions (syllables) with respect to the different meaning dimensions. For each signal and meaning dimension, Figure 1 shows the change in RegMap between the input and output languages. Positive changes indicate that the participant has increased the systematicity with which the relevant dimension is encoded, while negative changes indicate that the systematicity has been reduced. Signal and meaning dimensions show similar, but not identical, distributions. The three signal distributions are significantly different (one-factor ANOVA: F(2,117) = 19.554, p < 0.001), as are the three meaning distributions (one-factor ANOVA: F(2,117) = 21.742, p < 0.001).
Figure 1. Change in RegMap between input and output languages, by signal dimension (left) and meaning dimension (right). Plot shows inter-quartile range and median change.
Figure 2 shows RegMap for the output languages plotted against RegMap for the input languages provided to participants. Visual inspection of the plots in Figure 2 reveals a very high degree of individual variation, as all participants in each vertical row of data were exposed to exactly the same input language. Nevertheless, there is a significant effect of the RegMap of the input language on the resultant RegMap of the output language, both for signals to meanings (single-
Figure 2. RegMap(M→S) (left) and RegMap(S→M) (right), showing the languages produced by participants as a function of the RegMap of their input language. Vertically arranged datapoints (left to right) are from participants trained on languages 1-4; each point corresponds to one individual. Points above the x = y diagonal show participants who increased the language's systematicity.
factor ANOVA: F(3,36) = 21.581, p < 0.001) and for meanings to signals (single-factor ANOVA: F(3,36) = 36.848, p < 0.001).
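For readers wishing to reproduce this kind of analysis, the single-factor ANOVA F statistic can be computed directly from group scores. The sketch below uses invented RegMap-like scores for four groups of ten participants; both the data and the helper function are illustrative only, not the experimental data or the authors' analysis code.

```python
import random

def one_way_anova_f(groups):
    """F statistic for a single-factor ANOVA: between-group mean square
    divided by within-group mean square."""
    k = len(groups)                                 # number of groups
    n = sum(len(g) for g in groups)                 # total observations
    grand = sum(sum(g) for g in groups) / n
    ssb = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

random.seed(0)
# Hypothetical output-RegMap scores: four input-language conditions,
# ten participants each, group means spread as in a strong effect.
groups = [[random.gauss(mu, 0.1) for _ in range(10)]
          for mu in (0.2, 0.4, 0.7, 0.9)]
print(one_way_anova_f(groups) > 10)  # a large F, as in the reported analyses
```

With four groups of ten, the degrees of freedom are (3, 36), matching those reported above.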
3.2. Discussion

We note that in all these languages, COLOUR, SHAPE and INSET are mainly encoded in σ1, σ2 and σ3 respectively; we cannot therefore know whether the significant differences between signal and meaning dimensions in Figure 1 are due to (for instance) colour being more salient than shape, or σ1 being more salient than σ2. We plan to adapt the paradigm to explore these effects separately in future studies. Nevertheless, the results support the well-established finding that word beginnings and endings are particularly salient (Jusczyk et al., 1999; Saffran et al., 1996) and that structure in the middle of signals is more susceptible to loss. Our preliminary results also suggest that participants are sensitive to, and can reproduce, regularities in the mappings between signals and meanings at different levels, without explicit instruction; that there are great individual differences in these abilities; and that, in some cases, RegMap is greatly increased.

4. Conclusion
We have defined a novel metric to quantify the systematicity of languages, and measured how the metric is affected by individual learning. Learning generates new linguistic variants and thus provides an impetus for language change, yet also, since languages with higher levels of RegMap are learnt with greater fidelity, the kind of learning quantified here offers a potential cultural mechanism for the accumulation of structure in language during cycles of learning from experience and transmission.
Acknowledgements

Monica Tamariz holds a Leverhulme Trust Early Career Fellowship; Andrew Smith is supported by Arts and Humanities Research Council Grant AR-112105.

References

Barlow, M., & Kemmer, S. (2000). Usage-based models of language. University of Chicago Press.
Bod, R., Hay, J., & Jannedy, S. (Eds.). (2003). Probabilistic linguistics. MIT Press.
Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8(1), 25-54.
Bybee, J. L., & Hopper, P. J. (Eds.). (2001). Frequency and the emergence of linguistic structure. Amsterdam: John Benjamins.
Croft, W. (2000). Explaining language change: an evolutionary approach. Pearson.
Croft, W. (2001). Radical construction grammar: syntactic theory in typological perspective. Oxford: Oxford University Press.
Gómez, R. L. (2002). Variability and detection of invariant structure. Psychological Science, 13(5), 431-436.
Gómez, R. L., & Gerken, L. (2000). Infant artificial language learning and language acquisition. Trends in Cognitive Sciences, 4(5), 178-186.
Jusczyk, P. W., Goodman, M. B., & Baumann, A. (1999). Nine-month-olds' attention to sound similarities in syllables. Journal of Memory and Language, 40(1), 62-82.
Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In E. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models (pp. 173-203). Cambridge University Press.
Langacker, R. W. (1987). Foundations of cognitive grammar: theoretical prerequisites (Vol. 1). Stanford, CA: Stanford University Press.
Maye, J., Werker, J. F., & Gerken, L. (2002). Infant sensitivity to distributional information can affect phonetic discrimination. Cognition, 82(3), B101-B111.
Pierrehumbert, J. B. (2006). The statistical basis of an unnatural alternation. In L. Goldstein, D. H. Whalen, & C. Best (Eds.), Laboratory Phonology VIII (pp. 81-107). Mouton de Gruyter.
Saffran, J. R., Newport, E. L., & Aslin, R. N. (1996).
Word segmentation: the role of distributional cues. Journal of Memory and Language, 35(4), 606-621.
Schyns, P. G., & Murphy, G. L. (1994). The ontogeny of part representation in object concepts. In D. L. Medin (Ed.), The psychology of learning and motivation (Vol. 31, pp. 305-349). New York: Academic Press.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 379-423 and 623-656.
Siskind, J. M. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61, 39-91.
Smith, K. (2003). Learning biases and language evolution. In Proceedings of the 15th European Summer School on Logic, Language and Information.
Smith, K., Smith, A. D. M., Blythe, R. A., & Vogt, P. (2006). Cross-situational learning: a mathematical approach. In P. Vogt, Y. Sugita, E. Tuci, & C. Nehaniv (Eds.), Symbol grounding and beyond (pp. 31-44). Springer.
EMERGENCE OF SENTENCE TYPES IN SIMULATED ADAPTIVE AGENTS

RYOKO UNO, TAKASHI IKEGAMI
The Graduate School of Arts and Sciences, University of Tokyo, 3-8-1 Komaba, Tokyo 153-8902, Japan

DAVIDE MAROCCO, STEFANO NOLFI
Institute of Cognitive Science and Technologies, CNR, Via San Martino della Battaglia 44, Rome, 00185, Italy

This paper investigates the relationship between embodied interaction and symbolic communication. We refer to the works of Iizuka & Ikegami and of Marocco & Nolfi as examples of simulating EC (embodied communicating) agents, and discuss their differences in terms of joint attention, a class of communication between cognitive agents. We then introduce a new simulation to bridge the gap between the two models; with the new model we demonstrate two pathways to establishing coordinated behavior between agents. Based on the simulation results, we explain the typology of sentences (such as 'declarative', 'imperative' and 'exclamative' sentences) from a communicative point of view, which challenges traditional views of formalizing grammar.
1. Introduction

Artificial life provides a test bed for examining how symbols and grammars emerge in minimally interacting systems. Over the last 10-15 years, artificial life studies have contributed greatly to this direction, and the origin and evolution of language has become a target of scientific study (see e.g. Cangelosi & Harnad (2000), Quinn (2001), Kirby (2002), Sugita & Tani (2005), Steels (2006), Sasahara & Ikegami (2007)). In this paper, we compare Iizuka and Ikegami's turn-taking study (2004, 2007) with Marocco and Nolfi's signaling game (2006) from a linguistic point of view (Uno & Ikegami, 2003). Since the two studies have much in common but stress different viewpoints, comparing them helps to clarify the problems lying between verbal and non-verbal communication systems. We examine a variation of Marocco and Nolfi's model to see the effect of symbolic communication with embodiment.
A typological classification from linguistics and the concept of joint attention will help us to sort out the problem. Joint attention (henceforth "JA") is a coordinated preverbal behavior among two or more persons (see e.g. Gomez, Sarria & Tamarit (1993), Murray & Trevarthen (1993) and Tomasello (1999, 2003)). A simple example is a child's pointing behavior under the attention of its mother: a process of sharing one's experience of observing an object or event with others, by following pointing gestures or eye gazing. We distinguish two types of joint attention. If a person uses joint attention as a tool to achieve a goal (e.g. establishing joint attention to get a dog to pick up a ball), we call it "instrumental joint attention". But if a person takes joint attention itself as the goal, we call it "participatory joint attention": for example, two people looking at the same sunset establish participatory joint attention, as it requires no further achievement. We assume that participatory and instrumental joint attention form a continuous spectrum. We extend the meaning of JA and call any performance, such as language or dance, that makes two people pay attention to the same thing (other than each other) JA in a broad sense. In the following section, we compare the two communication models to clarify the important issues. We then interpret and develop variations of Marocco and Nolfi's model in Section 3. In the discussion, we reconsider the organization of sentence typology from an interaction perspective, based on the new simulation results.
2. Previous studies

2.1. Iizuka and Ikegami (2004)

The task of the simulation in Iizuka & Ikegami (2004) is the maintenance of turns and the spontaneous switching of turns between two mobile agents on a two-dimensional field. The agents' internal neural circuits were evolved by a genetic algorithm. In the early stages of the evolution, the turn-taking is geometrical and regular in space and time: agents take turns "automatically", like two pendulums adjusting their phases. In the later stages of the evolution, however, the pattern becomes more chaotic and dynamic: agents change their positions temporally, and the timing can vary from time to time. A remarkable property of these chaotic dynamics is that the agents can cope with agents from different generations.
2.2. Marocco & Nolfi (2006)

The task in Marocco & Nolfi's simulation (2006) is to solve a collective navigation task by evolving four mobile robots: to achieve the task, agents must go to a target area, form a pair, and stay there. Each robot (with a circular body) has 14 sensory neurons. With the eight infrared sensors, the robot can detect obstacles and other robots nearby. With the ground sensor, the robot can detect the color of the ground; the floor is colored white except for the target area, which is colored black. With the four communication sensors, the robot can perceive the signals emitted by the other robots. With the self-monitoring sensor, the robot can recognize the signal it emitted itself one step before. Communication by signal is spatially limited: a robot can hear only the signals emitted by nearby robots. The robots' internal neural circuits were evolved by a genetic algorithm. In the early phase, the robots explore the environment, and their signals reflect the environment and the signals of the other robots. Then a speaker-hearer distinction appears: using the signal of an agent inside the target area, the other agent can find its way into the target area. Finally, both agents are inside the target area and their signals start to synchronize.

2.3. Comparison of the two simulations
The two models focus on different kinds of communication, and both use synchronization in different ways. In Iizuka & Ikegami's model (2004), turn-taking is established by alternately switching positions; as a result, their spatio-temporal patterns of sequential turn-taking look like the same pattern, but their internal states or headings are not synchronized in the rigid sense. In Marocco & Nolfi's model (2006), a synchronized oscillation is observed in several sensory channels: when two agents come to stay in a target area, they communicate with each other through synchronized signaling patterns. Iizuka & Ikegami's model, on the other hand, shows more dynamic interaction: turn-taking is maintained by creating chaotic but correlated dynamics, generated by the interaction between the agents. In Marocco & Nolfi's model, agents communicate in rather static ways; their signaling pattern is already separated from their embodiment and becomes a function of the task to be solved. Lastly, agents in Iizuka & Ikegami's model always aim for participatory JA, whereas Marocco & Nolfi's model demonstrates the transition from instrumental to participatory JA. This transition is interesting and useful for classifying the possible interactions between the internal states of the agents. The other critical issue here is that Iizuka & Ikegami put the stress on the difference between mere synchronization and the inter-subjectivity achieved by participatory JA. Shared intentionality requires detecting the intentionality of the other agents, which in their models translates into sensitivity to the styles of turn-taking. Without a variety of styles of motion, it is difficult to examine JA in fruitful ways; thus, they used co-creativity and cooperation instead of synchrony. As we have seen, each of the two models has advantages and disadvantages. To consider the evolution of communication styles, it is important to propose a new model that is a compromise between them.
3. Variations of the signaling model

3.1. New setup

In modifying Marocco and Nolfi (2006), we make three changes. First, in the training phase the target area is not always black but can be any shade of gray. Second, there are only two agents in the field. Third, robots can now hear each other's signals anywhere in the environment (in the original model, signaling communication was limited to the local neighborhood). These changes result in different "strengths" of the symbols mediating the collective behavior. Because the signals for the target area become uncertain, the agents try to use other sensory channels to enter the target area. Second, the infrared sensors change their meaning: since there are only two agents in this new environment, when the infrared sensors are activated within a target area, it means that either the other agent or a wall is in the proximity. In the previous setup, the infrared sensor could not be used to detect the agent in the target area, because there were many robots. Since the signal for the target area is made uncertain by temporally varying the target color, and since there are only two agents, detecting agents by the infrared sensor now becomes possible. Finally, to discriminate between the wall and the other robot, synchronization becomes useful, as was also true in the previous experiments. It has to be emphasized that the signal for the infrared sensor sometimes overrides the signal for the target area. In this case, synchronization is used only to find out whether the robot is interacting with another robot or not; signals are used rather to build up a ground of interaction between the two.

3.2. Two forms of collective behaviors
There are two forms of collective navigation observed in this new setup. In the first type, a synchronization of all channels is organized inside the target area (we
call this "JA inside a target area"). In the second type, a synchronization of the infrared (IR) and communication (C) channels, but not the ground channel (G), is organized outside the target area (we name this "JA outside the target area"). The diagrams in Figures 1 and 2 show how the interrelation between the robots changes over time. In the case of "JA inside a target area", both robots explore the field in the beginning stage. Then one of them gets into the target area by chance and emits intensive signals, which the other can hear; at this point their signals have no correlation at all. But when the one outside the target area gets inside (sometimes guided by the first agent), they approach each other. When both agents' IR sensors are turned on, the C sensors show strong synchronization. This collective state is not as stable as in the previous model, however, and it breaks up when the IR sensory patterns are lost because the agents move apart within the target area. When this happens, the synchronization of the C sensory pattern is weakened (i.e. it shows only anti-phase synchrony), and eventually one of the agents leaves the target area. In the case of "JA outside the target area", on the other hand, the trial starts from exploration, and the agents then find each other using the IR sensors. Once both IR sensors are turned on, their C signals get into synchronization. Namely, the trigger for the synchronization is given by the communication channel, but the communicative channel can synchronize only after the infrared sensors have synchronized. The potential cues for establishing JA are ordered: ground sensor < infrared sensor < communicative channel. We interpret the results in the following way, using the states of the three channels. In the exploration phase, all three sensors are off, which we express as a state (-, -, -).
The three positions stand for the ground sensor, the infrared sensor and the communication channel, respectively. The communication channel is defined as being in the "+" (on) state when it shows periodic waves. In the case of JA inside the target area, the sensors end up with the value (+, +, +). In the case of JA outside the target area, the state starts from (-, -, -) and ends up at (-, +, +). Based on the typology of joint attention, we can say that primary intersubjectivity is present when we have the value (-, -, -), and that secondary intersubjectivity, or participatory JA, corresponds to (*, +, +). Instrumental JA is present when one agent has (-, -, +) while the other has (+, -, -). The following points should be noted. Close analysis of synchronization reveals that phase synchronization comes first and amplitude synchronization comes afterwards, both for JA with and without a target area. Since the infrared sensors are passive, their patterns are easily lost while the agents are
moving around. But the communication channels can be actively synchronized, because the agents can modify the signals they emit.
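The three-channel state notation used above can be summarized as a small classifier. This is a hypothetical encoding written for illustration; the category names follow the text, but the function itself is not part of the model.

```python
def ja_category(ground, infrared, comm):
    """Classify one agent's (G, IR, C) sensor triple, where '+' means the
    sensor is on (for C: the channel shows periodic waves) and '-' off."""
    if (ground, infrared, comm) == ('-', '-', '-'):
        return 'exploration (primary intersubjectivity)'
    if infrared == '+' and comm == '+':        # participatory JA: (*, +, +)
        if ground == '+':
            return 'JA inside a target area'   # (+, +, +)
        return 'JA outside a target area'      # (-, +, +)
    # e.g. one agent at (-, -, +) while the other is at (+, -, -)
    return 'transitional / instrumental'

print(ja_category('-', '+', '+'))  # JA outside a target area
```

Note that instrumental JA is a property of a pair of agents (one at (-, -, +), the other at (+, -, -)), so a single-agent triple can only be labeled as transitional.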
Figure 1. Communication sensory data of two agents as a function of time in the case of JA inside a target area. They demonstrate synchronization around 400-800 and 1300-1500 time steps. Anti-phase oscillation is observed around 800-1100 time steps.
Figure 2. Communication sensory data of two agents as a function of time in the case of JA outside a target area. They demonstrate synchronization after around 1300 time steps.
An implication of the itinerant behavior observed in this new setup is that infrared and communication channel synchrony is more unstable than synchrony grounded in reliable ground sensing. Because the agents cannot trust the ground sensor pattern, they organize imaginary synchrony without a target area, which we consider an essential function of language, as will be discussed in the next section.
329
4. Discussion: Sentence Typology

Up to this point we have observed three types of interaction from the perspective of JA. First we compared the models of turn-taking and of communication by signaling. While turn-taking realizes only one type of JA, namely participatory JA, communication by signaling realizes various types of JA, ranging from instrumental JA to forms with some characteristics of participatory JA. In this paper we have modified the signaling model to see whether we can observe the emergence of pure participatory JA arising continuously from instrumental JA, focusing on the nature of the coupling by analyzing the behavior of the sensors. We propose that the results of modeling interaction can be linked to linguistic structure via the typology of JA (Uno & Ikegami, 2003). Traditionally, what a speaker intends to do with a linguistic expression and the structure of that expression need not be correlated: the speech act is conventionally linked with a linguistic structure called sentence type. Sentence typology is originally based on the intention of the speaker. Based on typological research, Sadock and Zwicky (1985) argue that the major forms of sentences are declaratives, imperatives and interrogatives. Declarative sentences are forms combined with the speech act of making statements, imperative sentences with commanding, and interrogative sentences with questioning; exclamative sentences are related to expressions of surprise. Reinterpreting sentence typology in terms of interaction patterns, rather than the speaker's intentional stance, provides a novel view of the classification of sentences. Among sentence types, declaratives are thought to be the most typical: most linguistic analysis is based on declarative sentences, and declaratives are usually thought to be used for information transmission.
However, as we discussed, a "full declarative" is defined as a special way to share the ground of speech between the speaker and the hearer. A complementary category of sentences is the "full imperative". Sentences in this category use the sharing of the ground of speech as a tool to achieve some purpose: the speaker's use of a sentence affects the behavior of the hearer (e.g. "come here!"). This view of declarative sentences suggests that language may help establish JA in a broad sense. Using the term JA we can restate the argument of the previous section as follows: full declaratives are sentences that are used for participatory JA, and full imperatives are used for instrumental JA. Exclamatives are the special case used as a primary inter-subjectivity, which is often established between a baby and his or her mother. It is not necessarily true that the intention of the speaker exists before a communication starts. Intention is a co-product of a
communication itself, which was also true in the case of simulated turn-taking. The problem is that if we focus only on the speaker's intention we do not know why only these speech acts are grammaticalized and not the others. Instead we aimed to show what two people sharing the ground of speech has to do with the grammaticalization of speech acts. By applying the notion "joint attention" we might be able to understand why we have these types of sentences.
References
Cangelosi, A. & Harnad, S. (2000). The adaptive advantage of symbolic theft over sensorimotor toil: Grounding language in perceptual categories. Evolution of Communication, 4(1), 117-142.
Gomez, J. C., E. Sarria, and J. Tamarit (1993). A comparative approach to early theories of mind: ontogeny, phylogeny and pathology. In S. Baron-Cohen, H. Tager-Flusberg and D. J. Cohen (eds.), Understanding Other Minds: Perspectives from Autism. Oxford: Oxford University Press, 195-207.
Iizuka, H. & T. Ikegami (2004). Adaptability and Diversity in Simulated Turn-taking Behavior. Artificial Life, 10, 361-378.
Ikegami, T. & Zlatev, J. (2007). From pre-representational cognition to language. In J. Zlatev, T. Ziemke, R. Frank, R. Dirven (eds.), Body, Language and Mind 2. Berlin: Mouton de Gruyter, 241-283.
Ikegami, T. & H. Iizuka (2007). Turn-taking Interaction as a Cooperative and Co-creative Process. Infant Behavior and Development, 30, 278-288.
Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2), 185-215.
Marocco, D. & S. Nolfi (2006). Self-organization of Communication in Evolving Robots. In Luis M. Rocha et al. (eds.), Artificial Life X. MIT Press, 178-184.
Murray, L. & C. Trevarthen (1993). Emotional regulations of interactions between two-month-olds and their mothers. In T. Field & N. Fox (eds.), Social Perception in Infants. Norwood, NJ: Ablex, 177-197.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In J. Kelemen & P. Sosik (eds.), ECAL01. Springer, 357-366.
Sadock, J. & A. M. Zwicky (1985). Speech act distinctions in syntax. In T. Shopen (ed.), Language Typology and Syntactic Description 1. Cambridge: Cambridge University Press, 155-196.
Sasahara, K. & T. Ikegami (2007). Evolution of Birdsong Syntax by Interjection Communication. Artificial Life, 13, 1-19.
Steels, L. (2006). Semiotic dynamics for embodied agents. IEEE Intelligent Systems, 21(3), 32-38.
Sugita, Y. & J. Tani (2005).
Learning semantic combinatoriality from the interaction between linguistic and behavioral processes. Adaptive Behavior, 13, 133-52.
Tomasello, M. (1999). The Cultural Origins of Human Cognition. Cambridge: Harvard University Press.
Tomasello, M. (2003). Constructing a Language: A Usage-Based Theory of Language Acquisition. Cambridge: Harvard University Press.
Uno, R. & Ikegami, T. (2003). Joint attention / prediction and language: A mechanism to align intentionalities. Papers in Cognitive Linguistics, 2. Tokyo: Hituzi, 231-274.
DESPERATELY EVOLVING SYNTAX
JUAN URIAGEREKA Linguistics, University of Maryland at College Park, College Park, MD 20742, USA The Chomsky Hierarchy (CH) gives a first approximation as to where human syntax lies in an abstract logical space: the generating device accepting appropriate languages should be slightly more powerful than a standard Push-Down Automaton (a PDA+), although for familiar reasons not much more so. An evolutionary study of syntax ought to give us some clues as to how a PDA+ could have emerged in brains. The goal of my talk is to provide an approach to this question that is informed by contemporary considerations in the Evo-Devo paradigm and, especially, standard results in the study of syntax.
1. Syntactic Boundary Conditions The theoretical study of syntax has determined, at least, that: (1) a. Dependencies arrange themselves in terms of formal objects that can be quite high within the Chomsky Hierarchy of automata (CH). b. Context-sensitive dependencies are generally triggered, and in any case structure-dependent and limited by locality considerations. c. Semantic dependencies are determined by syntactic dependencies and obey Full Interpretation, Compositionality, and Conservativity. d. Morphological variation, of the sort patent across languages, in many instances involves uninterpretable formatives. e. Core language acquisition involves the fixation of a few, normally morphological, syntactic options ('parameters'). (1a) and (1b) are structural specifications, while (1c) and (1d) are interface conditions. There may be further properties the syntax system has, but for the broad purposes of evolution this should suffice - and seems uncontroversial. (1e) is the least established among the results in (1), as no full account of language acquisition has been developed. The point for us is that the options the child faces are very limited and ostensive, possibly even just morphophonemic and/or lexical variants. In a system that is seen to emerge at the interface with
interpretive components, uninterpretable features (1d) are at least surprising. That linguistic systematicity should be of the compositionality sort (1c) is also remarkable, going to an extreme in elements that relate a restriction and an even more complex (often syntactically unrelated) scope; there is no logical way to guarantee that natural language quantifiers should only be conservative (relating to their restriction in a tighter way than to their scope). It is because of specifications as in (1c/d) that claim (1a) is made, qualified as in (1b). Recall that levels within the CH of automata invoke degrees of systemic memory deployment: from its absence in the least structured (finite-state) layer to unlimited capabilities at the other extreme (the Turing machine). Human syntax falls in between context-free and context-sensitive relations. The former are the least one needs to express compositionality, impossible to express at the finite-state level. The latter are required to express discontinuous morphological dependencies and the conservative property of quantifiers. One may code context-sensitive relations in language via type lifters, slash categories, indices, threads, traces, copies, etc. However, all of these involve a 'PDA+' automaton, requiring more derivational memory than a PDA - so that internal manipulations within phrases are permitted - but less than the next one in the series - so that they are not generalized beyond observable limits. 2. Where do Standard Accounts Fail to Meet Syntax? The importance of the exercise above is to establish a minimal system whose evolution is to be modeled. Put in computational terms, the automaton that must have evolved for accepting relevant linguistic structures is of the PDA+ sort. The evolutionary literature does not fulfill that minimal requirement.
For example, the classic Pinker and Bloom (1990) virtually ignores syntax, even if it exhibits more expertise than what is demonstrated by such notable authors as Michael Arbib, Terence Deacon or even Philip Lieberman. In all of these instances, so far as I know there isn't a single detailed discussion of the sorts of facts that (1) outlines, presupposing a PDA+ architecture of grammar. A second group of works gets closer to the concerns just discussed. For instance, Carstairs-McCarthy (1999) takes phrasal syntax to be an outgrowth of syllabification, while for Calvin and Bickerton (2000) it is an exaptation from preexisting thematic relations. Unfortunately, it is unclear how either approach bears on whatever goes beyond local dependencies. Such stories would remain unchanged
even if the language faculty did not present discontinuous dependencies, or if they obeyed conditions that are the inverse of what holds. Other relevant pieces are designed computationally. Thus Kirby (2000) explores learners' capacity to segment utterances and generalize over chance coincidences in the meanings of identical segments. Again, such a system does not go beyond phrasal associations and the meaning compositionality thereof, at best reaching PDA capabilities. A similar approach is taken by Hurford (2000), still emphasizing broad generalizations and regularizations, or Nowak et al. (2002), which shows - also through modeling - that beyond a threshold of active word usage it is advantageous for a system to deploy phrases. What would be impressive is to show how discontinuous relations emerge. Given the CH, a transition to a PDA+ presupposes the PDA describing mere constituents, and so an evolutionary change taking an organism into PDA+ territory automatically carries it over intermediate realms. In other words, we don't know whether syntax got to be PDA+ directly or through PDA stages, as these studies assume. It may well be that a PDA+ automaton is an even more effective way to compress the dependencies that arise with linguistic symbols. But then why should the grammar go through a 'mere' PDA stage? Implicit in alleged alternatives to core syntax is that structure 'is there', so that learners reflect it. But the question is how it 'got there'. A PDA could, indeed, compress information that would be only clumsily describable in terms of a finite-state automaton (FSA). If such a result and the corresponding system are not specific to language, a priori it is better to go for a general explanation of this sort. But familiar communication systems, even elicited ones, do not obviously go beyond FSA conditions, let alone PDA ones.
So if all there were to language structuring is some clever packing that compactness favors, why should we be the only species that stumbled onto that wondrous feat of general cognition? Why wouldn’t ‘general cognition’ favor other intelligent animals? The evolution of syntax as in (1) won’t be modeled by ignoring syntax, or by asserting that it ‘should follow’ from effective packing, or similar broad considerations. The latter claim would move professionals if we were shown how any of the basic properties we routinely work with do indeed emerge, in detail - and why they are unique to this system. Short of this, the sensation many of us get is that researchers are attempting to desperately evolve syntax.
3. Towards a Different Approach In contrast to the proposals alluded to, the programmatic Hauser et al. (2002) is sensitive to the matters raised here, dividing those aspects of language that humans share with other species (the Broad Faculty of Language, FLB) from those that are unique (the Narrow Faculty of Language, FLN). The piece furthermore raises the empirical challenge of deciding which features of syntax fall within each. For example, recursion is seen as a property of information-exchange systems that has not, so far, been encountered in non-human species, and so is hypothetically declared part of FLN. In turn, some phonetic and semantic features are shown to have been spotted in other creatures, and are thus taken to be part of FLB. The paper suggests that syntax is part of FLN. Hauser et al. has raised much discussion. Pinker and Jackendoff (2005) attack it on the basis that not just recursion is unique to language. However, nothing in the logic of the criticized piece prevents specific (say, phonetic) properties from having emerged in the faculty after the emergence of recursion. More importantly, some of the syntactic properties Pinker and Jackendoff observe as part of syntax - and not 'recursive' - (agreement/case, displacement, binding, etc.) are analyzed by professionals in terms of PDA+ devices, which presuppose PDA ones. A system without recursion would ipso facto be incapable of such behaviors. And although an interesting question is whether recursion itself opened up to such capabilities, it borders on irrationality to logically separate them from the automaton that decides on recursive properties. All of this bears directly on the course of human evolution. Coolidge and Wynn (2004) suggest that the two known sub-species of Homo sapiens differ in the sort of memory that determines rule-governed behavior.
Our species is genetically distinct from Neanderthals, having separately evolved in Africa, from where it migrated elsewhere carrying the same underlying syntax seen throughout the world's languages. The question is, then, how the language faculty could 'bloom' into the system it must already have been in the Upper Paleolithic, from its virtual nonexistence 100,000 years prior. In this regard, it is not a bad idea to try to locate a sudden brain reorganization in a drastic mutation, whether based on recursion or something presupposing it. Enigmas for evolutionary theory arise with any system that emerges rapidly, such as Adaptive Immunity (AI), which appeared abruptly in the cartilaginous fish though not in the sister lamprey lineage. Key to AI is a kind of memory, which preserves both the recognition of signal sequences and the RAG proteins, responsible
for initiating the recombination of the gene segments of the antigen-recognizing receptors. Piattelli-Palmarini and Uriagereka (2004) argue that the accepted theory of how AI evolved, via the horizontal insertion of a (viral in origin?) transposon, is a model story for systemic emergence. Aside from being pertinent to some of the properties seen in syntax (e.g. the memory it requires, which resembles AI conditions), thinking that way addresses a difficulty for linguistic emergence, emphasized by Lewontin (1990). New traits need not lead to more offspring, even if advantageous to the individual showing them; a linguistic trait may be detrimental for an individual, against the norm of its group. The AI logic suggests that an entire sub-group may have shown the relevant trait, epidemically. If so, the sudden emergence would be guaranteed. 4. Searching for Real Answers That approach, too, may ultimately seem like 'desperately evolving syntax', at least to some. But then the real metric of success ought to be which of the two lines of reasoning sketched here fares better in predicting the observable syntactic phenotype. That evaluation, to my knowledge, has not been attempted. The alternative approach I am suggesting is based on the Evo-Devo project in biology, which shows how species arise from gene arrays, invoked without any simple correlate in the phenotype. While recursive thought may have preceded FLN, at least its use in an information exchange system seems uniquely human. As Hurford (2000) notes, self-embedding - a defining property of recursion - poses special processing problems (online storage cannot easily distinguish several tokens of the same processed category). According to both Bickerton (2000) and Carstairs-McCarthy (2000), the shift to language came with a steep rise in signal processing capacity, which may have addressed the successful identification, in on-line terms, of complex recursion.
Then we may need to find the bio-molecular bases for the parser that allowed us to squeeze multi-dimensional 'syntactic thoughts' into speech units, and back. Whatever was used for that purpose should also be related to the whole array of syntactic properties that make us conclude the system exhibits PDA+ conditions. In that regard, it is worth exploring the Procedural Deficit Hypothesis of Ullman and Pierpont (2005), which they take to underlie co-morbid syndromes of the linguistic and the motor sort. This theory gives a central role to so-called procedural memory, of a kind that psychologists find related to rule-governed behaviors. Now, rather than taking this hypothesis as an indication that relevant conditions are non-linguistic and broadly cognitive, it is more useful to characterize matters in terms of the sort of memory a PDA+ abstractly requires. Thus we may assume the matter is manifested in syntactic terms; in those domains where it extends beyond obviously linguistic behaviors (from rhythmic abilities to the capacity to tie knots), inasmuch as these activities too seem uniquely human, they should be thought of as parasitic on human (context-sensitive) syntax. Thus they would not be witnessed prior to the cognitive explosion leading to our species (Camps and Uriagereka 2005). Aside from being directly testable (the way to prove this hypothesis wrong is to find non-human species capable of such behaviors, or to demonstrate how putatively non-linguistic human conducts could not have parasitized the syntactic machinery), the present approach has another advantage: it bears directly on the context-sensitive constructs syntacticians argue for. Although the present one is far from being a complete program, it addresses the issue of computational memory: it is not possible to evolve PDA+ capabilities without an increase in memory capacities, whatever those are. Needless to say, in addition to that basic premise we will also need to understand what it means for a brain to embody such a memory, or what sorts of units it stores. But it is intriguing that the one gene isolated in relation to 'vocal learning', FOXP2, should be expressed in circuits where procedural memory effects are detected. References Bickerton, D. 2000. 'How protolanguage became language.' In Knight, Studdert-Kennedy and Hurford 2000, 264-284. Camps, M. and J. Uriagereka. 2006. 'The Gordian Knot of Linguistic Fossils.' In The Biolinguistic Turn, eds. J. Martin and J. Rosselló, 34-65. Publications of the University of Barcelona. Carstairs-McCarthy, A. 2000. 'The distinction between sentences and noun phrases: An impediment to language evolution?' In Knight, Studdert-Kennedy and Hurford 2000, 248-263. Hauser, M., N.
Chomsky and T. Fitch. 2002. 'The faculty of language.' Science, 298, 1569-1579. Hurford, J. 2000. 'The Emergence of Syntax.' In Knight, Studdert-Kennedy and Hurford 2000, 219-230.
Knight, C., M. Studdert-Kennedy and J. Hurford (eds.). 2000. The evolutionary emergence of language: Social function and the origins of linguistic form. Cambridge: CUP. Lewontin, R. 1990. 'How much did the brain have to change for speech?' Behavioral and Brain Sciences, 13, 740-741. Piattelli-Palmarini, M. and J. Uriagereka. 2004. 'The Immune syntax.' In Universals and Variation in Biolinguistics, ed. L. Jenkins, 341-377. London: Elsevier. Pinker, S. and R. Jackendoff. 2005. 'What's Special about the Human Language Faculty?' Cognition, 95, 201-236. Ullman, M. and E. Pierpont. 2005. 'Specific Language Impairment is not specific to language: The Procedural Deficit Hypothesis.' Cortex, 41, 399-433. Wynn, T. and F. Coolidge. 2004. 'The expert Neanderthal mind.' Journal of Human Evolution, 46, 467-487.
CONSTRAINT BASED COMPOSITIONAL SEMANTICS
VAN DEN BROECK, WOUTER Sony Computer Science Laboratory, 6, rue Amyot, Paris, 75005, France [email protected]
Abstract
This paper presents a computational system that handles the grounding, the formation, the interpretation and the conceptualisation of rich, compositional meaning for use in grounded, multi-agent simulations of the emergence and evolution of artificial languages. Compositional meaning is deconstructed in terms of semantic building blocks which bundle a semantic function together with the relevant grounding and learning methods. These blocks are computationally modelled as procedural constraints, while the compositional meaning is declaratively represented as constraint programs. The flexibility of the data flow in such programs is utilized to adaptively deal with interpretation and learning. The conceptualisation is performed by a sub-system that composes suitable constraint programs. The various methods used for managing the combinatorial explosion are discussed. 1. Introduction
One way to study the evolution of language is to simulate the emergence and evolution of artificial languages in multi-agent experiments. An important issue in such experiments concerns the involved meaning. This meaning has to be "rich" for experiments that focus on the emergence of grammar. This paper presents an integrated system that handles the grounding, the formation, the interpretation and the conceptualisation of such meaning. The kind of meaning covered by the system consists of concepts and semantic functions. Concepts are here considered to be category-like entities such as colors, shapes, events, relations, roles, etc. The grounding of these concepts requires some grounding method. Examples of such methods are neural networks as used in (Plunkett, Sinha, Moller, & Strandsby, 1992), probability density estimation as used in (Roy & Pentland, 2002), or discrimination trees as used in (Steels, 1996). Each concept that is grounded with a particular grounding method corresponds to a particular set of concept parameters, which are used by that method. Each agent constructs and maintains its own repertoire of concepts. The acquisition of a grounded concept requires a learning method for use with
the concerned grounding method. Back-propagation can for instance be used as the learning method for concepts grounded in terms of multi-layered perceptrons. The role of a concept in the meaning of some utterance depends on the semantic function that uses this concept. Concepts are for instance used to categorise perceived entities in order to filter out those that do not fit the category. Interpreting "the red ball", for example, involves a filtering of the context such that what is retained is the object that best corresponds with both the color category RED and the shape prototype BALL. Other semantic functions are quantification, predication, negation, deictic reference, etc. Semantic functions are considered to be recruited from the general cognitive capabilities. Their evolutionary origin is thus not considered here. 2. Semantic building blocks
Rich, compositional meaning often involves different types of concepts. There is, however, no grounding method that is equally well suited for all types of concepts. The proposed system therefore accommodates different grounding methods. The structurally coupled evolution of language and concept repertoires furthermore requires a close interaction between the grounding and learning methods on one hand and the semantic functions on the other. Each semantic function is therefore bundled together with the relevant grounding and learning method, and encapsulated in a semantic building block. Each such block is equipped with a number of slots. These slots are used to get or set the arguments, such as the concepts and contexts, over which the semantic function operates. An example of a semantic block is called filter-set-prototype. This block has three slots for the arguments it takes, i.e. a source-set, a target-set and a prototype concept. The behaviour of this semantic block depends on the availability of the arguments. If the source-set and the prototype are given, which is the case in a regular interpretation process, then the block can derive the target-set. This set contains all entities in the source-set that match the given prototype. If the source-set for instance contains all the objects in the observed scene shown in figure 1, and the prototype concept is for example the shape BALL, then the target-set will contain all ball-like objects in the source-set, i.e. o3, o4 and o5. The meaning of the utterance "the balls" could thus be represented by a structure that includes this filter-set-prototype block. Different arguments are available in a learning situation. Consider for instance a situation in which the speaker used the utterance "the frouple" to discriminate object o1 in figure 1. The hearer indicated that he/she could not understand this utterance. The speaker then drew the attention to the topic by pointing to it.
This presents a learning opportunity for the hearer. The filter-set-prototype block now has the source-set, which includes all objects in the scene, and the target-set, which contains the topic. It can try to infer the concept that could account for the filtering from the source-set to the target-set. The hearer could assume that this concept is the one meant by the word "frouple" and add this mapping to his/her lexicon.
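To make the two operational modes concrete, here is a minimal Python sketch of such a block. The class name, the feature-dictionary representation of objects and prototypes, and the four-object scene are our illustrative assumptions, not the paper's actual implementation:

```python
# Illustrative sketch of a semantic building block with two modes.
# Objects are feature dictionaries; a prototype concept is a partial
# feature description that matching objects must satisfy.

class FilterSetPrototype:
    """Omnidirectional block with slots: source_set, target_set, prototype."""

    def __init__(self, source_set=None, target_set=None, prototype=None):
        self.source_set = source_set
        self.target_set = target_set
        self.prototype = prototype

    def run(self):
        if self.source_set is not None and self.prototype is not None:
            # Interpretation mode: derive the target-set by keeping the
            # objects that match the given prototype.
            self.target_set = {
                name for name, feats in self.source_set.items()
                if all(feats.get(k) == v for k, v in self.prototype.items())
            }
        elif self.source_set is not None and self.target_set is not None:
            # Learning mode: infer a prototype as the feature description
            # shared by all topic objects (a crude stand-in for the
            # paper's learning methods).
            topic_feats = [self.source_set[n] for n in self.target_set]
            shared = dict(topic_feats[0])
            for feats in topic_feats[1:]:
                shared = {k: v for k, v in shared.items() if feats.get(k) == v}
            self.prototype = shared
        return self

scene = {
    "o1": {"shape": "box", "size": "small"},
    "o3": {"shape": "ball", "size": "small"},
    "o4": {"shape": "ball", "size": "big"},
    "o5": {"shape": "ball", "size": "small"},
}

# Interpretation: "the balls"
balls = FilterSetPrototype(source_set=scene, prototype={"shape": "ball"}).run()
# Learning: the hearer is shown topic o1 for the unknown word "frouple"
learned = FilterSetPrototype(source_set=scene, target_set={"o1"}).run()
```

The same object thus serves both the regular interpretation process and the learning opportunity described above, depending solely on which slots are filled.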
Figure 1. A scene with a number of labelled objects of varying size and shape.
3. Constraint programs A semantic building block can have multiple operational modes depending on the availability of the arguments. Put differently, each block represents an omnidirectional relationship among a number of variables. Such relationships can be computationally modelled as constraints. The encapsulated functionality that implements the grounding and learning method and the semantic function enforces the relationship. The resulting procedural constraints can however be declaratively combined by linking relevant slotsᵃ. The result is a constraint program that represents compositional meaning.
The constraint paradigm is a model of computation in which values are deduced whenever possible [...]. One may visualize a constraint 'program' as a network of devices connected by wires. Data values may flow along the wires, and computation is performed by the devices. A device computes using only locally available information (with a few exceptions), and places newly derived values on other, locally attached wires. (Steele, 1980) The interpretation of a constraint program can be seen as a constraint satisfaction problem, for which efficient algorithms exist. Our implementation uses an extension of the AC-4 algorithm (Mohr & Henderson, 1986) which implements a strong form of generalized relational arc-consistency. It involves constraint-ordering heuristics, and uses a look-ahead search to find the actual solutions.
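The omnidirectional behaviour Steele describes can be illustrated with a toy propagation network. This is a deliberately simplified sketch of the "devices connected by wires" picture; the AC-4-based solver used in the actual system is considerably more sophisticated:

```python
# Toy constraint network: wires hold values, devices deduce missing
# values from locally available ones (illustrative only).

class Wire:
    def __init__(self):
        self.value = None
        self.devices = []

    def set(self, value):
        # Setting a value wakes up the attached devices exactly once.
        if self.value is None:
            self.value = value
            for device in self.devices:
                device.propagate()

class Adder:
    """Enforces a + b = c omnidirectionally: any two values fix the third."""

    def __init__(self, a, b, c):
        self.a, self.b, self.c = a, b, c
        for wire in (a, b, c):
            wire.devices.append(self)

    def propagate(self):
        a, b, c = self.a.value, self.b.value, self.c.value
        if a is not None and b is not None:
            self.c.set(a + b)
        elif a is not None and c is not None:
            self.b.set(c - a)
        elif b is not None and c is not None:
            self.a.set(c - b)

a, b, c = Wire(), Wire(), Wire()
Adder(a, b, c)
c.set(10)
a.set(3)   # the network now deduces b = 7
```

The semantic blocks of section 2 play the role of the `Adder` device here: which slot gets computed depends only on which of the others already carry values.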
3.1. Examples Figure 2 depicts the constraint program that represents the meaning of the utterance "the bigger ball". The particular values and data flow correspond with the interpretation of this program in the context of the scene shown in figure 1. The filter-set-prototype constraint takes the context and the BALL prototype, and yields the set that contains all balls. The filter-set-comparison constraint takes this set and the comparator BIG and selects the bigger one, i.e. the topic o4.
ᵃ Such links represent equality relationships.
Figure 2. The constraint program and interpretation data flow for "the bigger ball".
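Assuming a simple feature representation of the scene, the data flow of figure 2 can be approximated as a two-step pipeline. The function names mirror the constraints in the text, but the numeric sizes and the plain-function (rather than omnidirectional) form are our illustrative choices:

```python
# Illustrative reconstruction of the "the bigger ball" program:
# context -> filter-set-prototype(BALL) -> filter-set-comparison(BIG) -> topic.

def filter_set_prototype(source_set, shape):
    """Keep the objects whose shape matches the prototype."""
    return {n: f for n, f in source_set.items() if f["shape"] == shape}

def filter_set_comparison(source_set, comparator):
    """Select the object singled out by a scalar comparison such as BIG."""
    pick = max if comparator == "BIG" else min
    name, _ = pick(source_set.items(), key=lambda item: item[1]["size"])
    return name

context = {
    "o1": {"shape": "box",     "size": 2},
    "o2": {"shape": "box",     "size": 3},
    "o3": {"shape": "ball",    "size": 1},
    "o4": {"shape": "ball",    "size": 5},
    "o5": {"shape": "ball",    "size": 2},
    "o6": {"shape": "pyramid", "size": 4},
}

balls = filter_set_prototype(context, "ball")   # the set {o3, o4, o5}
topic = filter_set_comparison(balls, "BIG")     # the topic o4
```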
Figure 3 shows the data flow involved in a learning situation. The hearer did not understand the modifier but was shown the topic o4. The hearer did properly understand "ball" and could thus produce the source-set taken by the filter-set-comparison constraint. This constraint can then, given the topic, infer the modifier BIG, and a new entry can be added to the lexicon.
Figure 3. The data flow involved in the inference of the modifier concept.
Figure 4 depicts the program and interpretation data flow for "the box close to the pyramid". The filter-set-relation constraint takes the set of boxes as source-set, the pyramid as landmark, and CLOSE-TO as relation concept. Given these parameters, it can properly discriminate the topic o2.
Figure 4. The program and interpretation data flow for "the box close to the pyramid".
4. Conceptualisation
We can now turn our attention to the conceptualisation of the compositional meaning. Since this meaning is represented as constraint programs, its conceptualisation must involve a process that constructs such programs. The input for this process is a communicative goal, such as “discriminate topic
in the sensory context". It must construct a constraint program that, when interpreted by the hearer, is expected to satisfy that goal. There are typically many potential programs that could fulfil a given goal. Various criteria are defined for measuring their relative strengths, such as the level of ambiguity involved, the expressibility in an utterance, the complexity, etc. Finding a suitable constraint program is a combinatorial problem. The constraint program composer algorithm used in our system involves a number of techniques and strategies for keeping the combinatorial explosion in check.
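The composer loop sketched below is our simplified reading of this description: a best-first search over partial programs, with the expansion rules, goal test and scoring criteria left as parameters. It is a skeleton under those assumptions, not the paper's actual algorithm:

```python
# Skeleton of an eager, incremental composer: best-first search over
# partial constraint programs (scoring and expansion are placeholders).
import heapq

def compose(initial_program, expand, satisfies_goal, score, max_steps=1000):
    """Search for a program fulfilling the communicative goal.

    expand(p) yields candidate programs extended by one constraint;
    satisfies_goal(p) interprets p and checks the goal;
    score(p) is a heuristic (ambiguity, complexity, ...): lower is better.
    """
    frontier = [(0.0, 0, initial_program)]
    counter = 1  # tie-breaker so heapq never compares programs directly
    for _ in range(max_steps):
        if not frontier:
            return None
        _, _, program = heapq.heappop(frontier)
        if satisfies_goal(program):
            return program
        # Each candidate expansion is handled in a separate branch;
        # the heap decides which branch to expand next.
        for candidate in expand(program):
            heapq.heappush(frontier, (score(candidate), counter, candidate))
            counter += 1
    return None
```

With a toy goal (e.g. programs as tuples of numbers that must sum to a target), the loop converges on a fulfilling program after a handful of expansions; the techniques discussed next make the same idea tractable for real constraint inventories.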
Eager, incremental search. The algorithm searches for suitable constraint programs by incrementally expanding incomplete programs, one constraint at a time. There can be many candidate constraints at each step. These candidates are handled in separate branches. The expanded programs are evaluated according to some heuristics to decide which branch to expand next. Solutions are found more efficiently with this strategy.
Goal-directed search. If the goal is to discriminate a topic in a context, then the target program must be such that the topic can be inferred from the given concepts and context. In other words, one of the potential data flows in that program must be a coherent, non-cyclic one from the context and concepts to the topic. The algorithm tries to satisfy this requirement by only adding constraints that incrementally extend the data flow backwards. Each constraint is added to support a goal. The initial goal is the topic. Each constraint supports a goal by adding a piece of data flow. The added data flow connects the goal with the new sub-goals introduced by the constraint. When a filter-set-prototype constraint is for example added and its target-set slot is linked with the goal, then the new sub-goals are the source-set, unless it is linked with the context, and the prototype, unless it is expressed in the utterance. A more detailed description of this search process can be found in (Van den Broeck, 2007). All potential expansions that do not properly contribute to the data flow are ignored. This significantly reduces the size of the search space. The number of potential combinations of k constraints from an inventory of n constraints is the multi-set coefficient C(n+k-1, k). The average number of potential links between the slots of k constraints with an average arity of a is s(k, a) = (k-1)a((k-1)a + 1)/2. The total number of potential constraint programs of size k is thus approximately C(n+k-1, k) × 2^s(k,a), while the size of the incrementally explored search space of constraint programs of maximum size K is approximately the sum of this quantity over k = 1, ..., K. For a small test case with 5 kinds of constraints with an average arity of 2.6 and a maximum program size of 6, the total number of partial constraint programs is approximately 5.199348e29. The goal-directed search does however find a suitable program (if there is one) after on average 262 expansions when conceptualising a program for a randomly chosen topic in our benchmark scene collection.
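The quoted estimate can be reproduced directly from these formulas (a quick numerical check; the function names are ours):

```python
# Reproducing the search-space estimate:
# C(n+k-1, k) * 2**s(k, a) with s(k, a) = (k-1)*a*((k-1)*a + 1)/2.
from math import comb

def multiset(n, k):
    """Multi-set coefficient: ways to pick k constraints from n kinds."""
    return comb(n + k - 1, k)

def s(k, a):
    """Average number of potential links among the slots of k constraints."""
    return (k - 1) * a * ((k - 1) * a + 1) / 2

def total_programs(n, k, a):
    return multiset(n, k) * 2 ** s(k, a)

# Test case from the text: 5 constraint kinds, average arity 2.6, size 6.
estimate = total_programs(n=5, k=6, a=2.6)   # about 5.199e29
```

With these values, `s(6, 2.6) = 91` and `multiset(5, 6) = 210`, so the estimate is 210 × 2^91 ≈ 5.199348e29, matching the figure in the text.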
Interleaved constraint satisfaction. Determining whether a constraint program fulfils the goal is done by interpreting it using the aforementioned constraint satisfaction algorithm. This algorithm also identifies branches with inconsistent partial programs, which can be pruned. Interleaving the constraint satisfaction in the incremental search furthermore minimizes the amount of consistency enforcing (when using AC-4), because all enforcing applied on some partial program carries over to the expanded programs.
Chunking. An additional technique we are currently exploring is chunking. This technique consists of taking a (part of a) successfully used semantic program and wrapping it such that it can be re-used as a constraint in future programs. We call these composite constraints, since they are composed of a number of component constraints. The initially given constraints are in contrast called primitive constraints. Figure 5 depicts a constraint program that involves a composite constraint which wraps two primitive constraintsᵇ. This composite constraint has four slots, which are internally linked with the appropriate slots of the component constraints.
Figure 5. The constraint program and data flow for the interpretation of “the box left of the big ball”. This program involves a composite constraint that wraps two primitive constraints.
The composite constraint inventory of an agent is initially empty. New composites are created according to some chunking strategy. We currently use a basic strategy that chunks complete constraint programs. The resulting composite constraints are candidates, just like primitives, with which to expand incomplete programs. Adding a composite corresponds to jumping to a point in the search space that previously proved to be useful. First experiments show that chunking and re-using the resulting composites significantly improves the performance of the composer algorithm, as shown in figure 6. These telling results were obtained in spite of the basic chunking strategy we currently use. The chunking strategy is also interesting because it can be relevant at the language level. In particular, the potential relationship between composite constraints and grammatical constructions is intriguing, but unfortunately beyond the scope of this paper.
Figure 6. Comparison of run-time needed to conceptualise a series of topics, with and without chunking.

Finally, we would like to note that the composer is also useful when a hearer could not fully reconstruct the constraint program due to misunderstanding or under-specification. The composer can in these cases propose potential completions of the incomplete program.

5. Conclusions
In this paper we showed how representing rich, compositional meaning in terms of constraints and constraint programs offers a uniform framework for dealing with their interpretation and conceptualisation. We demonstrated how the flexible data flows handle interpretation and adapt appropriately to learning situations. Bundling the semantic functions together with the grounding and learning methods affords a tight interaction between interpretation and concept acquisition. Encapsulating the procedural details of the bundled functionality allows experimenters to combine different techniques transparently. The constraint-based representation of meaning enabled us to draw upon the well-developed body of knowledge on constraint processing in the fields of artificial intelligence and operations research. The interpretation of the constraint-based representation constitutes a constraint satisfaction problem, for which optimal algorithms exist. The conceptualisation, on the other hand, is implemented as an incremental composer of constraint programs. A number of techniques and strategies were discussed that effectively keep the involved combinatorial explosion in check. In traditional first-order logic representations of meaning, the concepts are typically represented as predicates. In a constraint-based approach, the concepts are rather arguments of the semantic constraints, which can be thought of as relational predicates. A constraint-based semantics can thus be regarded as a second-order semantics.
Finally, the proposed system does not favour any particular model or formalism concerning the emergence and evolution of language in general, or grammar in particular. It should thus be adoptable in a wide array of experimental and theoretical settings. One particular setting is presented elsewhere in this collection (Bleys, 2008).

Acknowledgements
This research is supported by Sony Computer Science Laboratory in Paris and the ECAGENTS project funded by the Future and Emerging Technologies programme (IST-FET) of the European Community under EU R&D contract IST-2003-1940. It builds on the work first introduced in Steels (2000) and elaborated on in Steels and Bleys (2005). References
Blackburn, P., & Bos, J. (2005). Representation and inference for natural language: A first course in computational semantics. CSLI Publications.
Bleys, J. (2008). Expressing second order semantics and the emergence of recursion. In A. D. M. Smith, K. Smith, & R. Ferrer i Cancho (Eds.), The evolution of language: Evolang 7. World Scientific.
Dechter, R. (2003). Constraint processing. Morgan Kaufmann.
Mohr, R., & Henderson, T. C. (1986). Arc and path consistency revisited. Artificial Intelligence, 28(2), 225-233.
Plunkett, K., Sinha, C., Moller, M. F., & Strandsby, O. (1992). Symbol grounding or the emergence of symbols? Vocabulary growth in children and a connectionist net. Connection Science, 4, 293-312.
Roy, D. K., & Pentland, A. (2002). Learning words from sights and sounds: A computational model. Cognitive Science, 26, 113-146.
Smith, A. D. M. (2005). The inferential transmission of language. Adaptive Behavior, 13(4), 311-324.
Steele, G. L. (1980). The definition and implementation of a computer programming language based on constraints. Unpublished doctoral dissertation, MIT.
Steels, L. (1996). Perceptually grounded meaning creation. In M. Tokoro (Ed.), ICMAS96. AAAI Press.
Steels, L. (2000). The emergence of grammar in communicating autonomous robotic agents. In W. Horn (Ed.), ECAI 2000 (pp. 764-769). Amsterdam: IOS Press.
Steels, L., & Bleys, J. (2005). Planning what to say: Second order semantics for fluid construction grammars. In A. Bugarin Diz & J. S. Reyes (Eds.), Proceedings of CAEPIA '05, Lecture Notes in AI. Berlin: Springer Verlag.
Van den Broeck, W. (2007). A constraint-based model of grounded compositional semantics. In Proceedings of LangRo '2007.
THE EMERGENCE OF SEMANTIC ROLES IN FLUID CONSTRUCTION GRAMMAR
REMI VAN TRIJP
Sony Computer Science Laboratory Paris, Rue Amyot 6, Paris, 75005, France [email protected]

This paper shows how experiments on artificial language evolution can provide highly relevant results for important debates in linguistic theories. It reports on a series of experiments that investigate how semantic roles can emerge in a population of artificial embodied agents and how these agents can build a network of constructions. The experiment also includes a fully operational implementation of how event-specific participant-roles can be fused with the semantic roles of argument-structure constructions and thus contributes to the linguistic debate on how the syntax-semantics interface is organized.
1. Introduction

Most linguists agree that there is a strong connection between the semantic representation of a verb and the sentence types in which the verb can occur. Unfortunately, the exact nature of the syntax-semantics interface is still a largely unresolved issue. One approach is the lexicalist account (e.g. Pinker (1989)), in which it is assumed that there exists a list of universal and innate ‘semantic roles’ (also called ‘thematic’ or ‘theta’ roles). The lexicon then specifies how many arguments a particular verb takes and which semantic roles they play. For example, the verb push (as in Jack pushes a block) is listed as a two-place predicate which assigns the roles ‘agent’ and ‘patient’ to its arguments. These roles are then ‘projected’ onto the syntactic structure of the sentence through a limited (and usually universal) set of linking rules. Differences in syntactic structures are taken as indicators of differences in the semantic role list of a verb. Recently, however, the lexicalist approach has come under serious criticism. Goldberg (1995, pp. 9-14) points to the fact that lexicalists are obliged to posit implausible verb senses in the lexicon. For example, a sentence like she sneezed the napkin off the table would count as evidence that the verb sneeze is not only an intransitive verb as in she sneezed, but that it also has a three-argument sense ‘X causes Y to move to Z’ and that it assigns the roles ‘agent’, ‘patient’ and ‘goal’ to its arguments. The lexicalist approach also fails to explain coherent semantic interpretations in creative language use and coercion effects, for example in A gruff ‘police monk’ barks them back to work (Michaelis, 2003, p. 261).
As an alternative, Goldberg (1995) proposes a constructionist account, which we will adopt in this paper. Here, a verb’s lexical entry contains its verb-specific ‘participant-roles’ rather than a set of abstract semantic roles. To take push as an example again, two participant-roles are listed: the ‘pusher’ and the ‘pushed’. These participant-roles have to be “semantically fused” with semantic roles, which Goldberg calls ‘argument roles’ (p. 50) and which are slots in argument-structure constructions. Constructions are like the linking rules of the lexicalist approach in the sense that they are a mapping between meaning and form, but the difference is that they carry meaning themselves and that they add this meaning to the sentence. So instead of positing different senses for the verb to accommodate sentences such as he pushed a block and he pushed him a block, parts of the meaning are added by the verb and other parts are contributed by the constructions. For example, in he pushed him a block the ‘recipient’-role is added by the ditransitive construction, which maps the meaning ‘X causes Y to receive Z’ to a syntactic pattern. In the constructionist account, semantic roles are no longer treated as universal nor as atomic categories. This is supported by empirical evidence both from cross-linguistic studies and from research on individual languages (Croft, 2001). Even for a specific category such as the English dative, the “relation between form and meaning is rather indirect and multi-layered” (Davidse, 1996). Moreover, it has been shown that lexical items gradually become more grammaticalized (Hopper, 1987), which leads more and more linguists to the conclusion that pre-existing categories don’t exist (Haspelmath, 2007). The constructionist account is more plausible from an empirical point of view, but so far it leaves two questions unanswered: where do semantic roles come from and how exactly does ‘fusion’ work?
This paper addresses both issues through experiments on artificial language evolution. It first proposes a fully operational implementation of the constructionist approach using the computational formalism Fluid Construction Grammar (Steels & De Beule, 2006, FCG). Next, the experiment itself is described. Since the experiment deals with artificial languages, the examples in this paper should not be read as actual grammar descriptions, but rather as indicators of the minimal requirements for explaining semantic roles.
2. Semantic Roles and Fusion in Fluid Construction Grammar

In FCG, a language user’s linguistic inventory is organized as a network of rules which is dynamically updated through language use. Figure 1 illustrates the relevant part of a speaker’s network for the utterance Jack pushes a block. There are three lexical rules on the left for jack, push, and block, which introduce the individual meanings of these words. In a logic-based representation, the complete meaning can be represented as {∃ v, w, x, y, z: jack(v), block(w), push(x), push-1(x, y), push-2(x, z)}. Note that the lexical rule for push contains two participant-roles and that these are represented as predicates themselves. Instead of the names ‘pusher’ and ‘pushed’, the more neutral labels ‘push-1’ and ‘push-2’ are used.
The careful reader will have noticed that there is a problem with this meaning: the variables v and y are bound to the same object (jack), so they are coreferential. Similarly, the variables w and z are coreferential because they are bound to the same object (block). Expressing coreferentiality between variables introduced by different predicates is one of the most important functions of grammar, and languages have developed various strategies for doing so (e.g. word order in English and case marking in German). Coreferential linking is achieved by making the variables equal (Steels, 2005), which results in the following meaning for the sentence: {∃ v, w, x: jack(v), block(w), push(x), push-1(x, v), push-2(x, w)}.
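The coreferential linking described above can be sketched in a few lines. This is an illustrative toy, not the FCG implementation: the meaning is modelled as a set of predicate tuples, and the grammar's contribution as a variable substitution:

```python
def equalize(meaning, substitution):
    """Make coreferential variables equal by renaming each variable to
    its representative, as described in the text."""
    rename = lambda v: substitution.get(v, v)
    return {tuple([pred] + [rename(v) for v in args])
            for (pred, *args) in meaning}

# Meaning of "Jack pushes a block" before linking (cf. the text):
meaning = {("jack", "v"), ("block", "w"), ("push", "x"),
           ("push-1", "x", "y"), ("push-2", "x", "z")}

# The grammar asserts that y = v and z = w:
linked = equalize(meaning, {"y": "v", "z": "w"})
# linked now contains push-1(x, v) and push-2(x, w), as in the text.
```

The substitution {y: v, z: w} plays the role of the coreference information that, in the experiment, is contributed by word order or case markers.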
Figure 1. The fusion of an event’s participant-roles and a construction’s semantic roles is achieved through fusion links which are dynamically updated through language use.
In the FCG implementation, the composition of meanings, including the establishment of coreference, is taken care of by con-rules, which thus implement argument-structure constructions in construction grammar (Goldberg, 1995). The con-rules map a semantic frame (the left pole) to a syntactic pattern (the right pole). The semantic frame contains a set of semantic roles and the syntactic pattern includes simple ‘case markers’ that immediately follow the arguments of which they indicate the semantic role.a An example utterance could be push jack-BO block-KA, where BO indicates that jack plays sem-role-8 (which fuses with ‘push-1’) and where KA indicates that block plays sem-role-3 (which fuses with ‘push-2’).

a The experiment only focuses on the emergence of semantic roles. It therefore assumes a one-to-one mapping of semantic roles to grammatical markers.
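The marker convention can be illustrated with a toy parser. The marker-to-role table below is hypothetical, following the example utterance in the text:

```python
# Hypothetical marker inventory, following the example in the text.
MARKERS = {"BO": "sem-role-8", "KA": "sem-role-3"}

def parse_roles(utterance):
    """Map each marked argument in the utterance to its semantic role.
    Markers are suffixed to the argument whose role they indicate."""
    roles = {}
    for token in utterance.split():
        word, _, marker = token.rpartition("-")
        if marker in MARKERS and word:
            roles[word] = MARKERS[marker]
    return roles

parse_roles("push jack-BO block-KA")
# -> {"jack": "sem-role-8", "block": "sem-role-3"}
```

Unmarked tokens (here the verb push) contribute no role assignment, mirroring the one-to-one marker-to-role assumption stated in the footnote.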
[Figure 2: the top panel plots communicative success and semantic role variance against the number of language games; the bottom panel plots the total number of participant-roles covered, the number of participant-roles covered by generalized roles, the number of generalized markers and the number of verb-specific markers.]
Figure 2. The top graph shows that the agents rapidly reach communicative success and that they converge on a coherent set of semantic roles after 5,500 language games. The semantic role variance reaches almost zero. The bottom graph gives more details on the roles themselves.
There are also links between con-rule 23 and con-rule 5 and con-rule 10, which means that the latter two are sub-rules of con-rule 23. For convenience’s sake, these sub-rules are only illustrated as nodes in the network. The fusion of the event-specific participant-roles and the semantic roles of a construction is specified in ‘fusion links’, which are the grey boxes in Figure 1. The fusion links represent all possible fusions known by an agent, and they can be extended if needed. Each of the links fuses a participant-role with a semantic role within a specific con-rule. This link has a ‘confidence score’ between 0 and 1 which indicates how successful this fusion has been in past communicative acts. For example, ‘push-1’ can be fused with ‘sem-role-8’ in con-rule 10 with a confidence score of 0.7. There is a competing fusion link in which ‘push-1’ is fused with ‘sem-role-1’ in con-rule 2, but this link only has a confidence score of 0.3, so the other one is preferred. Finally, ‘push-1’ can also be fused with ‘sem-role-8’ in
con-rule 23, which also contains the semantic role ‘sem-role-3’. In this case, the fusion has a confidence score of 0.5. This fine-grained scoring mechanism allows speakers of a language to cope with the fuzzy edges of grammatical categories, which is necessary because grammar rules have to be applicable in a flexible manner. A network of rules, as opposed to a limited set of linking rules, is also an elegant way of capturing the complex and multilayered mapping between form and function in language.
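The preference among competing fusion links can be sketched as follows. The scores follow the example in the text; the data layout itself is an illustrative assumption, not the FCG data structure:

```python
# (participant-role, con-rule) -> {semantic role: confidence score};
# scores follow the running example in the text.
fusion_links = {
    ("push-1", "con-rule-10"): {"sem-role-8": 0.7},
    ("push-1", "con-rule-2"):  {"sem-role-1": 0.3},
    ("push-1", "con-rule-23"): {"sem-role-8": 0.5},
}

def preferred_fusion(participant_role):
    """Return the (con-rule, semantic role) pair with the highest
    confidence score for the given participant-role."""
    candidates = [(score, rule, role)
                  for (p, rule), roles in fusion_links.items()
                  for role, score in roles.items()
                  if p == participant_role]
    score, rule, role = max(candidates)
    return rule, role

preferred_fusion("push-1")  # -> ("con-rule-10", "sem-role-8")
```

With these scores, the 0.7 link in con-rule 10 wins over the competing 0.3 and 0.5 links, matching the preference described above.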
3. Experiments on the Emergence of Semantic Roles

This paper hypothesizes (a) that the emergence of semantic roles is triggered by the need to reduce the cognitive effort of interpretation and to avoid misinterpretation, and (b) that generalizations and grammatical layers are developed as a side-effect of reusing existing linguistic structures in new situations. To test these hypotheses, the same experimental set-up was used as in Steels and Baillie (2003). The experiment involves a population of 5 artificial agents which play description games about dynamic real-world scenes. Equipped with a vision system and embodied through a pan-tilt camera, the agents are capable of extracting event descriptions from the scenes. During a game one agent describes an event in the scene to another agent. The game is a success if the hearer agrees with that description. In order to focus exclusively on the emergence of semantic roles, the agents are given a lexicon at the beginning of an experiment but no grammar. The agents are autonomously capable of detecting possible communicative problems through self-monitoring (Steels, 2003). This enables the agent to detect whether variables are coreferential and thus whether there are missing links in the meaning of an utterance (Steels, 2005). If the speaker detects one missing link (but no more), he will try to repair this problem. The hearer’s learning strategy works in the same way, except that he has more uncertainty because he has no access to the speaker’s intended meaning. By comparing the parsed utterance to his world model, however, the hearer may exploit the situatedness of the communicative act to solve the missing link problem as well. Repairing a missing link can be done by classification or by combination. Repair by classification occurs when the missing link involves a participant-role which the speaker encounters for the first time (e.g. push-1), which we will call the target-role.
The agent will first check whether he already knows a semantic role for an analogous participant-role (source-role) that might be reused. Analogy works by (1) taking the event of the target-role and the event that was used to construct the source-role, (2) decomposing them into their event structures, and then (3) constructing a mapping between the two. For example, a ‘walk-to’-event can be decomposed into an event structure that starts with two non-moving participants and then one participant approaching the other. Event structures themselves are represented as a series of micro-events. The algorithm takes all the participant-roles of the micro-events in which the target-role occurs
and maps them onto the corresponding participant-roles in the source event structure. A mapping counts as analogous when the filler of those corresponding roles is always the same. In case of multiple analogies, the source role which covers the most specific participant-roles is chosen. The source role will then be generalized so that it also covers the target-role. If no analogy could be found, the agent will create a new con-rule which maps the target-role to a newly invented marker. In both cases, fusion links are created and updated for later usage. Repair by combining existing rules occurs when the speaker wants to express a two- or three-place predicate and already has separate rules that link some of the coreferential variables, but not all of them. The agent will then try to combine these existing rules into a new con-rule. New fusion links are created, and family links (sub- and super-rules) are kept between the new con-rule and the rules that were used for creating it. In this way, a network of rules as seen in Figure 1 gradually emerges, which improves linguistic processing. Given the population dynamics of the experiment, several semantic roles may be created and generalized in local language games and then start to propagate among the agents. This automatically creates conflicting solutions, however, so the roles start competing with each other for survival and for covering as many participant-roles as possible. Language thus becomes a complex adaptive system in its own right, very much like a complex ecosystem. There are two types of selectionist forces at work: functional (i.e. some roles are more analogous and therefore better suited for covering a participant-role) and frequency-based. To be able to align their grammars with each other, agents consolidate their linguistic inventory after each game by updating the scores of the fusion links.
Since each construction has its own place in the grammar, fusion links are needed for each specific construction (see Figure 1). However, there is a danger of lingering incoherence if the scores of the fusion links are updated independently of each other. For example, the fusion link between ‘push-1’ and ‘sem-role-1’ may win the competition for single-argument utterances whereas the fusion with ‘sem-role-8’ may win for two-argument utterances. This is incompatible with observations in natural languages, which develop a coherent system for argument-structure constructions. In order to solve this problem, the agents apply a consolidation strategy of multi-level selection. Instead of updating only the fusion links that were actually used during processing, all the compatible fusion links are updated as well. Compatible fusion links are links that are related to sub- or super-rules of the applied con-rule. These scores are increased if the game was a success, while all the competing links are decreased by lateral inhibition. The scores are lowered if the game was a failure. The exact algorithm and experiments on multi-level selection are reported in more detail in Steels, van Trijp, and Wellens (2007).
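The consolidation step can be sketched as follows. The update increment, the clamping to [0, 1], and the treatment of competitors on failure are our assumptions; the actual algorithm is given in Steels, van Trijp, and Wellens (2007):

```python
def consolidate(scores, used_and_compatible, competitors,
                success, delta=0.1):
    """Update fusion-link confidence scores after a game: on success,
    reward the used and compatible links and laterally inhibit their
    competitors; on failure, punish the used links. The increment
    delta and the clamping to [0, 1] are assumptions of this sketch."""
    clamp = lambda x: min(1.0, max(0.0, x))
    for link in used_and_compatible:
        scores[link] = clamp(scores[link] + (delta if success else -delta))
    if success:
        for link in competitors:
            scores[link] = clamp(scores[link] - delta)
    return scores

# After a successful game, the winning fusion link from the running
# example rises toward 0.8 while its competitor is inhibited toward 0.2:
scores = {"push-1/sem-role-8": 0.7, "push-1/sem-role-1": 0.3}
consolidate(scores, ["push-1/sem-role-8"], ["push-1/sem-role-1"], True)
```

Updating the compatible links of sub- and super-rules with the same call is what makes the selection multi-level rather than purely local.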
4. Results and Discussion

The results show that the agents succeed in developing a coherent system of semantic roles. The top graph in Figure 2 shows that the agents rapidly reach communicative success and that they learn all the case markers after 2,000 language games. It takes them another 3,500 games before they reach total meaning-form coherence. Meaning-form coherence is measured by taking the frequency of the most frequent form covering a participant-role and dividing this by the total number of forms circulating in the population. Inversely, the semantic role variance - which measures the distance between the semantic role sets of the agents - reaches almost zero, which means that the agents have aligned their semantic roles. The bottom graph of Figure 2 gives more details about the roles themselves. The semantic role overlap indicates that there is still competition going on for 5 participant-roles. The graph also shows that there are 9 verb-specific markers whereas 7 have already become more generalized. These 7 markers cover 24 of the 30 participant-roles in the experiment. Figure 3 gives a snapshot of the evolution of case markers in one agent. It shows that there is a gradual continuum between more lexical, verb-specific markers and more grammaticalized markers which cover up to 8 participant-roles. Similar observations have been made in natural languages by grammaticalization studies (Hopper, 1987).
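The meaning-form coherence measure described above can be sketched as follows; the data layout (a list of observed form tokens per participant-role) is an assumption of this sketch:

```python
from collections import Counter

def coherence(observations):
    """Meaning-form coherence for one participant-role: the frequency of
    the most frequent form divided by the total number of form tokens
    circulating in the population for that role."""
    counts = Counter(observations)
    return counts.most_common(1)[0][1] / sum(counts.values())

# All agents use the same marker for a role -> full coherence:
coherence(["puxaec"] * 10)                  # -> 1.0
# Two markers still competing for the role -> partial coherence:
coherence(["puxaec"] * 6 + ["vuivos"] * 4)  # -> 0.6
```

Averaging this quantity over all participant-roles gives the population-level coherence that reaches 1.0 after roughly 5,500 language games in Figure 2.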
[Figure 3 plots, over 8,000 language games, the participant-role coverage of the individual case markers in one agent’s inventory, among them vuivos, puxaec, zoazeuch, naetaz, toawash and nudeua.]
Figure 3. The evolution of case markers in one agent. For example "fuitap" covers 8 specific roles after 600 games, but is in conflict with other markers and in the end covers 6 roles. The graph shows the continuum between more specific and more generalized semantic roles.
5. Conclusion

This paper showed that experiments on artificial language evolution can be highly relevant for linguistic theories. It proposed a fully operational implementation of
the constructionist account of predicate-argument structure in Fluid Construction Grammar. By embedding this approach in experiments with embodied artificial agents, a coherent explanation of the emergence of semantic roles was presented. The results of the experiments showed that semantic roles can emerge as a way to avoid misinterpretation and to reduce the cognitive effort needed during parsing, and that they are further grammaticalized by reuse through analogy.
Acknowledgement

This research was funded by the EU FET-ECAgents Project 1940. The FCG formalism is freely available at www.emergent-languages.org. I am greatly indebted to Luc Steels (who implemented the first case experiment in 2001), director of the Sony Computer Science Laboratory Paris and the Artificial Intelligence Laboratory at the Vrije Universiteit Brussel, the members of both labs, and Walter Daelemans, director of the CNTS at the University of Antwerp.

References

Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective. Oxford: Oxford UP.
Davidse, K. (1996). Functional dimensions of the dative in English. In W. Van Belle & W. Van Langendonck (Eds.), The dative. Volume 1: Descriptive studies (pp. 289-338). Amsterdam: John Benjamins.
Goldberg, A. E. (1995). A construction grammar approach to argument structure. Chicago: Chicago UP.
Haspelmath, M. (2007). Pre-established categories don’t exist. Linguistic Typology, 11(1), 119-132.
Hopper, P. (1987). Emergent grammar. BLS, 13, 139-157.
Michaelis, L. A. (2003). Headless constructions and coercion by construction. In E. Francis & L. Michaelis (Eds.), Mismatch: Form-function incongruity and the architecture of grammar (pp. 259-310). Stanford: CSLI Publications.
Pinker, S. (1989). Learnability and cognition: The acquisition of argument structure. Cambridge: Cambridge UP.
Steels, L. (2003). Language re-entrance and the ‘inner voice’. Journal of Consciousness Studies, 10(4-5), 173-185.
Steels, L. (2005). What triggers the emergence of grammar? In AISB’05: Proceedings of EELC’05 (pp. 143-150). Hatfield: AISB.
Steels, L., & Baillie, J.-C. (2003). Shared grounding of event descriptions by autonomous robots. Robotics and Autonomous Systems, 43(2-3), 163-173.
Steels, L., & De Beule, J. (2006). Unify and merge in fluid construction grammar. In P. Vogt, Y. Sugita, E. Tuci, & C. Nehaniv (Eds.), Symbol grounding and beyond (pp. 197-223). Berlin: Springer.
Steels, L., van Trijp, R., & Wellens, P. (2007). Multi-level selection in the emergence of language systematicity. In F. Almeida e Costa, L. M. Rocha, E. Costa, & I. Harvey (Eds.), Proceedings of the 9th ECAL. Berlin: Springer.
BROADCAST TRANSMISSION, SIGNAL SECRECY AND GESTURAL PRIMACY HYPOTHESIS

SLAWOMIR WACEWICZ & PRZEMYSLAW ZYWICZYNSKI

Department of English, Nicolaus Copernicus University, Fosa Staromiejska 3, 87-100 Toruń, Poland

In current literature, a number of standard lines of evidence reemerge in support of the hypothesis that the initial, “bootstrapping” stage of the evolution of language was gestural. However, one specific feature of gestural communication consistent with this hypothesis has been given surprisingly little attention: the visual modality makes gestural signals more secret than vocal signals (lack of broadcast transmission). The high relevance of secrecy is derived from the fundamental constraint on language evolution: the transfer of honest messages is itself a form of cooperation, and therefore not a naturally evolutionarily stable strategy. Consequently, the greater secrecy of gestural communication constitutes a potentially important factor that should not fail to be represented in more comprehensive models of the emergence of protolanguage.
The idea of gestural primacy (in the evolution of language), in its various forms, has attracted numerous modern supporters (Hewes 1973, Armstrong et al. 1994, Corballis 2002, among many others), as well as several sceptics (e.g. MacNeilage & Davis 2005), with a small but notable minority denouncing it as a non-issue (Bickerton 2005). Its proponents adduce a wide range of evidence, focussing on the rigidity of preexisting primate vocal communication, the iconicity of gestures, sign language acquisition, cortical control of the hand, and many others. However, one very interesting feature of gestural signals, the greater potential secrecy resulting from the lack of broadcast transmission, has so far remained unexplored, despite its strict relevance to the evolutionary context. At the same time, we have found it to be neglected in standard psychological, linguistic, and ethological approaches to nonverbal communication in humans (Feldman and Rime 1991; McNeill 2000; Atkinson and Heritage 1989; Eibl-Eibesfeldt 1989).

1. Definitions and caveats
It is important to voice a number of caveats at the outset. Firstly, we follow Hewes (1996) in giving the pivotal term gesture a relatively broad interpretation.
In the present context, “gestures” are primarily defined as the voluntary communicative movements of the arm, hand and fingers. Somewhat less centrally, they also include elements of proxemics, posture and orientation, facial expressions, and gaze direction. On the other hand, gestures as understood here do not refer to the articulatory gestures involved in speech production, nor to non-intentional bodily signals (affective gestures), although they may form a continuum with the latter. Secondly, it must be emphasised that the present paper deals specifically with the very earliest stage of the phylogenetic emergence of language-like communication. We subscribe to the widely held position that language as known today was preceded by a “simpler” protolanguage. We remain noncommittal as to the exact nature of protolanguage (e.g. holistic versus atomic), but assume it to be distinguished by the lack of generative syntax, but the presence of the conventional sign (sensu Zlatev et al. 2005). Thirdly, it should be noted that this text concerns broadcast transmission only with respect to its consequences for secrecy (“privacy”, “addressee discrimination”). The general implications of broadcast transmission of a communication system are much wider, including such aspects as independence from visibility conditions and line of sight, but they lie outside the scope of the present paper.

2. The fundamental constraint on the evolution of communication
A standard, intuitive approach to explaining the absence of language in nonhuman primates is to look to their cognitive, conceptual or physical limitations (relative to humans). Such a position implicitly assumes a natural motivation to exchange honest messages, only held back by the lack of suitable means of expression. This, in turn, is rooted in an intuitive view on the naturalness of cooperation, additionally backed up by the group selectionist mindset popular in the first half of the past century. From that perspective, the presence of extensive cooperation between nonkin in humans is expected; it is the lack of such cooperation in other primates that becomes the theoretical problem in want of an explanation. The above explanatory pattern has been reversed by the introduction
It is worth noting that once the argument becomes framed in terms of the advantages of one transmission channel over the other (as is often the case), it instantly loses its relevance to the issue of gestural primacy. The question of which communication system is more efficient is logically independent from the question of which communication system is more natural to evolve in an ancestral primate: “which is better” is fully dissociable from “which came first”.
into evolutionary theory of the gene’s eye view (Dawkins 1976) and game-theoretic logic (Maynard Smith 1982). However, the relation between cooperation and communication remains complicated, with communication often seen essentially as a mere means for establishing the cooperative behaviour proper (e.g. Gardenfors 2002). It takes another vital step to realise that the exchange of honest messages is a special case of communication that is itself a form of cooperation. As such, it requires special conditions for emergence (such as kinship, byproduct mutualism, group selection, reciprocity; see e.g. Dugatkin 2002), and generates specific predictions as to its nature (Krebs and Dawkins 1984). Communication in general is constrained by the honesty of signals. Since receivers are selected not to respond to dishonest messages - ones that fail to be reliably correlated with their “contents” - communication breaks down in the absence of signal honesty. Honesty can be guaranteed in two different ways, reflecting two models of social interaction. They result in two distinct kinds of signalling that characteristically differ in their expensiveness (Krebs and Dawkins 1984; see also Noble 2000, who nevertheless generally endorses this conclusion). Typically the interests of the individuals and their genes are conflicting, and communication spirals into an arms race between “costly advertising” and “sales resistance”. Here, honesty of a signal is certified by its being expensive and thus difficult to fake. The costs incurred on the signallers are diverse and involve minimally the expenditure of valuable resources such as time, energy and attention - but they can also include attracting predators, warning potential prey, or otherwise handicapping the animal in performing a simultaneous action (see also point 4). However, in cooperative interactions, honesty is intrinsically present, and need not be backed up by signal expensiveness.
In such a model, selection pressures act against signal expensiveness, favouring the emergence of “cheap” signalling. This is particularly relevant to signalling in language, which follows the latter pattern of communicative interaction. To sum up, the emergence of language-like communication necessarily presupposes the cooperative spectrum of the payoff matrix. Furthermore, it strongly predicts that the signals used in this type of communication minimise their conspicuousness as well as all other kinds of costs.

3. Broadcast transmission
The concept of broadcast transmission was defined by Hockett (1977) as one of the design features of language. It captures a basic trait of verbal communication that results from its dependence on the vocal-auditory transmission channel. Under canonical conditions, a vocal signal travels in all directions from its source, its detectability being restricted only by the distance from the sender (and the sensory equipment of potential decoders). This fact has a number of consequences, but in the present context it is important that a vocally coded message is available indiscriminately to all individuals within hearing range. The signaller is normally unable to confine the scope of addressees of its message. It is of interest that this problem was recognised as early as by Hockett himself (1977: 131): “The situation is like that in bidding at bridge, where any information sent to one’s partner is also (barring resort to unannounced conventions, which is cheating) transmitted to opponents. There must be many ecological conditions in which this public nature of sound is potentially contrasurvival.” In this respect, gestural communication stands in clear contrast with vocal communication. Its dependence on the visual mode, despite being limiting in other ways, does not lead to broadcast transmission, allowing the sender to select the addressees of the message.

4. The costs of signalling in (proto)language
Language is a communicative system distinguished by its very high flexibility in the range, kind and complexity of transferred messages. This is founded on detached representation (Gärdenfors 1996), which endows linguistic communication with essential independence from contextual, thematic and other constraints. This is a qualitative difference from nonlinguistic communication systems, and we assume it to be characteristic of protolanguage, at least to a considerable extent. The use of conventional signs endows protolanguage, despite its limited compositionality/productivity, with the ability to represent states, events, relations, etc. in the world in a rich form that can be assigned, or at least effectively interpreted in terms of, truth values (this need not imply an explicitly propositional representation format; for a possible format see e.g. Hurford 2006). As stated in point 2, all signalling is costly, principally in ways that are directly related to the production of the message rather than to its “content”. Nevertheless, signalling may bear yet another type of consequence, one that rises to prominence in increasingly language-like forms of communication and pertains to the content of the message. In so far as other parties are capable of acting on the disclosed information in ways harmful to the signaller, this reduces
the signaller’s fitness and can therefore be conceptualised as a cost. Such costs may be negligible for most kinds of animal communication. This changes radically in protolanguage, which enables its users to convey a qualitatively different kind of information: rich information about the location of, and ways of access to, food and other resources, or about the history of social interactions (the “who did what to whom”). Such information constitutes valuable knowledge, and the evolutionary costs to the individual unintentionally divulging it to “eavesdropping” competitors and opponents are proportional to its high value. It must be especially emphasised that the above constraint is particularly relevant to the early stages of the development of language-like communication, where the cooperative context of communication is fragile. This is so because, as is well known, language introduces or facilitates a range of normative mechanisms, such as reciprocity and punishment, that bolster cooperation; cooperation and language co-evolve. Therefore, the ability to discriminate between the receivers of a message would have been particularly important in the “bootstrapping” phase of the emergence of protolanguage.

5. The secrecy of gestural signals
Gestural communication has so far been little studied with respect to signal secrecy. However, secrecy resulting from the lack of broadcast transmission appears to be a prominent trait of the use of gestures in present-day humans. When gestural communication occurs between speakers capable of vocal communication, it is likely to follow from the effort to constrain the number of addressees, and is a strong indicator of a conflict of interests with a third party present in the vicinity. A strong link between the use of gestural communication under default audibility conditions and the need for secrecy, motivated by a conflict of interests, is supported by diverse lines of circumstantial evidence, some of which are enumerated below:
- parenthetical signals that qualify, or even contradict, the vocally transmitted information are often designed to be inaccessible to part of the receivers of the vocal message (e.g. a conspiratorial wink accompanying a vocal statement) - see Scheflen 1972;
- in contexts involving team competitions, the secrecy of tactical decisions is secured by reverting to the gestural mode, e.g. by taking advantage of the blocked line of sight of the opponents - see fig. 1;
- thieves operating in public places are known to depend on gestures to coordinate their actions in a manner designed to minimise conspicuousness;
- indigenous people of the Kalahari Desert resort to sign language during hunting; this case represents a markedly different type of secrecy from the ones described above: here, the use of gestures is not motivated by the intention to hide the content of the message but by the intention to hide (from prey) the very act of communication.
As already noted, the secretive use of gestures has not been given attention in communication studies. Our work should be seen as a preliminary attempt to bridge this gap. Given the speculative nature of our claims, we have designed a set of role-play experiments and hope that, in their wake, we will be able to give these claims a firmer empirical footing.

6. Conclusion
The argument outlined above is conceptually simple. The specific thesis advocated here is that the use of gestures counters the disadvantage incurred by the “broadcast transmission” feature characterising vocal communication. We suggest that this apparently slight disadvantage becomes magnified in more human-like interactions relying on more language-like communication, where the cost of divulging valuable information becomes an important factor. The gestural mode of communication, making use of the visual channel of transmission and thus being more secretive, allows the sender to choose the receivers of its messages more discriminately. The above argument, which can be referred to as the “gestural secrecy argument”, is limited in its scope. It does not constitute a separate scenario of the evolution of protolanguage; rather, it identifies a potentially powerful factor that should be included in existing scenarios. Also, the argument does not address the central issue of why communication in hominids took a cooperative course in the first place. Still, it lends certain support to gestural rather than vocal theories of language origins, showing them to be more economical in the above respect. Further necessary research includes the incorporation of the factor of signal secrecy into more formal modelling of (proto)language origins, as well as empirical studies of signal secrecy in present-day gestural communication.

References
Armstrong, D. F., Stokoe, W. C. & Wilcox, S. E. (1994). Signs of the origin of syntax. Current Anthropology, 35(4), 349-368.
Atkinson, J. M., & Heritage, J. (Eds.) (1989). Structures in social action: Studies in conversation analysis. Cambridge: Cambridge University Press.
Bickerton, D. (2005). Language evolution: A brief guide for linguists. URL= http://www.derekbickerton.com/blog/~archives/2005/7/1/989799.html
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
Dawkins, R. (1976). The Selfish Gene. Oxford: Oxford University Press.
Dugatkin, L. A. (2002). Cooperation in animals: An evolutionary overview. Biology and Philosophy, 17, 459-476.
Eibl-Eibesfeldt, I. (1989). Human ethology. New York: Aldine de Gruyter.
Feldman, R. S., & Rimé, B. (Eds.) (1991). Fundamentals of nonverbal behaviour. Cambridge: Cambridge University Press.
Gärdenfors, P. (1996). Cued and detached representations in animal cognition. Behavioural Processes, 36, 263-273.
Gärdenfors, P. (2002). Cooperation and the evolution of symbolic communication. Lund University Cognitive Studies, 91.
Hewes, G. W. (1996). A history of the study of language origins and the gestural primacy hypothesis. In A. Lock and C. Peters (Eds.), Handbook of human symbolic evolution (pp. 571-595). Oxford: Oxford University Press.
Hockett, C. F. (1977). Logical considerations in the study of animal communication. In C. F. Hockett (Ed.), The View from Language: Selected Essays 1948-1974 (pp. 124-162). Athens, GA: The University of Georgia Press.
Hurford, J. R. (2006). Proto-propositions. In A. Cangelosi, A. D. M. Smith, and K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference EVOLANG6 (pp. 131-138). Singapore: World Scientific Publishing.
Krebs, J. R. & Dawkins, R. (1984). Animal signals: Mind-reading and manipulation. In J. R. Krebs and R. Dawkins (Eds.), Behavioural Ecology: An Evolutionary Approach (pp. 380-402). Oxford: Blackwell.
MacNeilage, P. F. & Davis, B. L. (2005). The frame/content theory of evolution of speech: A comparison with a gestural-origins alternative. Interaction Studies, 6(2), 173-199.
McNeill, D. (Ed.) (2000). Language and gesture. Cambridge: Cambridge University Press.
Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge: Cambridge University Press.
Noble, J. (2000). Co-operation, competition and the evolution of pre-linguistic communication. In C. Knight, J. R. Hurford and M. Studdert-Kennedy (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form (pp. 40-61). Cambridge: Cambridge University Press.
Scheflen, A. E. (1972). The significance of posture in communication systems. In J. Laver and S. Hutcheson (Eds.), Communication in face to face interaction (pp. 225-246). Harmondsworth: Penguin Books.
Zlatev, J., Persson, T. & Gärdenfors, P. (2005). Bodily mimesis as “the missing link” in human cognitive evolution. Lund University Cognitive Studies, 121.
SELF-INTERESTED AGENTS CAN BOOTSTRAP SYMBOLIC COMMUNICATION IF THEY PUNISH CHEATERS
EMILY WANG
Artificial Intelligence Laboratory, Vrije Universiteit Brussel
Pleinlaan 2, Brussels, 1050, Belgium
[email protected]
LUC STEELS
Sony Computer Science Laboratory
6 Rue Amyot, Paris, 75005, France
Artificial Intelligence Laboratory, Vrije Universiteit Brussel
Pleinlaan 2, Brussels, 1050, Belgium
[email protected]

We examine the social prerequisites for symbolic communication by studying a language game embedded within a signaling game, in which cooperation is possible but unenforced, and agents have an incentive to deceive. Despite this incentive, and even with persistent cheating, naming conventions can still arise from strictly local interactions, as long as agents employ sufficient mechanisms to detect deceit. However, unfairly antagonistic strategies can undermine lexical convergence. Simulated agents are shown to evolve trust relations simultaneously with symbolic communication, suggesting that human language need not be predicated upon existing social relationships, although the cognitive capacity for social interaction seems essential. Thus, language can develop given a balance between restrained deception and revocable trust. Unconditional cooperation and outright altruism are not necessary.
1. The Reciprocal Naming Game

Sociality is generally regarded as a prerequisite for symbolic communication (Steels, 2008), but given the pressure of natural selection, there remains the question of how honest communication can be evolutionarily stable when individuals might gain an advantage by deceiving others (Dessalles, 2000). In hunter-gatherer societies, imparting personal knowledge to others about the location of food can be of negligible cost and may bring extra benefits if collaboration is required to harvest the food, or if the other individuals are likely to return the favor at a later time (Knight, 1991). Reciprocity has been put forward as a mechanism that sufficiently elicits altruism directed at unrelated individuals given Darwinian constraints, as long as individuals encounter each other repeatedly over the course of many interactions, and are exposed symmetrically to opportunities for altruism, as
in the prisoner’s dilemma strategy game (Trivers, 1971). With a tit-for-tat policy, a player remembers each opponent’s previous action so that cooperation is only directed towards those who did not defect in the previous interaction, and this has been shown to foster reciprocity because it is punishing yet forgiving (Axelrod & Hamilton, 1981). Thus, we present a computational model where individuals can recognize each other, keep a record of cooperative behavior, and direct their own altruistic behavior towards those who previously offered cooperation. We combine two well-studied models, the Naming Game and the Signaling Game, to make the Reciprocal Naming Game, which we use to study the interaction between optional altruism and the emergence of symbolic communication. The Naming Game (Steels, 1995) was introduced as a minimal model for studying the conventionalization of names in a population of agents, using only peer-to-peer interactions. The goal is to develop globally accepted naming conventions from only the sum experience of many local interactions. The Crawford-Sobel model of strategic information transmission (1982) defines a Signaling Game, a two-player strategy game in which the players communicate using signals. For convenience, we denote the signaler as S and the receiver as R. S is better informed than R, with private information t about the environment. S transmits a message m to convey either t, or something misleading. Based on m, R takes an action a that determines the payoff for both players. If S adopts a strategy of lying about t, then R adapts by ignoring the information in m. In the Naming Game, the speaker utters a word to best convey the intended referent to the hearer. But in a Signaling Game, the signaler need not transmit a truthful m.
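The Naming Game dynamics just summarized can be sketched in a few lines of code. The following is an illustrative sketch, not the authors’ actual implementation: it combines the regular Naming Game’s corrective feedback (the hearer learns the intended referent after a failed game) with the score-based lexicons and lateral inhibition described in Section 2; all names, scores and parameter values here are our own choices.

```python
import random

class Agent:
    def __init__(self):
        self.lexicon = {}  # (word, meaning) -> preference score

    def name_for(self, meaning):
        """Pick the highest-scoring word for a meaning, inventing one if needed."""
        known = [(w, s) for (w, m), s in self.lexicon.items() if m == meaning]
        if not known:
            word = "w%06d" % random.randrange(10**6)  # invent a fresh word
            self.lexicon[(word, meaning)] = 0.5
            return word
        return max(known, key=lambda ws: ws[1])[0]

    def interpret(self, word):
        known = [(m, s) for (w, m), s in self.lexicon.items() if w == word]
        return max(known, key=lambda ms: ms[1])[0] if known else None

    def update(self, word, meaning, success):
        """Lateral inhibition: promote the used association on success,
        demote it on failure, and demote all competing associations."""
        key = (word, meaning)
        self.lexicon[key] = self.lexicon.get(key, 0.5) + (0.1 if success else -0.1)
        if success:
            for other in list(self.lexicon):
                if other != key and (other[0] == word or other[1] == meaning):
                    self.lexicon[other] -= 0.1
        # "short-term memory": forget associations whose score has bottomed out
        self.lexicon = {k: s for k, s in self.lexicon.items() if s > 0}

random.seed(1)
agents, objects = [Agent() for _ in range(10)], list(range(10))
for _ in range(20000):
    speaker, hearer = random.sample(agents, 2)
    obj = random.choice(objects)
    word = speaker.name_for(obj)
    success = hearer.interpret(word) == obj
    speaker.update(word, obj, success)
    hearer.update(word, obj, success)  # corrective feedback reveals obj
```

After an initial explosion of invented synonyms, lateral inhibition typically drives such a population towards one shared word per object: a globally accepted convention emerging from purely local interactions.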
We create a single game out of these two by presenting two players, randomly chosen from the population in each iteration, with a context of two items, one of which is the target, and the other a distracter. S has access to this information, but may choose either item as the referent. This situation can be conceived as a shell game, where a set of shells forms the context, and a dealer has hidden a pea under one of the shells. R is like a player who places a bet, and wins by correctly guessing which shell contains the pea. S is a third party that may act as an informant and truthfully indicate the target to R, in which case S takes a share of R’s winnings. Or, S may act as a shill by indicating the distracter, and receive a payment from the dealer if R guesses incorrectly. So S may use m to deceive, and R must decide whether to believe m. This interaction scheme is similar to that of the regular Naming Game, but without feedback from explicit pointing. In the Reciprocal Naming Game, the signaler’s intended meaning is never revealed to the receiver. Adding this layer of uncertainty preserves the privacy of the players’ choices whether to cooperate or defect. The remainder of this paper studies the Reciprocal Naming Game. We first introduce a minimal agent architecture needed to play the game, and then some different strategies. Next we report on the results of computational simulations that examine key questions about the social prerequisites of symbolic communication.
2. Agent Architecture

To remember object names, each agent is equipped with a lexical memory associating words with meanings and scores. Multiple lexicon entries may share the same word or meaning, and these competing conventions can be ordered by preference according to their score. Scores are governed by lateral inhibition, that is, incremented following successful usage and decremented following failed interactions, or the successful use of a competing association. Group coherence represents agreement in the population, and this is summarized by a group lexicon of the most widely accepted words, but this measure is only known to an external observer. The agents themselves receive only local information. To identify other agents in the population and to record previous experiences, each agent also has a social memory, associating each other individual with a rating. One agent can regard another with the intent to cooperate, regard(a_j, a_k) = 1, or with the intent to defect, regard(a_j, a_k) = 0. Two agents that regard each other in the same way share mutual regard, regard(a_j, a_k) = regard(a_k, a_j), but otherwise their relationship is one-sided. The outcome of one iteration of the Reciprocal Naming Game depends upon three binary parameters, a_S, c, and a_R. The actions of the signaler and receiver are a_S and a_R, where cooperation and trust are coded as 1, and defection and disbelief as 0. The predicate c indicates whether R comprehended the message correctly. A fourth value p depends on the other three, and indicates whether R successfully located the pea, which can occur on purpose or by accident, depending on c. So p is set like an even parity bit, with p = 1 only when an odd number of the bits in {a_S, c, a_R} are 1, and this collapses the eight possible combinations into four distinct outcomes. These outcomes are summarized by the payoff matrix, with one column of entries for p = 1 and one for p = 0, where u denotes utility and each entry gives u_S, u_R. Note that p is used to decide the payments instead of a_R, since the dealer or R only pay S based on the final outcome of the shell game. Three levels of information govern the players’ knowledge. Actions a_S and a_R are kept private by each player. The result p is public information, displayed to both players, but the result c is not revealed to any player; it is known only by virtue of experimenter inspection. Players cannot inspect each others’ internal processes, so they cannot know for certain whether their opponents cooperate or defect. Nevertheless, S and R can each estimate the action of the other, given knowledge of their own actions, and their observation of p. For an agent-knowledge formulation of the Reciprocal Naming Game, as well as further results not presented here, see http://arti.vub.ac.be/~emily/msc/.
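The parity rule for p amounts to a three-way XOR over the three binary values. As a quick sanity check (a sketch with function and variable names of our choosing):

```python
def outcome(a_s: int, c: int, a_r: int) -> int:
    """p = 1 iff an odd number of {a_S, c, a_R} are 1, i.e. a three-way XOR."""
    return a_s ^ c ^ a_r

# Honest signal, comprehended, believed: R picks the target and finds the pea.
print(outcome(1, 1, 1))  # 1
# Deceptive signal, comprehended, believed: R picks the distracter and loses.
print(outcome(0, 1, 1))  # 0
# Deceptive signal, comprehended, disbelieved: R picks the other shell and wins.
print(outcome(0, 1, 0))  # 1
# Honest signal, miscomprehended, believed: R loses by accident.
print(outcome(1, 0, 1))  # 0
```

Because p is a deterministic function of the other three bits, the eight possible combinations of (a_S, c, a_R) collapse into the four payoff-relevant outcomes described above.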
3. Player Strategies

Under the general condition of complete reciprocity, the signaler chooses a_S = regard(S, R) and the receiver chooses a_R = regard(R, S), in accordance with tit-for-tat. An empty strategy was implemented to refute the null hypothesis, which would be that cheater detection has no effect on the ability of the population to agree upon lexical conventions. In this condition, S behaves as above. R assumes that m indicates the target, but if R cannot interpret m, then it looks for the pea under a random context item. In another condition with only partial reciprocity, we relax the requirement that a_S = regard(S, R). Instead we allow a_S = 0 even when regard(S, R) = 1, by introducing a constant fairness parameter f for each agent. A fair agent has f = 1.0, and behaves with complete reciprocity. When f = 0, the agent acts as a free rider, and always defects when playing as S, although it can still choose to believe the signaler when playing as R. The agents also employ specified strategies for updating their memories. For the lexicon, both players promote the association that was applied in the interaction when they have received a nonzero reward, and they demote associations resulting in zero payoff. With a short-term memory strategy, associations reaching the minimum score threshold are deleted from the lexicon, but such entries are kept when using long-term memory. Updates for social regard are less symmetric. The signaler’s sole criterion for updating its regard for R is whether or not the receiver chose the object that was intended; thus S assumes c = 1. When a_S = 1, the intended object is the target, and when a_S = 0, it is the distracter. So the receiver’s choice matches the signaler’s intention when p = a_S. The receiver considers the size of u_R to estimate whether the signaler cooperated in the interaction. As illustrated by the payoff matrix, R can sometimes deduce c and a_S, given u_R and p. When u_R = 0.6, it is certain that a_S = 1, even if R did not cooperate. R responds by cooperating with S next. When u_R = 1.0, both players defected, and R continues to defect against S. When u_R = 0, R cannot be certain about a_S, and responds by modifying its regard for S by a bit-flip, since the payoff was not favorable.
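The action rules and regard updates of this section can be written out directly. This is an illustrative sketch, not the paper’s code; in particular, reading the fairness parameter f as a probability of honouring good regard is our assumption, as are all function and variable names.

```python
import random

def signaler_action(s, r, regard, fairness):
    """a_S = regard(S, R) under complete reciprocity; a free rider (f = 0)
    always defects as signaler, while a fair agent (f = 1.0) always reciprocates."""
    return regard[s][r] if random.random() < fairness[s] else 0

def signaler_update(regard, s, r, a_s, p):
    # S assumes c = 1: R chose the intended object exactly when p == a_S.
    regard[s][r] = 1 if p == a_s else 0

def receiver_update(regard, r, s, u_r):
    if u_r == 0.6:    # only possible when a_S = 1: cooperate with S next time
        regard[r][s] = 1
    elif u_r == 1.0:  # both players defected: continue to defect against S
        regard[r][s] = 0
    elif u_r == 0:    # a_S cannot be deduced: flip regard, payoff was unfavorable
        regard[r][s] ^= 1
```

Note the asymmetry: the signaler’s update rests on a blanket assumption of comprehension, while the receiver’s update is a partial deduction from its own payoff, which is exactly why cheater detection remains fallible.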
4. Experimental Results

Figure 1 shows a Reciprocal Naming Game with ten objects and ten agents using short-term memory. Measures are shown as running averages. Figures 2-5 are meant to be read in direct comparison to Fig. 1 (and so they have been simplified, and afforded less space; complete color versions can be viewed at http://arti.vub.ac.be/~emily/evolang7/). In successful systems, an initial lexical explosion due to the rapid invention of new words is followed by an approach towards high group coherence and communicative success as the lexicon becomes more efficient. Even under the more challenging conditions of the Reciprocal Naming Game, the agent population is capable of reaching complete agreement on a set of lexical associations, despite the persistence of mutually defecting pairs.

Figure 1. Lexical agreement is not hindered by cheating in a simulation where the agents employ tit-for-tat and have short-term memory. The lexicon becomes optimal and stable after 5,000 games, with complete group coherence fixed at 1.0, and lexicon size at 10. Communicative success is near perfect, but fluctuates just below 1.0. Reciprocating relationships are split about equally, and fluctuating.

However, communicative success remains less than perfect, even when coherence is full, due to homonyms that are propagated following games where m was misunderstood. Because of the lack of pointing, agents cannot distinguish between a zero payoff due to failed communication and the same result due to a defecting partner. Thus communicative success and social relationships fluctuate continuously as a result of lexical inefficiency. We now examine the importance of sociality by discussing four major issues.

4.1. Retaliation allows deception to be tolerated
In Fig. 2, R employs the empty strategy and simply assumes that S is truthful, while S follows tit-for-tat. Coherence is not realized because misinterpreted messages pollute the lexicon with many homonyms. Even though the initial population is fully cooperative, R guesses randomly when it does not know m, and this introduces uncooperative regard into the system. So agreement can form when the agents are equipped to retaliate, as they are in Fig. 1, but not in Fig. 2. This clearly rejects the null hypothesis, since the population only develops group coherence when the receivers, as well as the speakers, follow a policy of reciprocation. Therefore lexical convergence depends not upon a complete lack of deception, but rather upon a balance between deception and the ability to detect it. Given this, individuals can direct their altruism accordingly. But since R cannot always deduce the true value of a_S, it seems even an approximation of the speaker’s honesty suffices. Thus, cheater detection is essential, even if it is fallible.
4.2. More memory prevents the death spiral

One weakness of tit-for-tat, cited for the iterated prisoner’s dilemma, is the problem of the death spiral in noisy environments, where a single mistake can destroy a mutually cooperative relationship (Axelrod & Hamilton, 1981). The Reciprocal Naming Game tends to resist this pitfall since the true actions, a_S and a_R, remain private, and players must deal with doubt when estimating these values. Cooperative relations become even more robust with long-term lexical memory, when obsolete associations remain accessible to R for interpreting m. This increases the chance of comprehension, and suppresses defecting pairs to much lower numbers, as shown in Fig. 3. The time to reach convergence doubles, but mutually cooperative relations are more constructive and stable, since a shared reward results in synchronous score promotions, while defection virtually guarantees that the players will make mismatched lexical updates.

4.3. Limited numbers of free riders are bearable

Figure 4 shows that a population mostly composed of fair agents can accurately retaliate against a single free rider. But retaliation becomes less effective as the number of free riders grows, as shown in Fig. 5, where coherence is significantly more difficult to achieve, and unstable. Free riders detract from the common good in total utility, since mutually cooperative interactions benefit from a 0.2 bonus. The advantage of the free rider strategy depends on how many other agents in the population are following the same strategy. Individual utility is best served by taking part in the majority, that is, to cease reciprocating when there are more free riders than fair agents in the population.

4.4. Reciprocation produces coherence in spite of deception
While the agents never form explicit agreements, each agent’s personal utility depends on its ability to establish reciprocal relationships. Acting without reciprocity is costly. Cooperating with a partner who defects results in the sucker’s payoff. Defecting against a partner who cooperates precludes future cooperation. But we must distinguish between failing to reciprocate and choosing not to cooperate. If two agents have established a pattern of repeated, mutual defection, then they receive roughly equal cumulative payoff. In a sense, one player sacrifices itself in each interaction, to provide the other with a large reward, and they take turns doing this since roles are randomly assigned. This way, cooperation takes place not within each interaction, but over the course of multiple interactions, emerging from tit-for-tat.

Figure 2. Agents perform at random when R has no strategies for detecting deceit. Lexical agreement under these conditions is not possible.

Figure 3. Defection is suppressed when agents have the added capacity of long-term memory. The learning curve compares with that of Fig. 1.

The level of information sharing found in human language use suggests that speakers must be motivated to share personal knowledge by some direct payoff (Scott-Phillips, 2006). In the context of the Reciprocal Naming Game, a speaker can be seen to derive utility from the propagation of its own words, because later, in the receiver role, this agent will deal better with the social situation when it is able to interpret the linguistic situation. Ostensibly, it would be every agent’s goal to avoid coherence with unfair partners if coherence renders an agent vulnerable to deception perpetrated by shared words. But coherence contributes to personal utility when cheaters can be detected, and this supports convergence in the face of deception. Although an opponent might use a word to deceive once, the word cannot be used against the same agent to cheat repeatedly if the meaning of the word is shared, since an agent who has been deceived will choose to disbelieve the message in the next round, if playing by tit-for-tat. Thus in the long run, comprehension of messages elevates receiver performance above chance, and it is in an agent’s interest to share the words it knows, and to learn the words spoken by other players.
This way, the group lexicon serves as a neutral tool and as a sort of social contract, especially because it would be difficult for a single agent to deviate unilaterally from the agreed naming conventions. In this system, the language remains a constant fixture because the opportunity to brandish it for deceit is no greater than the opportunity to engage it for cooperation.
Figure 4. With only one free rider, lexical agreement and stability nearly matches Fig. 1.

Figure 5. With three free riders, the ability to build agreement becomes greatly diminished.
5. Conclusion
In simulations guided by a model of selfish communication, we experimented by endowing agents with a tit-for-tat policy, as well as some other policies for guiding altruistic behavior. With tit-for-tat, the agents’ selfishness did not impede lexical agreement. But without sufficient reciprocation, deception prevented consensus. These simulations show that peer-to-peer negotiation of conventions in language games remains viable in a social environment where deception is prevalent, as long as a socially-informed mechanism governs the agents’ choices between cooperation and deception. Bootstrapping a symbolic system of communication can even occur in parallel with the formation of trust relations. This demonstrates that trust need not be permanent or unconditional for communication to develop and remain stable. Rather, reciprocity may serve as a proxy for honesty.

Acknowledgments
This research has been conducted at the AI Laboratory of the Vrije Universiteit Brussel, with funding from FWO project AL328. Emily Wang visited the AI Lab during the 2006-07 academic year on a Fulbright fellowship sponsored by the U.S. Department of State. We would like to thank both Pieter Wellens and Joris Bleys for their insights on Naming Game dynamics.
Axelrod, R., & Hamilton, W. (1981). The evolution of cooperation. Science, 211(4489), 1390-1396.
Crawford, V. P., & Sobel, J. (1982). Strategic information transmission. Econometrica, 50(6), 1431-1451.
Dessalles, J-L. (2000). Language and hominid politics. In C. Knight, M. Studdert-Kennedy, & J. Hurford (Eds.), The evolutionary emergence of language: Social function and the origins of linguistic form (pp. 62-79). Cambridge, UK: Cambridge University Press.
Knight, C. (1991). Blood relations: Menstruation and the origins of culture. New Haven, CT: Yale University Press.
Scott-Phillips, T. C. (2006). Why talk? Speaking as selfish behaviour. In Proceedings of the 6th international conference on the evolution of language (pp. 299-306).
Steels, L. (1995). A self-organizing spatial vocabulary. Artificial Life, 2(3), 319-332.
Steels, L. (2008). Sociality is a crucial prerequisite for the emergence of language. In R. Botha & C. Knight (Eds.), The cradle of language. Oxford, UK: Oxford University Press.
Trivers, R. (1971). The evolution of reciprocal altruism. Quarterly Review of Biology, 46, 35-57.
COPING WITH COMBINATORIAL UNCERTAINTY IN WORD LEARNING: A FLEXIBLE USAGE-BASED MODEL
PIETER WELLENS
VUB AI-Lab, Pleinlaan 2, 1050 Brussels, Belgium
[email protected]

Agents in the process of bootstrapping a shared lexicon face immense uncertainty. The problem that an agent cannot point to meaning but only to objects represents one of the core aspects of the task. Even with a straightforward representation of meaning, such as a set of boolean features, the hypothesis space scales exponentially in the number of primitive features. Furthermore, data suggest that human learners grasp aspects of many novel words after only a few exposures. We propose a model that can handle the exponential increase in uncertainty and allows scaling towards very large meaning spaces. The key novelty is that word learning or bootstrapping should not be viewed as a mapping task, in which a set of forms is to be mapped onto a set of (predefined) concepts. Instead we view word learning as a process in which the representation of meaning gradually shapes itself, while being usable in interpretation and production almost instantly.
1. Introduction

Word learning is commonly viewed as a mapping task, in which the learner has to map a set of forms onto a set of concepts (Bloom, 2000; Siskind, 1996). While mapping might seem more straightforward than having to shape word meanings, it is in fact more difficult and lies at the root of many problems. The view that word learning corresponds to mapping forms onto concepts is commonly accompanied by claims that a learner is endowed with several biases (constraints) that guide him toward the right mapping (Markman, 1989); whether these constraints are language-specific is yet another debate (Bloom, 2001). While this approach recognises the uncertainty, it largely circumvents it by invoking these constraints. Another possibility is to propose some form of cross-situational learning, where the learner enumerates all possible interpretations and prunes this set as new data arrive. This second approach would seem to have a problem explaining fast mapping, since it takes a long time before the initial set of hypotheses can be pruned to such an extent that it becomes usable. To be clear, we are not unsympathetic to the idea of word learning constraints, but we believe that it is only when viewing word learning as mapping that the constraints become as inescapable as they seem. In this paper we try to show that by trading the mapping view for a more organic, flexible approach to word learning (in line with Bowerman and Choi (2001)), the constraints become less cardinal. Moreover, the enormous diversity found in human natural languages (Haspelmath, Dryer, Gil, & Comrie, 2005; Levinson, 2001) and the subtleties in word use (Fillmore, 1977) suggest that language learners can make few a priori assumptions, and even if they could, they would still face a towering uncertainty when homing in on more subtle aspects of word meaning and use. Some developmental psychologists emphasize human proficiency in interpreting the intentions of others (Tomasello, 2003) or our endowment with a theory of mind (Bloom, 2000). While being supportive of these ideas, and even taking some for granted in our experimental set-up, it is important to understand that intention reading is not telepathy: it might scale down the problem, but it does not solve it entirely. Any such skill has to be accompanied by a model capable of coping with immense uncertainty in large hypothesis spaces. Siskind (1996) and others propose models based on cross-situational learning to bootstrap a shared lexicon. Unlike the current experimental setup, their experiments do not address an exponential scale-up in the number of hypotheses. Other models, such as De Beule and Bergen (2006), Steels and Loetzsch (2007) and Steels and Kaplan (2000), in different ways allow exponential scaling but tend to keep the hypothesis space small. For example, the experiments in De Beule and Bergen (2006) are limited to 60 objects represented by 10 distinct features (there called predicates). These papers, however, do not address scale-up and therefore do not claim to handle it.
2. Overview of the model

Agents engage in series of guessing games (Steels, 2001). A guessing game is played by two agents, a randomly assigned speaker and hearer, sharing a joint attentional frame (the context). The speaker has to draw the hearer's attention to a randomly chosen object (the topic) using one or more words from its lexicon. After interpretation, the hearer points to the object he believes the speaker intended. In case of failure, the speaker corrects the hearer by pointing to the topic. To investigate referential uncertainty, which is the problem that an agent cannot point to meaning but only to objects, we must ensure that multiple equally valid interpretations exist upon hearing a novel word. It follows that explicit meaning transfer (i.e. telepathy) or a non-structured representation of meaning are to be avoided. Even with an elementary representation of meaning such as sets of primitive features, the number of possible interpretations scales exponentially in the number of features, given that word meaning can be any subset of these features.a For example, upon hearing a novel word, sharing joint attention to an

a We do not claim such a representation to be realistic, but we believe it is the minimal requirement that suits our current needs for investigating the problem of referential uncertainty.
Figure 1. Left: an association between form and meaning as is common in many models of lexicon formation, scoring the complete subset. Right: the refinement suggested in the proposed model, which is related to fuzzy sets and prototype theory.
object represented by 60 boolean features, and having no constraints to favor particular interpretations, the intended meaning could be any of 2^60 ≈ 1.15 × 10^18 possibilities. Confronted with numbers of such magnitude, one wonders how a learner, given a stable input language, ever succeeds in finding out the intended meaning, let alone how a population of agents could bootstrap, from scratch, a shared lexical language. Word learning constraints seem to be the only viable way out. With the number of hypotheses per novel word well over the billions, a learner cannot enumerate these possibilities and score them separately; neither can he make a series of one-shot guesses and hope for the best, since finding the correct meaning would be like winning the lottery.

The first step towards a solution is to include uncertainty in the representation of word meaning itself. This is done by keeping an (un)certainty score for every feature in a form-meaning association, instead of keeping only one scored link per word as in, for example, De Beule and Bergen (2006) (see figure 1). This representation is strongly related to both fuzzy set theory (Zadeh, 1965) and prototype theory (Rosch, 1973). A crucial difference with traditional cross-situational learning approaches is that this representation avoids the need to explicitly enumerate competing hypotheses. The key idea during language use is that a weighted similarity can be calculated between such representations. In the model we use a weighted overlap metric with the certainty scores as weights: in short, shared features increase similarity and the disjunct parts decrease it. Employing this similarity measure, production amounts to finding that combination of words whose meaning is most similar to the topic and least similar to the other objects in the context. This results in context-sensitive multi-word utterances and involves an implicit on-the-fly discrimination using the lexicon.
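The weighted-overlap similarity and discrimination-based production just described can be sketched in a few lines of Python. This is an illustrative reconstruction, not the author's implementation: the word forms, feature names, scores and the restriction to single-word production (the model searches over word combinations) are all assumptions for the sake of a small example.

```python
# Hypothetical sketch: word meanings are dicts mapping feature -> certainty
# score; objects are plain feature sets.

def similarity(meaning, obj):
    """Weighted overlap: shared features add their certainty score,
    disjunct features (in the meaning but not the object) subtract it."""
    shared = sum(score for f, score in meaning.items() if f in obj)
    disjunct = sum(score for f, score in meaning.items() if f not in obj)
    return shared - disjunct

def produce(lexicon, topic, context):
    """Pick the word most similar to the topic and least similar to the
    other objects in the context (implicit discrimination)."""
    def score(word):
        meaning = lexicon[word]
        others = [o for o in context if o is not topic]
        return similarity(meaning, topic) - max(
            (similarity(meaning, o) for o in others), default=0.0)
    return max(lexicon, key=score)

lexicon = {"bolima": {"red": 0.9, "round": 0.2},
           "fepasu": {"blue": 0.8, "round": 0.7}}
topic = {"red", "round", "small"}
context = [topic, {"blue", "round"}]
print(produce(lexicon, topic, context))  # -> "bolima"
```

Note how "fepasu" loses despite sharing "round" with the topic: its high-certainty "blue" feature matches the distractor object, so discrimination penalises it.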
The most important corollary of using a similarity measure is the great flexibility in word combination, especially in the beginning, when the features have low certainty scores. Thanks to this flexibility the agents can use (combinations of) words that do not fully conform to the meaning to be expressed, resembling what
Langacker (2002) calls extension. The ability to use linguistic items beyond their specification is a necessity in high-dimensional spaces, to maintain a balance between lexicon size and coverage (expressiveness). Interpretation amounts to looking up the meaning of all uttered words, taking the fuzzy union of their features and measuring the similarity between this set and every object in the context. The hearer then points to the object with the highest similarity, again making interpretation flexible. Flexible use of words entails that in a usage event some parts of the meanings are beneficial and others are not. If all features of the used meanings were beneficial in expressing the topic it would not be extension but instantiation, which is rather the exception than the rule. As Langacker (2002) puts it, extension entails "strain" in the use of the linguistic items, which in turn affects the meanings of the used linguistic items. This is operationalised by slightly shifting the certainty scores every time a word is used in production or interpretation. The certainty scores of the features that raised the similarity are incremented and the others are decremented, resembling the psychological phenomenon of entrenchment and its counterpart, erosion. Features with a certainty score equal to or less than 0 are removed, resulting in a more general word meaning. In failed games the hearer adds all unexpressed features of the topic to all uttered words, thus making the meanings of those words more specific. Combining similarity-based flexibility with entrenchment and erosion, word meanings gradually shape themselves to better conform to future use. Repeated over thousands of language games, the word meanings progressively refine and shift, capturing frequently co-occurring features (clusters) in the world, thus effectively implementing a search through the enormous hypothesis space and capturing what is functionally relevant.
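The entrenchment/erosion update described above can be sketched as follows. The step size of 0.1, the clamping at 1.0 and all names are illustrative assumptions, not values taken from the paper.

```python
# Illustrative sketch: after a usage event, raise the certainty of
# features that helped express the topic (entrenchment), lower the rest
# (erosion), and prune features whose score drops to 0 or below.

DELTA = 0.1  # assumed step size

def align(meaning, topic_features):
    for f in list(meaning):
        if f in topic_features:
            meaning[f] = min(1.0, meaning[f] + DELTA)  # entrenchment
        else:
            meaning[f] -= DELTA                        # erosion
            if meaning[f] <= 0:
                del meaning[f]  # word meaning becomes more general

meaning = {"red": 0.9, "round": 0.2, "striped": 0.05}
align(meaning, {"red", "round"})
print(meaning)  # "striped" is eroded away; "red" and "round" reinforced
```

Repeated over many games, this per-feature update is what lets a word's meaning drift towards the feature clusters that are functionally relevant, without ever enumerating the competing hypotheses.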
Word invention is triggered when the speaker's best utterance cannot discriminate the chosen topic. To diagnose possible misinterpretation the speaker interprets his own utterance before actually uttering it, which is crucial in many models (Batali, 1998; Steels, 2003). Given that his lexicon is not expressive enough, the speaker invents a new form (a random string) and associates with it, with very low initial certainty scores, all so-far unexpressed features of the topic. Because word meanings can shift, it might not be necessary to introduce a new word; chances are that the lexicon merely needs a bit more time to be shaped further. Therefore, the more similar the meaning of the utterance is to the topic, the less likely it is that a new word will be introduced. The hearer, when adopting novel words, first interprets all known words and then associates, again with very low certainty scores, all unexpressed features with all novel forms.
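A minimal sketch of these invention and adoption mechanisms, under assumed values (the form length, the 0.05 initial score and all names are illustrative, not taken from the paper):

```python
# Hedged sketch of word invention (speaker) and adoption (hearer).
import random
import string

INIT_SCORE = 0.05  # assumed "very low initial certainty score"

def invent(lexicon, topic, expressed):
    """Speaker: coin a random form covering the topic features the
    current best utterance leaves unexpressed."""
    form = "".join(random.choice(string.ascii_lowercase) for _ in range(6))
    lexicon[form] = {f: INIT_SCORE for f in topic - expressed}
    return form

def adopt(lexicon, novel_forms, topic):
    """Hearer: interpret known words first, then associate all still
    unexpressed topic features with every novel form."""
    expressed = {f for meaning in lexicon.values() for f in meaning}
    for form in novel_forms:
        lexicon[form] = {f: INIT_SCORE for f in topic - expressed}

lex = {}
adopt(lex, ["wabodu"], {"red", "round"})
print(lex["wabodu"])  # both topic features at the low initial score
```

Because the new associations start at a very low certainty, they barely influence production at first; only repeated successful use entrenches them, which is what makes over-generous initial hypotheses harmless.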
3. Experimental results In the multi-agent experimental setup we use a population of 25 agents endowed with the capacities described in the previous section. Machine learning data-sets
Figure 2. Left: the performance of the proposed model on the small world (averaged over 5 runs); right: performance on the much larger world (averaged over 3 runs). Although the number of hypotheses scales exponentially, the agents attain high levels of communicative success and lexicon coherence while keeping a reasonable lexicon size.
are used to obtain the large meaning spaces required to verify the claim that the model can scale to large hypothesis spaces. We use both a small data-set containing only 32 objects represented by 10 boolean features, with context sizes between 4 and 10 objects, and a much larger data-set comprising 8124 objects represented by a total of 100 distinct boolean features, with context sizes between 5 and 20 objects (Asuncion & Newman, 2007). This larger data-set confronts the agents with an incredible amount of uncertainty, but the results (figure 2) show that the model can manage this. The following measures are depicted:
Communicative Success (left axis): A running average (window of 500) of communicative success as measured by the agents. A game is considered successful if the hearer points to the correct topic. It is therefore different from communicative accuracy as in Vogt and Divina (2007) and Siskind (1996).

Lexicon Size (right axis): The average number of words in the lexicons of the agents.

Lexicon Coherence (left axis): Measures the similarity (using the same similarity measure the agents use) between the lexicons of the agents. A coherence of 1 indicates that for all words all agents have exactly the same features associated. It makes sense for coherence to be lower than 1, since agents do not need to have exactly the same meanings in order to communicate successfully; the agents will not be aware of their (slightly) different meanings until a particular usage event confronts them with the difference.

As a comparison we ran a model that does not score the individual features, but instead keeps a score for the meaning as a whole, as in figure 1 (left). It does not employ a similarity measure, and updates scores based on communicative success instead of the more subtle entrenchment and erosion effects. Results show (figure
Figure 3. Both graphs show the performance of a model that does not score the individual features and does not use a similarity measure. Left: the small meaning space; right: the larger space. The model achieves success on the small one, but fails to scale to the larger meaning space.
3) that the population can bootstrap a shared lexicon for small meaning spaces but cannot handle the scale-up to the larger world. Also note that even in the small world the agents using this second model reach only 20% communicative success by game 20000, while with the proposed model they have already attained close to 99% communicative success by then. Data from developmental psychology suggest that human learners can infer aspects of the meaning of a novel word after only a few exposures. The graphs in figure 2 do not give us any insight into this issue, as they show the average of a population in the process of bootstrapping a lexicon. By adding a new agent to a population that has already conventionalised a shared lexicon, we are able to shed light on the behaviour of the proposed model in this regard. We use the large world (8124 objects, 100 features) and a stabilised population with an average lexicon size of some 100 words, and measure for a newly added agent the average success in interpretation in relation to the number of exposures to each word (see figure 4). The graph shows the average success in interpretation (i.e. the new agent pointed correctly) of all words, in relation to the number of exposures. Due to the way success is measured, the first exposure is always a failure and so average success is zero. Quite perplexingly, on the second exposure a whopping 64% of the novel words are used in a successful interpretation. Further exposures gradually improve this result, and by the tenth exposure 70% of the words result in a successful interpretation. This is all the more baffling taking into account that the other members of the population are unaware they are talking to a new agent, and thus use multi-word utterances, including difficult-to-grasp words.

4. Conclusion

The proposed model tries to capture and bring together some insights from cognitive linguistics (Langacker, 2002) and other computational models (Batali, 1998; Steels & Belpaeme, 2005; De Beule & Bergen, 2006), while taking for granted in-
Figure 4. The graph shows the performance in interpretation of one new agent added to a stabilised population. Quite perplexingly, the average success in interpretation at the second exposure to a novel word is already 64%.
sights from developmental psychology (Tomasello, 2003) and criticising assumptions made by others (Bloom, 2000; Markman, 1989). The main strength of modelling is that it can operationalise ideas, and so our main goal has been to show that a more organic view of word learning, combined with flexible language representation, use and alignment, is a powerful idea, both for scaling to very large hypothesis spaces and for arriving at operational interpretations after very few exposures. Although our model can be interpreted as Whorfian, this is only so if one assumes that word meanings and concepts are one and the same. We did not make this assumption, and we take no position regarding the relation between concepts and word meanings.
Acknowledgements

The research reported here has been conducted at the Artificial Intelligence Laboratory of the Vrije Universiteit Brussel (VUB). Pieter Wellens is funded by FWO project AL328. I would like to thank my supervisor Luc Steels and the referees for their useful comments.
References

Asuncion, A., & Newman, D. (2007). UCI machine learning repository.
Batali, J. (1998). Computational simulations of the emergence of grammar. In J. R. Hurford, M. Studdert-Kennedy, & C. Knight (Eds.), Approaches to the evolution of language: Social and cognitive bases. Cambridge: Cambridge University Press.
Bloom, P. (2000). How children learn the meanings of words. MIT Press.
Bloom, P. (2001). Roots of word learning. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 159-181). Cambridge: Cambridge University Press.
Bowerman, M., & Choi, S. (2001). Shaping meanings for language: Universal and language-specific in the acquisition of spatial semantic categories. In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 132-158). Cambridge: Cambridge University Press.
De Beule, J., & Bergen, B. K. (2006). On the emergence of compositionality. In Proceedings of the 6th evolution of language conference (pp. 35-42).
Fillmore, C. J. (1977). Scenes-and-frames semantics. In A. Zampolli (Ed.), Linguistic structures processing (pp. 55-81). Amsterdam: North-Holland.
Haspelmath, M., Dryer, M., Gil, D., & Comrie, B. (Eds.). (2005). The world atlas of language structures. Oxford: Oxford University Press.
Langacker, R. W. (2002). A dynamic usage-based model. In Usage-based models of language. Stanford, California: CSLI Publications.
Levinson, S. C. (2001). Language and mind: Let's get the issues straight! In M. Bowerman & S. C. Levinson (Eds.), Language acquisition and conceptual development (pp. 25-46). Cambridge: Cambridge University Press.
Markman, E. (1989). Categorization and naming in children: Problems of induction. Cambridge, MA: Bradford/MIT Press.
Rosch, E. (1973). Natural categories. Cognitive Psychology, 7, 573-605.
Siskind, J. (1996). A computational study of cross-situational techniques for learning word-to-meaning mappings. Cognition, 61, 39-91.
Steels, L. (2001). Grounding symbols through evolutionary language games. In A. Cangelosi & D. Parisi (Eds.), Simulating the evolution of language (pp. 211-226). London: Springer Verlag.
Steels, L. (2003). Language re-entrance and the inner voice. Journal of Consciousness Studies, 10, 173-185.
Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case study for colour.
Behavioral and Brain Sciences, 28(4), 469-489. (Target paper, discussion 489-529)
Steels, L., & Kaplan, F. (2000). Aibo's first words: The social learning of language and meaning. Evolution of Communication, 4(1), 3-32.
Steels, L., & Loetzsch, M. (2007). Perspective alignment in spatial language. In K. Coventry, T. Tenbrink, & J. Bateman (Eds.), Spatial language and dialogue. Oxford: Oxford University Press.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Harvard University Press.
Vogt, P., & Divina, F. (2007). Social symbol grounding and language evolution. Interaction Studies, 8(1).
Zadeh, L. A. (1965). Fuzzy sets. Information and Control, 8, 338-353.
REMOVING ‘MIND-READING’ FROM THE ITERATED LEARNING MODEL
S. F. WORGAN AND R. I. DAMPER
Information: Signals, Images, Systems (ISIS) Research Group
School of Electronics and Computer Science
University of Southampton
Southampton SO17 1BJ, UK
{swZOSr/rid}@ecs.soton.ac.uk

The iterated learning model (ILM), in which a language comes about via communication pressures exerted over successive generations of agents, has attracted much attention in recent years. Its importance lies in its focus on cultural emergence as opposed to biological evolution. The ILM simplifies a compositional language as the compression of an object space, motivated by a poverty of stimulus, as not all objects in the space will be encountered by an individual in its lifetime. However, in the original ILM, every agent 'magically' has a complete understanding of the surrounding object space, which weakens the relevance to natural language evolution. In this paper, we define each agent's meaning space as an internal self-organising map, allowing it to remain personal and potentially unique. This strengthens the parallels to real language, as the agent's omniscience and 'mind-reading' abilities that feature in the original ILM are removed. Additionally, this improvement motivates the compression of the language through a poverty of memory as well as a poverty of stimulus. Analysis of our new implementation shows maintenance of a compositional (structured) language. The effect of a (previously-implicit) generalisation parameter is also analysed; when each agent is able to generalise over a larger number of objects, a more stable compositional language emerges.
1. Introduction

Hypothesising that language is a system of compression, driven to adjust itself so that it can be learned by the next generation, is a relatively new approach in the field of linguistics. Several important simulations (Kirby & Hurford, 1997; Kirby, 2001, 2002; Brighton, 2002; Smith, Kirby, & Brighton, 2003) have illustrated its potential and provide an alternative to established innate accounts of language (Chomsky, 1975; Bever & Montalbetti, 2002; Hauser, Chomsky, & Fitch, 2002). Currently, existing versions of this iterated learning model (ILM) suffer from a number of shortcomings, highlighted by Smith (2005), Vogt (2005), and Steels and Wellens (2006). This paper will address some of these while maintaining the positive features of the model. In the classical ILM, an agent selects an object from its environment and produces a meaning-signal pair that is directly perceived by a listener. The pairing
is formed through a weighted connection between a meaning node and a signal node, and is used to adjust the weighted connections between the meaning space and the signal space of the listening agent. In this way, a language evolves across a number of generations. If each agent is only given the associated signal for a small subset of possible objects, it is forced to generalise across the remaining object space, so promoting the formation of a stable compositional language.
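The classical architecture just described can be sketched as a simple association matrix between meaning nodes and signal nodes. This is a minimal illustration under assumed conventions (integer node indices, a +1 Hebbian-style count, argmax production); it is not the actual network of the original ILM papers.

```python
# Illustrative sketch of a classical ILM agent: a weight matrix linking
# meaning nodes to signal nodes, trained on directly observed
# meaning-signal pairs and used for production via the best weight.

class Agent:
    def __init__(self, n_meanings, n_signals):
        self.w = [[0] * n_signals for _ in range(n_meanings)]

    def learn(self, meaning, signal):
        self.w[meaning][signal] += 1  # strengthen the observed pairing

    def produce(self, meaning):
        row = self.w[meaning]
        return row.index(max(row))    # best-weighted signal

teacher, learner = Agent(3, 3), Agent(3, 3)
teacher.learn(0, 2)
# one generational step: the learner directly observes the teacher's
# meaning-signal pair -- this direct observation is exactly the
# 'mind-reading' the paper sets out to remove
learner.learn(0, teacher.produce(0))
print(learner.produce(0))  # -> 2
```

The point of the sketch is the `learner.learn(0, ...)` line: the learner is handed the meaning index itself, not just the signal, which is the abstraction the following section criticises.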
2. Shortcomings of the Iterated Learning Approach

In the ILM, the agents' meaning space loosely represents the 'mind' of a language user. In many respects, however, this analogy breaks down, as each agent is created with a perfect knowledge of the surrounding object space, which is never found in reality. We need to consider the nature of the object space and the agents' ability to generalise across it. Also, a learning agent directly observes each meaning-signal pair, and this introduces an element of 'mind-reading', as the learner knows exactly what the adult teacher was thinking when it produced a signal. Obviously, this weakens the ILM's credentials as a simulation of cultural language evolution. Kirby (2002, p. 197) himself acknowledges this criticism, writing "the ready availability of signals with meanings neatly attached to them reduces the credibility of any results derived from these models", whereas Smith et al. (2003, p. 374) write: "This is obviously an oversimplification of the task facing language learners." We aim to develop a new ILM to address these criticisms. Let the iterated learning approach yield a language able to describe every object found in the object space, N, through a process of compression governed by a form of generalisation. This compression is possible by forming a compositional language, which describes common features of objects in the space. Figure 1(a) illustrates how a compositional meaning node is able to define partially a number of objects. In the original ILM, this is automatically determined by the number of values, V, in the object space; e.g., in Fig. 1(a) each compositional meaning node is able partially to define V = 4 objects. An implicit generalisation parameter γ then determines the proportion of these V values that each meaning node can generalise over: in Fig. 1(a), γ = 1. This parameter, ignored in previous work, impacts significantly on the structure of the final compositional language.
To understand the role of the environment in the emergence of language, we need to consider what happens when the generalisation parameter γ is not equal to 1. Figure 1(b) shows the compression which results from halving the, now explicit, generalisation parameter. We see that 4 meaning nodes (rather than 2, as previously) are now required to specify the same number of object nodes, i.e., poorer generalisation. In this example, γ = 0.25 would correspond to a holistic, non-compositional language (i.e., no generalisation). Having acknowledged the role of this (previously-implicit) generalisation parameter, we are now able to remove the 'mind-reading' abstraction from our
Figure 1. In an ILM, the object space is defined by the number of object values V in each of F dimensions. In this example, F = 2 and V = 4. In the original ILM in (a), the generalisation parameter γ, representing a proportion of object values, is implicitly set to 1. By varying γ, as in (b) where γ = 0.5, we can vary the level of compression that each compositional meaning node can achieve.
simulations. To do this, we will define the agent's meaning space as a self-organising map (SOM) and γ as a radius around a selected object, removing the two criticisms of IL stated above. An agent no longer has complete and perfect knowledge of the object space, and this knowledge remains private, so that each agent develops a different 'understanding' of its linguistic environment.
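The node counts in the Section 2 example can be checked with back-of-envelope arithmetic. This is purely an illustration of the γ = 1 versus γ = 0.5 comparison in Fig. 1 (F = 2 dimensions, V = 4 values); the formula below is an assumption consistent with the figure's numbers, not a general result from the paper.

```python
# Back-of-envelope: with gamma = 1 one compositional meaning node spans
# all V values of a feature dimension, so F nodes suffice to specify an
# object; halving gamma halves each node's span and doubles the count.

F, V = 2, 4
for gamma in (1.0, 0.5):
    values_covered = gamma * V          # object values one node spans
    nodes_needed = int(F / gamma)       # nodes to specify one object
    print(gamma, values_covered, nodes_needed)
```

This reproduces the figure's contrast: 2 nodes at γ = 1 versus 4 nodes at γ = 0.5 for the same coverage.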
3. Self-organising Maps and Iterated Learning

Self-organising maps (Kohonen, 1982) have previously been used to good effect to model emergent phonology (e.g., Guenther & Gjaja, 1996; Oudeyer, 2005; Worgan & Damper, 2007). In the present work, SOMs offer a way to model each agent's unique and private understanding of its environment. Our model is based on the neural network model of Smith et al. (2003, Sect. 4.2.1), but with important differences motivated by the discussion of Section 2 and described explicitly in this section. In this environment, an object can be defined as, e.g., x_k = {1, 2}, and in the meaning space as m_j = {1, 2}. Equivalently, it can be defined as the pair:
m'_j = {1, *}    m'_{j+1} = {*, 2}
where * represents a wildcard. In this example, m_j forms a holistic signal, as this individual meaning node is only capable of defining one object, whereas m'_j and m'_{j+1} together form a compositional signal, as features from the object space are defined by the two meaning nodes and can be combined to define an individual object. These feature definitions can then be used in other combinations to describe other objects. We will maintain this aspect of traditional IL by redefining generalisation as a variable radius around a perceived object. The weightings on the connections between nodes of the meaning and signal spaces determine the mapping from meaning-to-signal and from signal-
to-meaning. The object space, N, that each agent talks about is represented by a simple coordinate system, and a subset of these coordinates is drawn from the object space according to a uniform probability distribution. Each object in turn is mapped directly to the appropriate meaning node in the agent's meaning space. The signals, l_i, are generated by mapping from this meaning space to the signal space, and are represented as characters from an alphabet Σ:

l_i = {(s_1, s_2, ..., s_i, ..., s_l) : s_i ∈ Σ, 1 ≤ l ≤ l_max}    (1)
from which it is clear that we need a sufficient number of signal nodes to express any of the nodes in the meaning space. Formally, the object space is:

N = {x_1, x_2, ..., x_k, ..., x_N}

with

x_k = {(f_1, f_2, ..., f_i, ..., f_F) : 1 ≤ f_i ≤ V}

When required to produce an utterance, an agent will select an object x_k, and each node in the meaning space m_j competes to have the shortest Euclidean distance from this point. Formally, if we define the closest node as m(x_k), then:

m(x_k) = argmin_j ||x_k − m_j||,    j = 1, 2, ..., l    (2)
The winning node is then moved closer to the selected point, better defining the object space as a whole. In addition, neighbouring nodes are moved somewhat closer to the object, allowing the network as a whole to represent the experienced object space. The extent to which these nodes move is determined by a gaussian function, h_{j,k}, centred around the selected object (Haykin, 1999, p. 449):

h_{j,k} = exp(−d²_{j,k} / 2σ²)    (3)

with

σ = γ    (4)
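The winner selection and neighbourhood update of equations (2)-(4) can be sketched as follows. The learning rate, the two-node toy map and all names are illustrative assumptions; the authors' implementation is not specified at this level of detail.

```python
# Illustrative SOM step: the meaning node closest to the observed object
# wins and is pulled towards it; neighbours move more weakly under a
# gaussian neighbourhood of width sigma = gamma.
import math

def som_step(nodes, x, gamma, lr=0.1):
    winner = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], x))
    wpos = list(nodes[winner])  # winner position before any update
    for k, node in enumerate(nodes):
        d_jk = math.dist(wpos, node)                 # distance to winner
        h = math.exp(-d_jk ** 2 / (2 * gamma ** 2))  # h_{j,k}, eqs (3)-(4)
        nodes[k] = [n + lr * h * (xi - n) for n, xi in zip(node, x)]
    return winner

nodes = [[0.0, 0.0], [3.0, 3.0]]
w = som_step(nodes, [1.0, 1.0], gamma=0.5)
print(w, nodes[0])  # node 0 wins and moves a tenth of the way to (1, 1)
```

Because updates depend only on the objects an agent happens to encounter, two agents trained on different object subsets end up with different maps, which is precisely how the model keeps each agent's meaning space private.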
where d_{j,k} is the distance between the winning neuron j and the excited neuron k. To form a compositional signal, we build valid decomposition sets from the meaning space, governed by the generalisation parameter, γ. We can then define a set, K_k, containing all of those meaning nodes which fall inside the radius around x_k. Formally:

K_k = {m_j : ||x_k − m_j|| ≤ γ}
Considering all possible decompositions in turn, the agent will pick the signal with the highest combination of corresponding weight values, following an expression similar to Smith et al.'s equation on p. 380, in which w(K(x)_j) ". . . is a weighting function which gives the non-wildcard proportion of . . ." K(x)_j, thus favouring compositional meaning nodes. All meaning and signal nodes that correspond to a possible decomposition of the object are activated, with activations a_{s_i} and a_{m_j}, respectively. If two active nodes are connected, the weight on that connection is increased. If there is a connection between an active node and an inactive node, the weight is decreased. Weights between two inactive nodes remain unchanged. The learning displayed by this Hebbian network can be formalised as follows:
ΔW_{ij} = { +1  if a_{s_i} = a_{m_j} = 1
            −1  if a_{s_i} ≠ a_{m_j}
             0  otherwise }    (6)
where ΔW_{ij} is the weight change at the intersection between s_i and m_j, with s_i ∈ N_S and m_j ∈ N_M. While listening to each utterance, the weight values of the agent are adjusted, extending its knowledge of the current language. This hypothesis allows it to generalise to objects it has not encountered before, resulting in a meaningful expression. Therefore, a poverty of stimulus causes the language to generalise across an object space. Additionally, by having a limited number of nodes form the meaning space, the agent does not have an infinite memory resource to draw upon, forcing compression through limited memory as well as limited stimuli. Using this model, we will vary γ in order to assess how this affects the stability, S, of the final compositional language:

S = S_c / (S_c + S_h)    (7)
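The Hebbian rule of equation (6) can be transcribed directly; a minimal sketch with illustrative activation vectors and a 2 × 2 weight matrix:

```python
# Direct transcription of equation (6): +1 between two active nodes,
# -1 between an active and an inactive node, 0 otherwise.

def hebbian_delta(a_s, a_m):
    if a_s == 1 and a_m == 1:
        return +1
    if a_s != a_m:
        return -1
    return 0

def update_weights(W, signal_act, meaning_act):
    """Apply equation (6) to every signal-meaning connection."""
    for i, a_s in enumerate(signal_act):
        for j, a_m in enumerate(meaning_act):
            W[i][j] += hebbian_delta(a_s, a_m)

W = [[0, 0], [0, 0]]
update_weights(W, [1, 0], [1, 0])
print(W)  # [[1, -1], [-1, 0]]
```

Note that the rule punishes active/inactive mismatches in both directions, so spurious associations decay as soon as a signal is heard without its supposed meaning being activated.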
where S_c represents the proportion of compositional languages and S_h the proportion of holistic languages which emerge over cultural time. The higher the value of S, the more likely a compositional language is to emerge; see Smith et al. (2003, p. 377). In the new model, each agent's meaning space is undefined at birth (randomly initialised), and the agent needs to learn the structure of the object space as each object is encountered. Consequently, the meaning space gradually comes to represent the object space, but also remains potentially unique to each agent, as a different subset of objects is encountered.

4. Results
We first ran the new SOM iterated learning model under the same conditions as the previous implementation, see Figure 2. As we can see from the results, compositional languages emerge ( S > 0.5) under a similar set of circumstances
to Smith et al.'s (2003) previous implementation. Therefore, the requirements for a tight bottleneck and a structured meaning space remain in this implementation.

(a) 10%    (b) 90%
Figure 2. Stability of the resulting languages, calculated according to equation 7, when each agent is exposed to some percentage of the object space (Smith et al.’s “bottleneck” parameter).
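The Hebbian update of equation (6) is simple enough to state directly in code. The following sketch is our own illustration (function name and matrix layout are assumptions, not the implementation used in the paper); it applies the rule to a full signal-by-meaning weight matrix:

```python
def hebbian_update(weights, signal_active, meaning_active):
    """Apply equation (6) to a weight matrix W[i][j] linking signal node
    s_i to meaning node m_j: +1 where both nodes are active, -1 where
    exactly one is active, 0 where neither is."""
    return [
        [w + (1 if s == m == 1 else -1 if s != m else 0)
         for w, m in zip(row, meaning_active)]
        for row, s in zip(weights, signal_active)
    ]

# Two signal nodes and two meaning nodes; only s_0 and m_0 are active.
w = hebbian_update([[0, 0], [0, 0]], [1, 0], [1, 0])
```

Running the example strengthens the s_0-m_0 connection, weakens the two mixed active/inactive connections, and leaves the s_1-m_1 weight untouched.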
Next, we considered the effect of varying the generalisation parameter, γ, as shown in Figure 3. The higher the generalisation, the greater the stability, S, of the compositional language and, conversely, the lower the generalisation, the lower the stability. This highlights the importance of the previously implicit generalisation parameter on the final stability of the compositional language. Accordingly, a reasonable level of generalisation is required to enable cultural emergence.
Figure 3. Stability of the resulting languages when each agent is exposed to 10% of the object space, with different degrees of generalisation: (a) γ = 2, (b) γ = 0.5. Here γ has been reformulated as a Gaussian width, as shown in equations 3 and 4.
Figure 4 shows how structuring the object space allows each meaning node to generalise over a greater number of objects, increasing the stability S. As we can see, the potential generalisation of each meaning node is less effective when fewer objects are located in each generalisation area: the compositional meaning node can only generalise across two objects in the unstructured object space of Fig. 4(b). This gives us greater insight into Smith et al.'s (2003) comparison of structured and unstructured meaning spaces. By considering these results in terms of γ we can see how these meaning spaces indirectly affect the level of potential generalisation.
(a) Structured space    (b) Unstructured space
Figure 4. In a structured object space, each meaning node generalises over a greater number of objects.
5. Conclusions
In this paper, we have addressed some criticisms of the well-known iterated learning model of cultural language emergence, most notably the 'mind-reading' aspect of earlier ILM implementations. This was achieved by using self-organising maps to model each agent's meaning space. The result is a closer analogy to real cognitive spaces. Specifically, the meaning spaces are limited in the amount of memory resource they have available, and are not omniscient; rather, they are private and unique to each agent. The SOM does not have a high enough capacity to completely define the agents' environment, forming a further motivation to generalise. We have made explicit the generalisation parameter that was previously implicit in earlier ILMs and demonstrated its role in promoting the emergence of compositionality. As well as being unique to each individual, the learning displayed by the SOM demonstrates another property of real language learners: namely, change over time with each newly encountered object. These enhancements to the classical iterated learning framework are gained without compromising the essential tenets of the paradigm. As with the classical framework, stable, compositional languages emerge through use (i.e., inter-agent communication related to structured object spaces) over cultural time. Further, the poverty of stimulus encountered both in reality and in our simulations remains essential in the evolution of a structured language, rather than a 'problem' as in the Chomskyan tradition. Although in this work we have relaxed or removed some of the weakening assumptions of the classical ILM, much remains to be done. There are still many strong simplifications and abstractions concerning the nature of language and communication utilised in our computer simulations. One important direction
for future work is to move towards acoustic ('speech') communication: having agents produce and perceive sounds coupled to meaning, as suggested by Worgan and Damper (2007).
References
Bever, T., & Montalbetti, M. (2002). Noam's ark. Science, 298(22), 1565-1566.
Brighton, H. (2002). Compositional syntax from cultural transmission. Artificial Life, 8(1), 25-54.
Chomsky, N. (1975). Reflections on language. New York, NY: Pantheon.
Guenter, F. H., & Gjaja, M. N. (1996). The perceptual magnet effect as an emergent property of neural map formation. Journal of the Acoustical Society of America, 100(2), 1111-1121.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(22), 1569-1579.
Haykin, S. (1999). Neural networks: A comprehensive foundation (Second ed.). Upper Saddle River, NJ: Prentice Hall.
Kirby, S. (2001). Spontaneous evolution of linguistic structure: An iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2), 102-110.
Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2), 185-215.
Kirby, S., & Hurford, J. (1997). Learning, culture and evolution in the origin of linguistic constraints. In P. Husbands & I. Harvey (Eds.), Fourth European Conference on Artificial Life (pp. 493-503). Cambridge, MA: MIT Press.
Kohonen, T. (1982). Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43(1), 59-69.
Oudeyer, P.-Y. (2005). The self-organization of speech sounds. Journal of Theoretical Biology, 233(3), 435-449.
Smith, A. D. M. (2005). The inferential transmission of language. Adaptive Behaviour, 13(4), 311-324.
Smith, K., Kirby, S., & Brighton, H. (2003). Iterated learning: A framework for the emergence of language. Artificial Life, 9(4), 371-386.
Steels, L., & Wellens, P. (2006). How grammar emerges to dampen combinatorial search in parsing. In Third International Symposium on the Emergence and Evolution of Linguistic Communication (EELC 2006). Published in Symbol grounding and beyond, Springer-Verlag LNAI Vol. 4211, pp. 76-88.
Vogt, P. (2005). The emergence of compositional structures in perceptually grounded language games. Artificial Intelligence, 167(1-2), 206-242.
Worgan, S. F., & Damper, R. I. (2007). Grounding symbols in the physics of speech communication. Interaction Studies, 8(1), 7-30.
HOW DOES NICHE CONSTRUCTION IN LEARNING ENVIRONMENT TRIGGER THE REVERSE BALDWIN EFFECT?
HAJIME YAMAUCHI
School of Information, Japan Advanced Institute of Science and Technology, 1-1, Asahidai, Nomi, Ishikawa
[email protected]

Deacon (2003) has suggested that one of the key factors of language evolution is characterized not by an increase of genetic contribution, often known as the Baldwin effect, but rather by the opposite: a decrease of that contribution. This process is named the reverse Baldwin effect. In this paper, we will examine how a subprocess of the reverse Baldwin effect can be triggered by the niche-constructing aspect of language.
1. Introduction
While the Baldwin effect describes how previously learnt knowledge becomes a part of innate knowledge, according to Deacon, under some circumstances innate knowledge may be replaced by more plastic, learnt knowledge. As the process seemingly follows the opposite flow of what the Baldwin effect describes, he called this process the "reverse Baldwin effect" (Deacon, 2003). This effect is thought to have strong explanatory power, and has already been applied to explain such phenomena as the mysterious loss of the ability to synthesize vitamin C in the primate lineage (Deacon, 2003). This paper will present how the niche-constructing aspect of language evolution serves as one of the key mechanisms necessary for the reverse Baldwin effect, without assuming, as Deacon has, that externally motivated changes in environmental conditions (like climate changes) would take place.
2. Masking and Unmasking processes
Unlike the Baldwin effect, where a simple interaction between learning and evolution produces a complex evolutionary process, the reverse Baldwin effect consists of two distinct processes which take place serially. These subprocesses are called the "Masking" and the "Unmasking" effects, respectively. The masking effect is triggered by an environmental change shielding an extant selective pressure, and neutralizes genetic differences. The neutrality permits genes to drift. The unmasking effect states that after a long period of this neutralization, another environmental change takes place and this time brings back the original selective
pressure. Because of the drift, the population has to develop other ways to deal with the change. Wiles, Watson, Tonkes, and Deacon (2005) demonstrate that this increases the overall phenotypic plasticity of individuals; hence it is called the reverse Baldwin effect. Given the potential explanatory power of the reverse Baldwin effect, Deacon (2003) envisages that it could play a significant role in language evolution. However, it is apparent that, for the reverse Baldwin effect to take place, there needs to be some causal agent to induce at least the masking effect. In the case of vitamin C, it was the warm climate (and abundant fruits). Deacon considers that the potential masking agent in language evolution is its niche-constructing process. However, it is unclear how the niche-constructing process comes into play as regards the masking effect.

3. Computer simulation
In order to examine how the niche-constructing property of language induces the masking effect, we set up an agent-based computer simulation based on Yamauchi (2004). In the simulation, agents in the same generation attempt to establish communications with their learnt grammar (i.e., I-language), which constructs a normative social niche (i.e., E-language) which works as a selective environment, determining the agents' fitness. The E-language becomes the next generation's learning environment, from which learning agents receive linguistic inputs. As such, information in a given I-language is transmitted vertically through the channels of learning and genes. During learning, if a linguistic input cannot be parsed with the agent's current grammar, she changes her grammar so as to be able to parse it. The cost of such modifications is calculated based on what type of genetic information she has: if her genetic information is consistent with the input, the cost will be less than when it is inconsistent with the input.
3.1. Model Structure
1. The Agent
An agent has a chromosome containing 12 genes coding the innate linguistic knowledge. There are two possible allelic values: 0 and 1. The initial gene pool is composed of 0s and 1s at random. A grammar is coded as a ternary string, and the length of the string is 12, equal to the size of the chromosome. The three possible allelic values are 0, 1 and NULL. Wherever there is a NULL allele in the grammar, that part of the grammar is considered NOT to code any linguistic knowledge. Therefore, the more NULL alleles there are in a grammar, the smaller the size of the envelope of one's language. The agent is equipped with a cognitive capacity which enables the agent to update her grammar when her grammar cannot parse an incoming linguistic input. Also, with this cognitive capacity she can partially invent her own
knowledge of grammar. The energy resource of the capacity is limited, and its size is represented as a value which is set to 24 in this particular simulation.
2. Learning
Every agent in every generation is born with a completely empty grammar; all 12 alleles are NULL. Learning is the process of updating such NULL alleles to substantial alleles (i.e., 0s and 1s). A learning agent sequentially receives linguistic inputs from 5 adult neighbors. Adults are the agents from the previous generation. A linguistic input is thought of as an utterance of an adult, which is represented by one allele of her mature grammar. Utterances derived from NULL alleles are considered NULL utterances, and no learning (thus no grammar update) takes place. The following is the algorithm to develop the grammar:

Learning Algorithm. Whenever the learner receives a linguistic input:
1. If the input value and the allelic value of the corresponding locus of the learner's grammar are different (i.e., not parsable), carry out the following procedures:
(a) If the corresponding allele of the chromosome "matches" the input (i.e., the two values are the same), update the given allele of the current grammar, and subtract 1 point from the energy resource.
(b) If the corresponding allele of the innate linguistic knowledge is different from the input, update the given allele of the current grammar, and subtract 4 points from the energy resource.
2. Otherwise keep the current grammar.

The subtractions from the energy resource are thought of as the internal cost of learning. It is internal in that it does not directly affect an individual's fitness value. The learning procedure stops when either the energy resource reaches 0, or the number of inputs reaches 120 (the critical period). NULL utterances are counted towards this limit. Any locus of the grammar not receiving any input (or receiving only NULL utterances) remains NULL. Which adult makes an utterance and which part of her grammar is provided as an input is totally random. This means that if the adults have totally different grammars, the learner may update a given allele of her grammar frequently.
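As a minimal sketch of the learning step above (the helper name, data representation and NULL encoding are our own assumptions, not the original code):

```python
MATCH_COST, MISMATCH_COST = 1, 4  # internal costs from the algorithm above
NULL = None                       # encoding of a NULL allele/utterance

def receive_input(grammar, chromosome, energy, locus, value):
    """One learning step: adopt the adult's utterance `value` at `locus`
    if the current grammar cannot parse it, paying 1 energy point when
    the innate allele matches the input and 4 points when it does not."""
    if value is NULL or grammar[locus] == value:
        return energy                  # NULL utterance, or already parsable
    grammar[locus] = value
    cost = MATCH_COST if chromosome[locus] == value else MISMATCH_COST
    return energy - cost

grammar, chromosome = [NULL] * 12, [1] * 12
energy = receive_input(grammar, chromosome, 24, locus=0, value=1)      # cost 1
energy = receive_input(grammar, chromosome, energy, locus=1, value=0)  # cost 4
```

In a full run this step would be repeated until the energy resource reaches 0 or 120 inputs have been received.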
3. Invention
Agents are capable of inventing their own grammar. If an agent still holds NULL alleles in her grammar after learning has taken place, and if her energy resource has not yet reached 0, then with a probability of .01 she picks one NULL allele at random, flips it to either 0 or 1 at random, and subtracts 1 point from the resource. This process is carried out until either no more NULL alleles remain in the grammar, or the resource reaches 0. Once
the invention process is over, her grammar is considered to have reached a mature state, and no more grammar updates take place.
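The invention procedure can be sketched in the same style (again, the function name and NULL encoding are illustrative assumptions rather than the original implementation):

```python
import random

INVENT_PROB = 0.01  # per-trial invention probability, as specified above

def invent(grammar, energy, rng):
    """Repeatedly, with probability .01, flip a random NULL allele to a
    random 0 or 1 at a cost of 1 energy point, until no NULL alleles
    remain or the energy resource reaches 0."""
    while energy > 0 and any(a is None for a in grammar):
        if rng.random() < INVENT_PROB:
            locus = rng.choice([i for i, a in enumerate(grammar) if a is None])
            grammar[locus] = rng.randint(0, 1)
            energy -= 1
    return energy

grammar = [None, None, None]  # three loci left NULL after learning
energy = invent(grammar, 24, random.Random(42))
```

With more energy than remaining NULL alleles, the loop always terminates by filling every NULL locus, here at a total cost of 3 points.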
4. Communication
Each agent is involved in 6 communicative acts with her immediate neighbor peers. Fitness increases by 1 for each parsable utterance using a mature grammar spoken to a hearer (benefitting both the speaker and the hearer). The representation of an utterance is the same as for a learning input. As each neighbor also speaks to each agent the same number of times, a total of 12 communicative acts are involved in gauging her fitness. The maximum fitness value is 13, as those who cannot establish any communication still receive a fitness score of 1 in order to maintain the possibility of being a parent in Reproduction.
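One way to realize this fitness calculation (the function names and the exact pairing of acts are our own reading of the description, not the original code):

```python
import random

def parsable(speaker_grammar, hearer_grammar, locus):
    """An utterance is the speaker's allele at `locus`; it is parsable
    when the hearer holds the same non-NULL value there."""
    u = speaker_grammar[locus]
    return u is not None and hearer_grammar[locus] == u

def fitness(grammar, neighbour_grammars, acts_per_neighbour=6, rng=None):
    """Baseline fitness of 1, plus 1 point per parsable utterance over
    the 12 communicative acts with the two immediate neighbours
    (maximum fitness 13)."""
    rng = rng or random.Random(0)
    f = 1
    for other in neighbour_grammars:
        for _ in range(acts_per_neighbour):
            if parsable(grammar, other, rng.randrange(len(grammar))):
                f += 1
    return f

# Two neighbours sharing an identical mature grammar: every act succeeds.
best = fitness([1] * 12, [[1] * 12, [1] * 12])
```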
5. Reproduction
Rank selection is used for selecting parents according to their fitness: the top 50% of agents can be selected with equal chance. Single-point crossover is used, and the mutation rate is set to .001 per allele.

In the simulation, 200 agents are spatially organized: individuals are placed on a one-dimensional loop (thus one of the immediate neighbors of the 200th agent is the 1st agent). Incidences of communication only take place within a generation, and are local, since an individual attempts to communicate with her two immediate neighbor peers. While communication is an adult-to-adult process that results in natural selection, learning is thought of as a vertical, adult-to-child transmission which results in cultural inheritance. One adult provides linguistic inputs for 5 neighbor learners (from the learner's point of view, she receives the inputs from 5 immediate neighbor adults). Together with the model design, the spatial structure described above enables the agents to construct their own grammars, and hence their linguistic communities, locally, and pass them on to the next generation. In this model, two closely-related but different types of niche construction take place. First, the selective environment is dynamically constructed, as agents in earlier generations gradually build their own grammars through inventions, and collectively they form a linguistic community. Because the utility of a given grammar in a given linguistic community depends on the specific linguistic demography of the community, the mode of selection through communicative acts is frequency dependent: a type of network effect takes place, and such an effect is created by the agents' own activities. Second, because linguistic activities in a given generation become the next generation's learning inputs, what types of language agents can potentially learn is largely determined by their ancestors' activities.
This may not be a niche construction in a traditional sense, as learning does not directly receive selective pressure. However, we believe that this mode of construction should be called “niche construction” in its own right: It defines what class of language can be learnt, and becomes the primal cause determining the direction of the assimilatory process of the Baldwin effect. It is this type of niche construction that would mainly serve as the masking agent.
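The reproduction scheme described above can be sketched as follows (the dictionary layout and function names are illustrative assumptions, not the original code):

```python
import random

def select_parent(population, rng):
    """Rank selection: sort agents by fitness and draw a parent
    uniformly from the top 50%."""
    ranked = sorted(population, key=lambda a: a["fitness"], reverse=True)
    return rng.choice(ranked[: len(ranked) // 2])

def reproduce(mum, dad, rng, mutation_rate=0.001):
    """Single-point crossover of two 12-gene chromosomes followed by
    per-allele bit-flip mutation at rate .001."""
    point = rng.randrange(1, 12)
    child = mum["chromosome"][:point] + dad["chromosome"][point:]
    return [(1 - g) if rng.random() < mutation_rate else g for g in child]

rng = random.Random(0)
population = [{"fitness": f, "chromosome": [f % 2] * 12} for f in range(10)]
parent = select_parent(population, rng)
child = reproduce(population[0], population[1], rng)
```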
4. Results
All figures shown here are taken from one typical run of the simulation under the conditions described, and as such they well characterize the general tendency of the model. Figure 1 shows the average fitness of the population over time with a solid line, and the average number of NULL alleles in matured grammars with a dashed line. The rapid increase of fitness shows that the whole population quickly evolves to almost the optimal state as the agents develop their linguistic knowledge (i.e., reduction of NULL alleles). In order to increase their fitness, agents not only have to increase the size of their linguistic envelope, but also have to develop grammars coherent with those of their neighbor peers so as to successfully establish communications with them. As a result of this, the agents construct a highly coherent linguistic community. However, around the 2500th generation, the stable state breaks drastically, and returns to normal afterward. Figure 2 summarizes the evolutionary transition of learning and genetic assimilation. In the figure, the solid line shows the remaining energy resource after the learning procedure has been completed (but before the invention process). This indicates the intensity of learning (the lower the line, the higher the intensity). The dashed line shows the similarity between an agent's genotype and her learnt grammar (also measured before the invention process takes place). This indicates how much of the learning environment is assimilated by the genepool. From the data, it can be said that the whole genepool seems to assimilate the learning environment rather quickly (i.e., genetic assimilation, the Baldwin effect), while the intensity of learning evolves slowly. In contrast to Figure 1, the two curves do not exhibit a radical degradation. Instead the transition of both from the highest to the lowest values is rather gradual (i.e., from the 600th generation to the 2500th generation).
However, the recovery is similar across the different data: within a matter of a hundred generations, all measures return to their highest scores. This indicates that another assimilatory process takes place which is much quicker than the first one.
5. Analysis
The overall result provides a somewhat perplexing picture of the evolution of linguistic knowledge and its genetic endowment. Although both Figure 1 and Figure 2 indicate that something significant happens around the 2500th generation, the data in the two figures exhibit quite different profiles, especially between about the 600th generation and the 2500th generation. From Figure 1, one may well assume that something happens within a quite short period. On the contrary, the graphs in Figure 2 indicate that a substantial process silently goes on. In other words, although the selective pressure has not radically changed over the generations, the learning process undergoes something significant. To get a clearer picture, in Figure 3 the graph from Figure 2 is superimposed
Figure 1. Evolution of communicative success measured by agents' fitness values, and the number of NULL alleles in their grammar. Both are averaged over the population.
on the spatio-temporal diagrams of agents' grammars. Each dot corresponds to one agent, and its color encodes one grammar type. The 200 spatially organized agents are plotted on the y-axis. Note that the color pattern of the graph rapidly becomes monotonic, indicating that the whole population converges into a monolithic linguistic community. This is because of the first assimilatory process based on the niche-constructing properties of language (Baldwinian Niche Construction, Yamauchi, 2004). Once the community has converged, almost every learner receives the same inputs from her neighbor adults: the learning environment is niche-constructed so that it becomes a "species-typical environment" (Morton, 1994). This reduces the importance of genetic endowment once it has contributed to constructing the monolithic community; even if her genotype is not fully assimilated to the dominant grammar, learning can easily compensate for the discrepancy. In other words, under this niche-constructed monolithic community, genes are "masked" from selective pressure by the learning capacity, namely the masking effect. In the same vein, a learner can compensate for some "input noise" from adults whose grammars have misconverged^a from the dominant one.

^a Note that the words "misconverge" and "noise" are used here in a relative sense: the utility of a given grammar hinges on the local demography of the community, and as such these words simply refer to a situation in which an agent possesses a grammar which is different from those of her neighbors.

Figure 2. Evolution of learning measured by the remaining energy resource, and the degree of assimilation.

We can tell this from the figure: between about the 600th generation and the 2000th generation, although both the remaining energy and the degree of assimilation decrease, almost no apparent change is observable from the diagram. The observable noise starts to appear roughly from the 2000th generation. It is closely related to how much a learner can adjust her grammar against either mal-assimilated genes or input noise. Subsequently, genetic drift is gradually introduced (this appears in the data as the gene-grammar similarity slowly yet steadily decreases). This means that some agents are potentially incapable of learning the dominant grammar. Such misconverged agents steadily increase (this can be observed from the diagram: as the generations proceed from the first assimilation, "random noise" visually increases). These go hand in hand with the increase of the learning intensity. Finally, the learning intensity hits its highest point, and no more learning can take place. This prevents some learners from reducing all NULL alleles. At this stage, the effect of genetic drift first surfaces on the average fitness. This produces a new selective pressure for another assimilation. This later process may be comparable to the unmasking effect, but we will not deal with it in detail here.
6. Conclusion
This experiment confirms that the niche-constructing aspect of language, especially in the language learning environment, indeed provides the masking effect which creates neutrality among different genotypes, and subsequently induces genetic drift. Baldwinian niche construction is responsible for both the strong uniformity of the linguistic community, and the high fidelity of genetic information to the dominant language.

Figure 3. The data from Figure 2 are superimposed on the spatio-temporal diagrams of the grammars present in the population across the generations.

References
Deacon, T. W. (2003). Multilevel selection in a complex adaptive system: The problem of language origins. In B. H. Weber & D. J. Depew (Eds.), Evolution and learning (pp. 81-106). Cambridge, MA: The MIT Press.
Morton, J. (1994). Language and its biological context. Philosophical Transactions: Biological Sciences, 346(1315), 5-11.
Wiles, J., Watson, J., Tonkes, B., & Deacon, T. (2005). Transient phenomena in learning and evolution: Genetic assimilation and genetic redistribution. Artificial Life, 11(1-2), 177-188.
Yamauchi, H. (2004). Baldwinian accounts of language evolution. Unpublished doctoral dissertation, The University of Edinburgh, Edinburgh, Scotland.
Abstracts
COEXISTING LINGUISTIC CONVENTIONS IN GENERALIZED LANGUAGE GAMES
ANDREA BARONCHELLI
Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
[email protected]

LUCA DALL'ASTA

Abdus Salam International Center for Theoretical Physics, Trieste, 34014, Italy
[email protected]

ALAIN BARRAT

LPT, CNRS (UMR 8627) and Univ Paris-Sud, Orsay, F-91405, France and Complex Networks Lagrange Laboratory, ISI Foundation, Turin, 10133, Italy
[email protected]
VITTORIO LORETO
Dipartimento di Fisica, Università di Roma "La Sapienza", Roma, 00185, Italy and Complex Networks Lagrange Laboratory, ISI Foundation, Turin, 10133, Italy
[email protected]
The Naming Game is a well known model in which a population of individuals agrees on the use of a simple convention (e.g. the name to give to an object) without resorting to any central coordination, but on the contrary exploiting only local interactions (Steels, 1996; Baronchelli, Felici, Caglioti, Loreto, & Steels, 2006). It is the simplest model in which the idea that language can be seen as a complex adaptive system (Steels, 2000) has been applied and challenged and it has therefore become prototypical. Indeed, its simplicity has allowed for an extensive application of complex systems concepts and techniques to various aspects of its dynamics, ranging from the self-organizing global behaviors to the role of topology, and has made it one of the most studied models of language emergence and evolution (Baronchelli, Felici, et al., 2006; Baronchelli, Dall’Asta, Barrat, & Loreto, 2006). However, while the Naming Game provides fundamental insights into the mechanisms leading to consensus formation, it is not able to describe more complex scenarios in which two or more conventions coexist permanently
in a population. Here we propose a generalized Naming Game model in which a simple parameter describes the attitude of the agents towards local agreement (Baronchelli, Dall'Asta, Barrat, & Loreto, 2007). The main result is a non-equilibrium phase transition taking place as the parameter is diminished below a certain critical value. Thus, the asymptotic state can be consensus (all agents agree on a unique convention), polarization (a finite number of conventions survive), or fragmentation (the final number of conventions scales as the system size). More precisely, it turns out that, by tuning the control parameter, the system can reach final states with any desired number of surviving conventions. Remarkably, the same dynamics is observed both when the population is unstructured (homogeneous mixing) and when it is embedded on homogeneous or heterogeneous complex networks, the latter being the most natural topologies to study the emerging properties of social systems (Baronchelli, Dall'Asta, et al., 2006). We investigate the general phenomenology of the model and the phase transition in detail, both analytically and with numerical simulations. We elucidate the mean-field dynamics, on the fully connected graph as well as on complex networks, using a simple continuous approach. This allows us to recover the exact critical value of the control parameter at which the transition takes place in the different cases. In summary, our generalized scheme for the Naming Game allows us to investigate, in a very simple framework, previously disregarded phenomena, such as the possible coexistence of different linguistic conventions in the same population of individuals. The complex systems approach, moreover, provides us with a deep understanding of the mechanisms determining the realization of the different asymptotic states, namely consensus, polarization or fragmentation.
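For concreteness, a minimal interaction step of the game can be sketched as follows. This is our own illustrative rendering of the baseline rules plus a single parameter beta governing the attitude towards local agreement (beta = 1 recovers the basic Naming Game; the actual update rule of the generalized model is given in Baronchelli et al., 2007):

```python
import random

def naming_game_step(agents, beta=1.0, rng=random):
    """One interaction: on success, both agents collapse their
    inventories to the winning name with probability beta; on failure,
    the hearer adds the unknown name to her inventory."""
    speaker, hearer = rng.sample(agents, 2)
    if not speaker:                  # empty inventory: invent a new name
        speaker.append(rng.random())
    name = rng.choice(speaker)
    if name in hearer:
        if rng.random() < beta:      # local agreement
            speaker[:] = [name]
            hearer[:] = [name]
    else:
        hearer.append(name)

rng = random.Random(0)
agents = [[] for _ in range(2)]
for _ in range(50):
    naming_game_step(agents, rng=rng)
```

With beta = 1 this tiny population reaches consensus on a single name; in the full model, lowering beta below the critical value allows several conventions to survive.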
References
Baronchelli, A., Dall'Asta, L., Barrat, A., & Loreto, V. (2006). Bootstrapping communication in language games: Strategy, topology and all that. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language: Proceedings of Evolang 6. World Scientific Publishing Company.
Baronchelli, A., Dall'Asta, L., Barrat, A., & Loreto, V. (2007). Nonequilibrium phase transition in negotiation dynamics. Phys. Rev. E, 76, 051102.
Baronchelli, A., Felici, M., Caglioti, E., Loreto, V., & Steels, L. (2006). Sharp transition towards shared vocabularies in multi-agent systems. Journal of Statistical Mechanics, P06014.
Steels, L. (1996). Self-organizing vocabularies. In C. G. Langton & K. Shimohara (Eds.), Artificial Life V (pp. 179-184). Nara, Japan.
Steels, L. (2000). Language as a complex adaptive system. In M. Schoenauer (Ed.), Proceedings of PPSN VI. Lecture Notes in Computer Science. Berlin: Springer-Verlag.
COMPLEX SYSTEMS APPROACH TO NATURAL CATEGORIZATION
ANDREA BARONCHELLI
Departament de Física i Enginyeria Nuclear, Universitat Politècnica de Catalunya, Barcelona, 08034, Spain
[email protected]

VITTORIO LORETO, ANDREA PUGLISI
Dipartimento di Fisica, Università di Roma "La Sapienza", Roma, 00185, Italy
[email protected], [email protected]
Computational and mathematical approaches are nowadays well recognized tools to investigate the emergence of globally accepted linguistic conventions, and complex systems science provides a solid theoretical framework to tackle this fundamental issue (Steels, 2000). Following this path, here we address the problem of how a population of individuals can develop a common repertoire of linguistic categories. The prototypical example of the kind of phenomenon we aim to study is given by color categorization. Individuals may in principle perceive colors in different ways, but they need to align their linguistic ontologies in order to understand each other. Previous models have adopted very realistic and therefore complicated microscopic rules (Steels & Belpaeme, 2005), or evolutionary perspectives (Komarova, Jameson, & Narens, 2007). We assume the point of view of cultural transmission (Hutchins & Hazlehurst, 1995), and we introduce a new multi-agent model in which both individuals and their interactions are kept as simple as possible. This allows us to perform unparalleled systematic numerical studies, and to understand in detail the mechanisms leading to the emergence of global coordination out of local interaction patterns (see Baronchelli, Dall'Asta, Barrat, & Loreto, 2006, for a discussion on this point). In our model (Puglisi, Baronchelli, & Loreto, 2007), a population of N individuals is committed to the categorization of a single analogical perceptual channel, each stimulus being a real number in the interval [0,1]. We identify categorization with a partition of the interval [0,1] into discrete sub-intervals, to which we refer as perceptual categories. Individuals have dynamical inventories of form-meaning associations linking perceptual categories to words representing their linguistic counterparts, and they evolve through elementary language games. At the
beginning all individuals have only the trivial perceptual category [0,1]. At each time step two individuals are selected and a scene of M ≥ 2 stimuli (denoted as o_i, with i ∈ [1, M]) is presented to them. The speaker must discriminate the scene and name one object. The hearer tries to guess the named object, and based on her success or failure, both individuals rearrange their form-meaning inventories. The only parameter of this model is the just noticeable difference of the stimuli, d_min, which is inversely proportional to the perceptive resolution power of the individuals. Thus, objects in the same scene must satisfy the constraint that |o_i − o_j| > d_min for every pair (i, j). The way stimuli are randomly chosen, finally, characterizes the kind of simulated environment. The main result is the emergence of a shared linguistic layer in which perceptual categories are grouped together to guarantee communicative success. Indeed, while perceptual categories are poorly aligned between individuals, the boundaries of the linguistic categories emerge as a self-organized property of the whole population and are therefore almost perfectly harmonized at a global level. Moreover, our model reproduces a typical feature of natural languages: despite a very high resolution power and large population sizes (technically, also in the limit N → ∞ and d_min → 0), the number of linguistic categories is finite and small. Finally, we find that a population of individuals reacts to a given environment by refining the linguistic partitioning of the most stimulated regions.
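The constraint on scenes is easy to make concrete. The following sketch (function and parameter names are our own illustration) draws a scene by rejection sampling:

```python
import random

def sample_scene(m=2, d_min=0.1, rng=random):
    """Draw a scene of m stimuli in [0, 1], resampling until every pair
    of stimuli is more than d_min apart (the just noticeable
    difference of the model)."""
    while True:
        scene = sorted(rng.random() for _ in range(m))
        if all(b - a > d_min for a, b in zip(scene, scene[1:])):
            return scene

scene = sample_scene(m=3, d_min=0.05, rng=random.Random(0))
```

For small d_min the rejection rate is low; as d_min grows towards 1/(m − 1), valid scenes become harder to draw, mirroring the perceptive resolution limit of the agents.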
References
Baronchelli, A., Dall'Asta, L., Barrat, A., & Loreto, V. (2006). Bootstrapping communication in language games: Strategy, topology and all that. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The Evolution of Language: Proceedings of Evolang 6. World Scientific Publishing Company.
Hutchins, E., & Hazlehurst, B. (1995). How to invent a lexicon: the development of shared symbols in interaction. In G. N. Gilbert & R. Conte (Eds.), Artificial Societies: The Computer Simulation of Social Life. UCL Press.
Komarova, N. L., Jameson, K. A., & Narens, L. (2007). Evolutionary models of color categorization based on discrimination. Journal of Mathematical Psychology, to appear.
Puglisi, A., Baronchelli, A., & Loreto, V. (2007). Cultural route to the emergence of linguistic categories. arXiv preprint physics/0703164, submitted for publication.
Steels, L. (2000). Language as a complex adaptive system. In M. Schoenauer (Ed.), Proceedings of PPSN VI. Lecture Notes in Computer Science. Berlin: Springer-Verlag.
Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case study for colour. Behavioral and Brain Sciences, 28(4), 469-489.
REGULAR MORPHOLOGY AS A CULTURAL ADAPTATION: NON-UNIFORM FREQUENCY IN AN EXPERIMENTAL ITERATED LEARNING MODEL
ARIANITA BEQA, SIMON KIRBY, JIM HURFORD
School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK
One approach to explaining the origins of structure in human language sees cultural transmission as a key mechanism driving the emergence of that structure (e.g., Deacon 1997). In this view, universal features of language such as compositionality are an adaptation by language to the pressure of being successfully passed on from generation to generation of language users. Crucially, this adaptation is cultural rather than biological in that it arises from languages changing rather than language users. The support for this has mainly come from computational and mathematical modelling as well as observations of the distribution of compositionality in real languages. In particular, in morphology there appears to be a connection between high frequency forms and non-compositionality (a particular kind of irregularity). Kirby (2001), in a computational simulation, demonstrates that this is just what one would expect of a cultural adaptation. If compositionality arises from the need for reliable transmission of forms for particular meanings then we would expect that need to be greater if those meanings were low frequency. An irregular form for a particular verb, for example, can only be acquired if that particular form is seen enough times by a learner. A regular form, on the other hand, is more reliably acquired because it is supported in part by evidence from all the other meanings that participate in the regular paradigm. Kirby, Dowman & Griffiths (2007) give further support for this result using a generalised mathematical model of cultural transmission. Despite this, there is still understandable skepticism about the realism and therefore applicability of such models. Can we be sure, for example, that the differential take-up of particular errors in linguistic transmission that drives adaptation in the models mirrors what happens in reality?
In this paper we respond to these concerns by replicating the models of cultural transmission of regular and irregular morphology using real human subjects. Using the methodology pioneered by Cornish (2006), we examine the evolution of a verbal morphology in an artificial language. Experimental subjects were asked to learn 24 verbs in a simple language. Each verb was presented with a picture signifying its meaning. These denoted either a man or a woman performing some action, allowing us to present a language whose verbs
marked gender. In the initial language we constructed, half of the verbs marked gender using a regular suffix attached to an invariant stem form (e.g. sagilir vs. sagilar), and the other half indicated gender through completely different forms for the masculine and feminine verbs (e.g. fuderi vs. vebadu). We further divided both sets of verbs into high frequency and low frequency types. In training, each low-frequency verb (whether regular or irregular) appeared 3 times, whereas the high-frequency verbs each appeared 10 times. After training, subjects were asked to recall the verb forms for all 24 actions. To implement cultural evolution, the output of each subject at test formed the language which the subsequent subject was trained on. We observed the evolution of the languages for 5 “generations” and repeated the experiment with 8 different initial randomly constructed languages (with different experimental subjects, of course). The initial languages are constructed to show no relationship between frequency and regularity - both frequent and infrequent verbs are equally likely to be irregular. However, the experiment confirms the modelling work: languages rapidly adapt so that infrequent forms become regular. We confirm this with statistical analysis of the emergent languages, and descriptive analysis of the process of language change and regularisation in the experiment. Our experiment confirms a) infrequent forms are harder to learn than frequent forms and b) regular forms ameliorate this difficulty. An adaptively structured language will ensure that infrequent meanings will participate in regular paradigms. The primary contribution of the experiment is c) a demonstration that just such an adaptive language can emerge in a very short time even when the initial state does not have these features.
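The logic of the prediction can be captured in a toy simulation (an invented sketch, not the experimental procedure: a per-exposure recall probability, here p_per_exposure = 0.2, stands in for human memory, and a forgotten irregular form is assumed to surface regularised):

```python
import random

random.seed(0)

N_VERBS, HIGH, LOW = 24, 10, 3

def make_language():
    """Half the verbs irregular, and half of each type high frequency."""
    return [{'irregular': i < 12, 'freq': HIGH if i % 2 == 0 else LOW}
            for i in range(N_VERBS)]

def learn(language, p_per_exposure=0.2):
    """One 'generation': an irregular form survives only if the learner
    remembers that specific form from its exposures; a forgotten form is
    produced regular. Regular forms are always reproduced, because the
    paradigm-wide rule does the work."""
    next_gen = []
    for verb in language:
        remembered = 1 - (1 - p_per_exposure) ** verb['freq']
        irregular = verb['irregular'] and random.random() < remembered
        next_gen.append({'irregular': irregular, 'freq': verb['freq']})
    return next_gen

lang = make_language()
for _ in range(5):                       # five "generations", as above
    lang = learn(lang)

low_irr = sum(v['irregular'] for v in lang if v['freq'] == LOW)
high_irr = sum(v['irregular'] for v in lang if v['freq'] == HIGH)
print(low_irr, high_irr)   # low-frequency irregulars erode faster
```

With these invented numbers, an irregular seen 10 times per generation survives each transmission with probability about 0.89, but one seen only 3 times survives with probability about 0.49, so low-frequency irregularity is almost entirely eliminated within five generations.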
This occurs without any apparent conscious design on the part of the participants (whose native language, incidentally, does not inflect verbs for gender) and is instead a natural consequence of the cultural evolution of the artificial languages.
References
Cornish, H. (2006). Iterated learning with human subjects: an empirical framework for the emergence and cultural transmission of language. Master's thesis, University of Edinburgh.
Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Language and the Brain. W. W. Norton.
Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2):102-110.
Kirby, S., Dowman, M., and Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245.
NEURAL DISSOCIATION BETWEEN VOCAL PRODUCTION AND AUDITORY RECOGNITION MEMORY IN BOTH SONGBIRDS AND HUMANS
JOHAN J. BOLHUIS
Behavioural Biology and Helmholtz Institute, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
1. Emancipation of the bird brain
In the search for the neural mechanisms of vocal learning and memory, mammals are usually preferred to birds as model systems, because of their closer evolutionary relatedness to humans. However, a recent overhaul of the nomenclature of the avian brain (Jarvis et al., 2005) has highlighted the homologies and analogies between the avian and mammalian brain. In the revised interpretation of the avian brain it is suggested that the pallium (including the hyperpallium, mesopallium, nidopallium and arcopallium) is homologous with the mammalian pallium, including the neocortex, but that it is premature to suggest one-to-one homologies between avian and mammalian pallial regions. Within the avian forebrain, Field L2 receives auditory connections from the thalamus, and in turn projects onto Fields L1 and L3. These two regions project to the caudal mesopallium and caudal nidopallium, respectively. Thus, the Field L complex appears to be analogous with the primary auditory cortex in the mammalian superior temporal gyrus. In addition, the projection regions of the Field L complex (the caudomedial nidopallium, NCM, and the caudomedial mesopallium, CMM) may then be analogous with the mammalian auditory association cortex.
2. The neural substrate of tutor song memory in songbirds
The process through which young songbirds learn the characteristics of the songs of an adult male of their own species has strong similarities with speech acquisition in human infants (Doupe & Kuhl, 1999). Both involve two phases: a
period of auditory memorisation followed by a period during which the individual develops its own vocalisations. The avian ‘song system’, a network of brain nuclei, is the likely neural substrate for the second phase of sensorimotor learning. In contrast, the neural representation of song memory acquired in the first phase is most probably localised outside the song system, notably in the NCM and CMM, regions within the likely avian equivalent of auditory association cortex (Bolhuis & Gahr, 2006). In zebra finches, neuronal activation (measured as expression of immediate early genes, IEGs) in the NCM correlated with the number of song elements that a male had learned from its tutor, suggesting that NCM may be (part of) the neural substrate for stored tutor song.
3. Neural dissociation between vocal production and auditory memory Bilateral neurotoxic lesions to the NCM of adult male zebra finches impaired tutor song recognition but did not affect the males’ song production or their ability to discriminate calls (Gobes & Bolhuis, 2007). These findings support the suggestion that the NCM contains the neural substrate for the representation of tutor song memory. In addition, we found a significant positive correlation between neuronal activation in the song system nucleus HVC and the number of song elements copied from the tutor, in zebra finch males that were exposed to their own song, but not in males that were exposed to the tutor song or to a novel song. Taken together these results show that tutor song memory and a motor program for the bird’s own song have separate neural representations in the songbird brain. Thus, in both humans and songbirds the cognitive systems of vocal production and auditory recognition memory are subserved by distinct brain regions.
References
Bolhuis, J. J., & Gahr, M. (2006). Neural mechanisms of birdsong memory. Nature Reviews Neuroscience, 7, 347-357.
Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567-631.
Gobes, S. M. H., & Bolhuis, J. J. (2007). Bird song memory: A neural dissociation between song recognition and production. Current Biology, 17, 789-793.
Jarvis, E., et al. (2005). Avian brains and a new understanding of vertebrate brain evolution. Nature Reviews Neuroscience, 6, 151-159.
DISCOURSE WITHOUT SYMBOLS: ORANGUTANS COMMUNICATE STRATEGICALLY IN RESPONSE TO RECIPIENT UNDERSTANDING
ERICA A. CARTMILL AND RICHARD W. BYRNE
School of Psychology, University of St Andrews, St Andrews, KY16 9JP, Scotland
When people are not fully understood, they persist with attempts to communicate, elaborating their speech in order to better convey their meaning. This combination of persistence and elaboration of signals is considered to be an important criterion for determining intentionality in human infants (Bates et al., 1979; Golinkoff, 1986; Lock, 2001; Shwe & Markman, 1997), and plays an essential role in human language, allowing us to clarify misunderstandings and disambiguate meaning. Chimpanzees have been shown to use persistence and elaboration in requesting food items from an experimenter (Leavens et al., 2005), and so these abilities likely predate human symbolic communication. Persisting in one’s attempts to reach a goal and discarding signals if they have failed to achieve the desired response could be mediated by relatively simple mechanisms and do not require an understanding of the recipient as an autonomous player in a communicative event. However, responding to how well one’s message has been understood is a more complex ability, requiring at least a functional use of the recipient’s mental state. We investigated whether captive orangutans (Pongo pygmaeus and Pongo abelii) would use persistence and elaboration when signaling to a human experimenter, and whether they could adjust their communicative strategies in response to how well the experimenter appeared to understand their signals. Captive orangutans were presented with situations in which out-of-reach food items required human help to access but the experimenter sometimes “misunderstood” the orangutan’s requests. Using a partially modified design from Leavens et al. (2005), we offered subjects both a highly desirable and a relatively undesirable food, allowing them the opportunity to request one or the other food by gesturing.
The experimenter was initially unresponsive, and then gave the orangutan the entire desirable food (full understanding), half the desirable food (partial understanding), or the entire undesirable food
(misunderstanding). We then compared the orangutans’ gestures before and after the receipt of food. The orangutans altered their communicative strategies according to how well they had apparently been understood (Cartmill & Byrne, 2007). When the recipient simulated partial understanding, orangutans narrowed down their range of signals, focusing on gestures already used and repeating them frequently. In contrast, when the recipient simulated misunderstanding, orangutans elaborated their range of gestures, avoiding repetition of failed signals. It is therefore possible, from communicative signals alone, to determine how well an orangutan’s intended goal has been met. They transmit not only information about their desires but also about the success of the communicative exchange. A human observer can tell how well the orangutan’s communicative goal has been met by considering the types and patterns of gestures the orangutan uses following delivery of a food item. If orangutan recipients are able to use this information as well, then it might function within their species as a method of achieving understanding more quickly. In the absence of a shared lexicon, one way of arriving at a shared meaning is to transmit not only the content of the intended message but also an indication of how well you have been understood. If the recipient can use this information, then the signaler and recipient will be able to arrive at a common understanding much faster. It is possible that this strategy played a central role in the earliest stages of “language.” If early humans had few referential gestures or vocalizations that were shared by the entire group, the strategy employed by the orangutans could have functioned as a way to communicate about novel or unlabelled events.
It is possible that such strategies could also have resulted in the creation or adoption of new labels, thus helping to expand an initially bounded communication system into a more flexible one, bringing it one step closer to full-blown language.
References
Bates, E., Benigni, L., Bretherton, I., Camaioni, L., & Volterra, V. (1979). The Emergence of Symbols. New York: Academic Press.
Cartmill, E. A., & Byrne, R. W. (2007). Orangutans modify their gestural signaling according to their audience's comprehension. Current Biology, 17, 1345-1348.
Golinkoff, R. M. (1986). "I beg your pardon?": The pre-verbal negotiation of failed messages. Journal of Child Language, 20, 199-208.
Leavens, D. A., Russell, J. L., & Hopkins, W. D. (2005). Intentionality as measured in the persistence and elaboration of communication by chimpanzees (Pan troglodytes). Child Development, 76, 291-376.
Lock, A. (2001). Preverbal communication. In J. G. Bremner & A. Fogel (Eds.), Blackwell Handbook of Infant Development (pp. 379-403). Oxford: Oxford University Press.
Shwe, H., & Markman, E. (1997). Young children's appreciation of the mental impact of their communicative signals. Developmental Psychology, 33, 630-636.
TAKING WITTGENSTEIN SERIOUSLY. INDICATORS OF THE EVOLUTION OF LANGUAGE
CAMILO JOSE CELA-CONDE
Department of Philosophy and Social Work, University of the Balearic Islands, Crta. Valldemossa, km 7.5, Palma de Mallorca, 07122, Spain
MARCOS NADAL, ENRIC MUNAR, ANTONI GOMILA
Department of Psychology, University of the Balearic Islands, Crta. Valldemossa, km 7.5, Palma de Mallorca, 07122, Spain
VICTOR M. EGUILUZ
IFISC (Institute for Cross-Disciplinary Physics and Complex Systems), University of the Balearic Islands and Consejo Superior de Investigaciones Científicas, Crta. Valldemossa, km 7.5, Palma de Mallorca, 07122, Spain
“Wovon man nicht sprechen kann, darüber muß man schweigen” (“Whereof one cannot speak, thereof one must be silent”). Proposition 7, Ludwig Wittgenstein, Logisch-Philosophische Abhandlung, Wilhelm Ostwald (ed.), Annalen der Naturphilosophie, 14 (1921). Should we follow Wittgenstein and be quiet regarding the evolution of language? Or would it be too pretentious, even pedantic, to conclude that the long discussions about the relation between animal and human communication, and the conclusions offered by those comparative studies of our speech, do not actually throw light on the evolution of language? Our contribution to this symposium will be limited to taking Wittgenstein seriously. In this respect, we will try to clarify what researchers are trying to find out when studying the evolution of language, what is actually known about this process, and what conclusions are justified by such evidence. The index of our examination will be as follows:
1. What are we talking about? Definition of the concepts of “evolution” and “language”; language as a functional apomorphy fixed by natural selection after the divergence of the Homo and Pan lineages
2. The study of functional apomorphies: available tools in the case of language phylogenesis (LP)
3. Fossil evidence of LP
4. Archaeological evidence of LP
5. Genetic findings that are informative of LP
6. Mathematical models of human language
AN EXPERIMENT EXPLORING LANGUAGE EMERGENCE: HOW TO SEE THE INVISIBLE HAND AND WHY WE SHOULD
HANNAH CORNISH
Language Evolution and Computation Research Unit, University of Edinburgh, UK
[email protected]
The complex adaptive systems view of language sees linguistic structure arising via the interaction of three dynamical systems operating over different timescales: biological evolution over the life-time of the species, cultural evolution over the life-time of the language, and individual learning over the life-time of the speaker (Kirby & Hurford, 2002). The outcome is the cultural adaptation of language to the different constraints imposed upon it by transmission (Kirby, Smith, & Cornish, 2007). These constraints can take a variety of forms, but the effect is largely similar: language adapts to become more easily learnable and transmittable by our brains rather than the other way around. Previous work exploring this idea has made extensive use of computational simulation (e.g. Kirby and Hurford (2002)). Models have shown it is possible for language to evolve culturally in populations of artificial agents as predicted, and furthermore, that the resultant systems exhibit some key universal features of human language. This lends strong support to the idea that the mechanism of cultural transmission plays a very powerful role in the evolution of language. In spite of this, however, little is known about how such processes work in actual human populations. A simple question is therefore this: can the kinds of cultural adaptations seen in these models be observed in human populations in the laboratory? The development of experimental studies to explore aspects of language evolution is a fairly recent phenomenon, with work such as Fay, Garrod, MacLeod, Lee, and Oberlander (2004), Galantucci (2007), and Selten and Warglien (2007) being examples. In spite of their many differences, one thing that all three of these approaches have in common is the fact that they rely on their subjects consciously negotiating a system of communication. Although the resultant systems show signs of cultural adaptation, they are clearly constructed devices.
To illustrate, Selten and Warglien (2007) explicitly instruct participants to create a communication system with a partner, and tell them that the different symbols at their disposal in creating such a system have explicit costs which they should minimize. The languages that emerge are therefore the product of careful design on the part of the participants involved. Is this a good model for language?
Keller (1994) would argue not. As he sees it, much of what constitutes human language results from an ‘invisible hand’ process - whilst language change does have its origins in the actions of speakers, no single individual ‘decides’ to modify the language in order to effect an improvement. At the same time, this need not imply that all change is simply random drift. It is a defining characteristic of an invisible hand process that the end result is adaptive: we see the appearance of design without a designer. Bearing this in mind, this paper asks a second question. Previous experimental work already mentioned shows cultural adaptation of language can come about through intentional acts, but can it also come about through the unintentional actions of individuals? In order to address this, an alternative experimental framework is presented (Cornish, 2006) which confirms the central findings to have emerged from the computational literature. Participants are trained on a subset of an (initially unstructured) ‘alien’ language and then tested. A sample of the output of generation n is then given as training input to generation n + 1, and the process iterates. Even when subjects are only exposed to half the language during training we still see gradual cumulative cultural adaptation leading to the emergence of an intergenerationally stable system. By simply changing the constraints on transmission slightly, we see different types of structure emerge, such as compositionality. Significantly, this is achieved without intentional design on the part of the participants.
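Why a transmission bottleneck favours compositional structure can be seen in a minimal sketch (a hypothetical illustration, not the experimental design: a nine-meaning toy language over two attributes, with an invented learner that repeats what it saw and recombines part-signals for unseen meanings):

```python
import itertools
import random

SHAPES, COLORS = ['a', 'b', 'c'], ['x', 'y', 'z']
MEANINGS = list(itertools.product(SHAPES, COLORS))

def compositional():
    """Signal = shape letter + colour letter."""
    return {m: m[0] + m[1] for m in MEANINGS}

def holistic():
    """An unrelated random signal for every meaning."""
    return {m: ''.join(random.choice('pqrstuvw') for _ in range(2))
            for m in MEANINGS}

def recall_after_bottleneck(language, n_seen=4):
    """Train a learner on roughly half of the meaning-signal pairs,
    then test on all meanings. The learner repeats what it saw and,
    for unseen meanings, recombines any part-signals it can extract."""
    seen = dict(random.sample(list(language.items()), n_seen))
    first = {m[0]: s[0] for m, s in seen.items()}    # shape -> 1st letter
    second = {m[1]: s[1] for m, s in seen.items()}   # colour -> 2nd letter
    correct = 0
    for m, target in language.items():
        if m in seen:
            guess = seen[m]
        else:
            guess = first.get(m[0], '?') + second.get(m[1], '?')
        correct += (guess == target)
    return correct / len(language)

random.seed(4)
trials = 50
comp = sum(recall_after_bottleneck(compositional())
           for _ in range(trials)) / trials
hol = sum(recall_after_bottleneck(holistic())
          for _ in range(trials)) / trials
print(comp, hol)   # compositional signals survive the bottleneck far better
```

A holistic language can only be reproduced for the items the learner actually saw, whereas a compositional one lets the learner reconstruct unseen items from parts, so only the latter can pass through the bottleneck largely intact.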
References
Cornish, H. (2006). Iterated learning with human subjects: an empirical framework for the emergence and cultural transmission of language. Unpublished MSc thesis, University of Edinburgh.
Fay, N., Garrod, S., MacLeod, T., Lee, J., & Oberlander, J. (2004). Design, adaptation and convention: the emergence of higher order graphical representations. In Proceedings of the 26th Annual Conference of the Cognitive Science Society (pp. 411-416).
Galantucci, B. (2007). An experimental study of the emergence of human communication systems. Cognitive Science, 29, 737-767.
Keller, R. (1994). On Language Change: The Invisible Hand in Language. London: Routledge.
Kirby, S., & Hurford, J. (2002). The emergence of linguistic structure: An overview of the iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language (pp. 121-148). London: Springer Verlag.
Kirby, S., Smith, K., & Cornish, H. (2007). Language, learning and cultural evolution: how linguistic transmission leads to cumulative adaptation. (Forthcoming)
Selten, R., & Warglien, M. (2007). The emergence of simple languages in an experimental coordination game. PNAS, 104(18), 7361-7366.
THE SYNTAX OF COORDINATION AND THE EVOLUTION OF SYNTAX
WAYNE COWART, DANA MCDANIEL
Linguistics Department, University of Southern Maine, Portland, Maine 04104, USA
Our purpose is to articulate and explore a possible connection between the syntactic theory of coordination and the theory of language evolution. The asymmetric functor-argument relation central to Merge (Chomsky, 1995) has come to be widely regarded as the foundational relationship in syntactic theory. Moreover, the recursive system based on Merge has been proposed as the sole uniquely human component of the human linguistic system, what Hauser, Chomsky, and Fitch (2002) term FLN - Faculty of Language - Narrow Sense. With these developments in view, the apparent symmetry of coordinate structures seems increasingly anomalous. Here we suggest that progress may be possible by reexamining what we term the Homogeneity Thesis - the widely accepted presumption that coordinate structures arise within the same general framework of syntactic structure as organizes prototypical subordinating structures. We review evidence suggesting that the Homogeneity Thesis is in fact false and propose that, by rejecting it, it may be possible to formulate a more plausible model of the evolution of the modern human linguistic system. Among several relevant lines of evidence, we report recent experimental evidence from English contrasting attraction-like effects (Bock, Eberhard, Cutting, Meyer, & Schriefers, 2001; Eberhard, Cutting, & Bock, 2005) with complex coordinate and subordinate NP subjects. The materials were structured as in (1).
(1) { a newspaper / some newspapers } { … } { is / are } on the desk.
We compared grammatically illicit effects on judged acceptability that could be traced to the second NP, which was always at the right edge of either a coordinate or subordinate complex NP. As expected, the results showed strong, reliable differences in pattern between coordinate and subordinate forms, F1(1,47) = 8.37, p < .01; F2(1,47) = 11.6, p < .001. However, the difference in pattern incorporated a number of statistically reliable features that are not explainable in terms of the differences in canonical grammaticality. Specifically, strong attraction-like patterns
evident in the subordinate cases are explainable in detail in terms of syntactic and morphosyntactic properties, while the coordinate structures showed no evidence that the structural position or morphological details of material in the coordinates affected judgments. In particular, the locality-based attraction-like effects seen in the subordinate cases were absent, and were replaced by a pattern suggesting that participants were aggregating the number of singular-marked nouns across the coordinate structures as a whole, without regard to linear position. It appears that our respondents took a radically different approach to assessing agreement relations in coordinates and subordinates, applying a strongly structure-based approach to the latter and a conceptual approach to the former. These and other effects in English and several other languages are discussed in relation to the Homogeneity Thesis, which we conclude does not comport well with these findings. With these results in view we explore the consequences of rejecting the Homogeneity Thesis and holding instead that some mechanism quite apart from the hierarchical syntax must be engaged to deal with coordinate structures, somehow supplementing the work of the hierarchical mechanism. We propose that if there is such a division of labor, the coordinating mechanism is the evolutionarily earlier mechanism and the one most likely to have derived from a mechanism shared with other primates. We propose that its initial role was to make possible conjoined use of members of a preexisting fixed set of holistic utterance types, perhaps similar to what’s been described in vervets. The force of these conjoined uses would be no more than to assert that each was somehow simultaneously relevant in the context of utterance. Use of this mechanism would, however, create a cognitive environment that would advantage the emergence of more word-like subpropositional units. 
The set-like logic of this mechanism would allow for aggregating these labels into lists or sets of names for individuals, categories, etc. In an enriched cognitive environment of this sort Merge could provide the means to specify relations that could only be hinted at with the set-like mechanism. The modern contributions of these two mechanisms are intricately intertwined, but perhaps nevertheless distinguishable.
References
Bock, K., Eberhard, K., Cutting, J. C., Meyer, A. S., & Schriefers, H. (2001). Some attractions of verb agreement. Cognitive Psychology, 43(2), 83-128.
Chomsky, N. (1995). The Minimalist Program. Cambridge, MA: MIT Press.
Eberhard, K. M., Cutting, J. C., & Bock, K. (2005). Making syntax of sense: Number agreement in sentence production. Psychological Review, 112(3), 531-559.
Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569-1579.
THE ARCHAEOLOGY OF LANGUAGE ORIGIN
FRANCESCO D'ERRICO
Institut de Préhistoire et de Géologie du Quaternaire, CNRS UMR 5199 PACEA, Université Bordeaux 1, avenue des Facultés, F-33405 Talence, France
Department of Anthropology, The George Washington University, 2110 G Street NW, Washington, DC 20052, USA
Archaeologists have traditionally considered the emergence of innovations such as language, the use of symbols, art and religious thought as the result of a sudden change, taking place 50-40 ky ago. This model, known as the Human Revolution scenario, has been gradually replaced in the last decade by a new paradigm, called the Out of Africa scenario. This new scenario, which tends to equate the biological origin of our species with the origin of modern cognition, can be summarized as follows: present day variation in mitochondrial DNA and Y chromosome suggests our species comes from Africa. The process that produced our species in Africa must have granted it a number of advantages - syntactical language, advanced cognition, symbolic thinking - that favored its spread throughout the world, determined its eventual evolutionary success, and led to the extinction of pre-modern human populations with little or no biological contribution and, if any, little and unbalanced cultural interaction. Underlying the Out of Africa model for the origin of modern behavior is the view that the emergence of each of these new features marked a definite and settled threshold in the history of mankind and that the accumulation of these innovations contributed, as with genetic mutations, to create human societies increasingly different from those of their non-modern contemporary counterparts. Postulating that these advantages were determined by a biological change logically leads to the somewhat paradoxical conclusion that archaeology does not inform us as to the origin of modern behavior and language. Populations will be considered smart, eloquent and symbolic according to their taxonomic status or the age of the archaeological site and not on the basis of the material culture they have left behind. We argue that to avoid this pitfall archaeologists should adopt a large-scale comparative approach. Documenting and dating the occurrence of these innovations in various regions of the world including Eurasia, the alleged realm of pre-modern populations, may reveal their presence at times and places incompatible with the Out of Africa model. It may also show a discontinuous pattern with innovations appearing and disappearing or being associated in a way that does not match the expected trend. Current evidence on the earliest burial practices as well as pigment and bead use does not support either the Human Revolution or the Out-of-Africa model for the emergence of modern behavior. Burials, pigment use and bead use clearly predate the arrival of AMH in Europe and the 50-40 kya rapid neural mutation that would, according to some authors, have qualitatively changed human cognition. On the other hand, no continuity is observed in beadworking traditions after their first occurrence in the archeological record, ca 100-75 kya. The production and use of pigments and, at the end of their evolutionary trajectory, a varied repertoire of personal ornaments by Neanderthals contradicts both models, since it demonstrates that these alleged hallmarks of modernity were perfectly accessible to other fossil species. We argue that the failure of the Revolution and the Out-of-Africa models to account for the origin of these innovations is due to the fact that they both directly link this phenomenon to the biological origin of our species. The alternative view is that the genetic and cognitive prerequisites of modern human behavior and language were in place earlier in time among both “archaic” and “modern” populations, and that we have to evoke historical contingencies triggered by climatic and demographic factors rather than a single speciation event or a mutation to explain the emergence, disappearance, and re-emergence in the archeological record of beadworking traditions, as well as other hallmarks of modernity.
This is consistent with genetic evidence recently brought to light indicating that a critical gene known to underlie speech, FOXP2, was present in the Neandertal genome, and that its appearance predates the common ancestor of modern human and Neandertal populations, which existed about 300-400 kya.
THE JOY OF SACS

BART DE BOER

Amsterdam Center for Language and Communication, Universiteit van Amsterdam, Spuistraat 210, 1012 VT, Amsterdam, the Netherlands

1. Introduction
This paper investigates an idea put forward by Tecumseh Fitch at the Cradle of Language conference in Stellenbosch, South Africa (and hinted at in Fitch, 2000): that air sacs may have played an important role in early hominid vocalizations. Many other primates have air sacs, notably chimpanzees, gorillas and orangutans. It is therefore likely that our last common ancestor also had air sacs, and the shape of a recently discovered Australopithecine hyoid bone (Alemseged et al., 2006) also points in this direction. Many functions have been proposed for air sacs, among them resonance chambers, sound radiators, CO2 buffers to prevent hyperventilation, and means to help exaggerate size. Whatever their function in other primates, the fact that humans are the only apes that do not have air sacs might be related to the fact that we have speech. Here I investigate the influence of the presence of an air sac on the set of (vowel) signals that can be produced.

2. Preliminary results
An articulatory model (based on Mermelstein, 1973) of the female vocal tract was extended with an air sac (figure 1, left). As all existing articulatory models were designed to model humans, no efforts were ever made to model air sacs, and the model is therefore still under development. The present model consists of a simple side tube, with anatomically plausible dimensions, attached above the larynx of the standard articulatory model. A more sophisticated model, based partly on models of bird calls (Fletcher et al., 2004), is under development. Both models (with and without air sac) produced 10,000 random articulations, and the first and second formants of these articulations were measured. These are presented in the right part of figure 1. A logarithmic scale
Figure 1. Left: the vocal tract model. Right: comparison of articulatory range with (grey plus signs) and without (black circles) an air sac. The discreteness of the air sac data is an artifact of the accuracy with which formants were measured (10 Hz).
is used, as this corresponds to humans' logarithmic perception of frequencies. It can be observed that the articulations made with the model with an air sac have lower formant frequencies and cover a smaller area of acoustic space than those of the model without an air sac (0.151 ± 0.006 versus 0.324 ± 0.009).

3. Discussion
The observed lowering of formant frequencies supports the theory that air sacs help to exaggerate size, an evolutionarily useful function (although whether low formants and air sacs really play a role in primate mate selection does not yet appear to be known). However, the price is a reduced ability to produce distinctive speech. Although communication is possible with a reduced repertoire of speech sounds, it is an intriguing possibility that humans lost their air sacs because of speech.

Acknowledgement
The research is part of the NWO vidi project Modelling the Evolution of Speech.

References

Alemseged, Z., Spoor, F., Kimbel, W. H., Bobe, R., Geraads, D., Reed, D., et al. (2006). A juvenile early hominin skeleton from Dikika, Ethiopia. Nature, 443(7109), 296-301.
Fitch, W. T. (2000). The evolution of speech: A comparative review. Trends in Cognitive Sciences, 4(7), 258-267.
Fletcher, N. H., Riede, T., Beckers, G. J. L., & Suthers, R. A. (2004). Vocal tract filtering and the "coo" of doves. Journal of the Acoustical Society of America, 116(6), 3750-3756.
Mermelstein, P. (1973). Articulatory model for the study of speech production. Journal of the Acoustical Society of America, 53(4), 1070-1082.
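The formant-lowering effect reported in the paper above can be illustrated with a far cruder model than the Mermelstein-based one actually used: a single cylindrical tube standing in for the vocal tract, with an optional closed side tube near the glottis standing in for the air sac, evaluated with standard lossless transmission-line acoustics. All dimensions, areas and the peak-finding procedure below are illustrative assumptions of this sketch, not de Boer's implementation.

```python
import numpy as np

RHO, C = 1.2, 350.0  # air density (kg/m^3) and speed of sound (m/s)

def chain_matrix(length, area, k):
    """Lossless transmission-line (ABCD) matrix of a cylindrical tube."""
    z0 = RHO * C / area                      # characteristic acoustic impedance
    return np.array([[np.cos(k * length), 1j * z0 * np.sin(k * length)],
                     [1j * np.sin(k * length) / z0, np.cos(k * length)]])

def glottal_flow(freq, tract_len=0.145, tract_area=3e-4,
                 sac_len=None, sac_area=1e-4):
    """Glottal volume velocity needed to drive unit flow at the open lips.
    Formants appear as minima of this quantity (maxima of its inverse)."""
    k = 2 * np.pi * freq / C
    # open lip end: pressure 0, unit volume velocity; propagate back to glottis
    p, u = chain_matrix(tract_len, tract_area, k) @ [0.0, 1.0]
    if sac_len is not None:                  # closed side tube at the glottal end
        zb = -1j * (RHO * C / sac_area) / np.tan(k * sac_len)
        u = u + p / zb                       # shunt branch draws extra flow
    return abs(u)

def first_formant(sac_len=None):
    freqs = np.arange(100.0, 2000.0, 5.0)
    gain = np.array([1.0 / glottal_flow(f, sac_len=sac_len) for f in freqs])
    for i in range(1, len(gain) - 1):        # first local maximum = F1
        if gain[i - 1] < gain[i] > gain[i + 1]:
            return freqs[i]

f1_plain = first_formant()                   # about 600 Hz for a 14.5 cm tube
f1_sac = first_formant(sac_len=0.05)         # a 5 cm side tube lowers F1
print(f1_plain, f1_sac)
```

Even this toy geometry reproduces the qualitative result of the paper: adding the side cavity at the pressure antinode near the glottis shifts the first resonance downward, consistent with the size-exaggeration account.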
HOW COMPLEX SYNTAX COULD BE

MIKE DOWMAN

General Systems Studies, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo, 153-8902, Japan
Syntax is so complex that linguists have not so far managed to determine the syntactic system of any language, but children learn syntax unproblematically. While some degree of syntactic complexity is certainly an essential part of a communication system as expressive as human language, it is hard to motivate any selective advantage obtained through some of the more irregular and idiosyncratic aspects of language. We show that if large parts of the human genome came to influence grammaticality patterns in syntactic structures through a process such as exaptation, then the syntax of human languages would be learned easily by children, but would be resistant to analysis using conventional linguistic techniques. We suggest that this scenario is well supported by currently available evidence concerning the nature of modern syntactic systems, and so conclude that it is very unlikely that it will ever be possible to write a complete generative grammar for any language. The aim of generative grammar is to produce a formal system that can distinguish the grammatical sentences of one or more languages from the ungrammatical, and perhaps also to account for how the meaning of each sentence is derived from its constituents. However, as inquiry into syntax has proceeded, a seemingly endless stream of more and more syntactic phenomena that are in need of an explanation has been identified. It seems to be possible to produce generative grammars that account for any particular body of linguistic data, but these grammars only go a small way towards accounting for the full range of known syntactic phenomena in any language. This leads us to wonder why children can do syntax when linguists cannot. The key advantage that children have is that they only need to infer the learned component of syntax, while linguists must also infer the innate component. This applies regardless of whether the learned component is large (learning) or small (parameter setting). 
Both the learned and innate components of language must contain some quantity of information, as must the data available to language learners and syntacticians. (Information is meant in the sense in which it is used within information theory.) The learned component of languages can contain no more information than is in the data available to children, or languages would be
unlearnable, but no such constraint applies to the innate component. If both the learned and innate components of language were small, then generative grammar would be easy, so it seems that at least one component must be large. Nativist linguists suggest that the learned component of language is small but the innate component large, while those linguists who assume that language is primarily learned suppose the opposite. While linguists can use sources of data that are not typically available to child learners, such as grammaticality judgments and data from multiple languages, if the writing of generative grammars is to be possible, this data must contain at least as much information as is in the innate and learned components put together. The complexity of the innate component is determined by the human genome, which contains many billions of bits of information. It is unclear how many of these affect grammaticality patterns in syntax, but any parts of the genome that find expression in the brain could potentially affect the grammaticality of at least some sentences. Therefore, even if only a small proportion (say 1%) of the human genome plays a role in the formation of the human capacity for syntax, we could expect the innate component of language to be hugely complex. This would not reduce the fitness of language learners, as they inherit this component of language from their parents, but it would make the task of the grammar writer practically impossible, as the innate component of language would contain far more information than could be obtained using any reasonable number of grammaticality judgments. (Note that binary grammaticality judgments can on average convey no more than one bit of information.) We provide a concrete illustration of our argument using a computer model of language learners and linguists.
When the learned and innate components were both 1,000 bits in size, learner agents rapidly obtained the correct learned component while linguist agents needed only a little more data to infer both the innate and learned components. When the innate component was 1,000,000 bits in size, the learner agents learned the language as rapidly as before, but the linguist agents were unable to identify the correct innate component. While the grammars produced by the linguist agents correctly characterized the available data, they failed dramatically when tested on novel sentences. This analysis suggests that it is very unlikely that we will ever be able to obtain a generative grammar for any language through conventional linguistic methodologies. A generative grammar for some restricted aspects of language is therefore not a stepping stone to a complete generative grammar, leading us to question whether the study of rare and obscure constructions is really the best way to do syntax. At the very least, this work reduces the assumption that languages can be characterized with generative grammars to the status of a hypothesis.
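The information-theoretic core of the argument above can be reproduced in miniature. The sketch below is a hypothetical reconstruction, not Dowman's model: grammaticality is a parity function of an innate bitstring and a learned bitstring, the "child" knows the innate part and infers only the learned part, while the "linguist" must infer both, and each binary judgment supplies at most one bit. All names and component sizes are illustrative assumptions.

```python
import random

random.seed(1)
BITS = 8  # size of each component (innate and learned), in bits

def parity(x):
    return bin(x).count("1") % 2

def grammatical(innate, learned, sentence):
    # toy grammar: a sentence's status depends on both components
    return parity(((innate << BITS) | learned) & sentence)

innate_true = random.getrandbits(BITS)
learned_true = random.getrandbits(BITS)

def judgments_needed(candidates, known_innate=None):
    """Feed random grammaticality judgments until one hypothesis remains."""
    n = 0
    while len(candidates) > 1:
        s = random.getrandbits(2 * BITS)            # a random "sentence"
        g = grammatical(innate_true, learned_true, s)
        if known_innate is not None:                # child: innate part given
            candidates = [c for c in candidates
                          if grammatical(known_innate, c, s) == g]
        else:                                       # linguist: infer both parts
            candidates = [(i, l) for (i, l) in candidates
                          if grammatical(i, l, s) == g]
        n += 1
    return n

child = judgments_needed(list(range(2 ** BITS)), known_innate=innate_true)
linguist = judgments_needed([(i, l) for i in range(2 ** BITS)
                             for l in range(2 ** BITS)])
print(child, linguist)  # the linguist needs roughly twice the data
```

Because the linguist's hypothesis space is the product of the two components, the number of judgments needed scales with the total bit count; blowing the innate component up to millions of bits, as in the paper's second condition, makes the linguist's task practically hopeless while leaving the child's unchanged.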
THE MULTIPLE STAGES OF PROTOLANGUAGE

MIKE DOWMAN

General Systems Studies, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo, 153-8902, Japan
An ongoing debate concerns whether the words in protolanguages expressed single atomic concepts (Bickerton, 1990; Tallerman, 2007), or whether they were holophrastic (Wray, 1998; Arbib, 2005). Here we suggest that there is no clear distinction between holophrastic and atomic meanings, as there is no clear definition of what level of conceptualization is atomic. We show that there is a continuum between holophrastic words and words denoting single concepts, depending on how narrow a range of meanings each word denotes. Using a computer model, we show that the type of words occurring in protolanguages could have changed over time, and that protolanguages could have contained a mixture of words of differing degrees of holophrasticity. We must therefore take into account these alternative possibilities when considering the nature of protolanguage. Holophrastic words convey complex meanings comprised of several constituent concepts, while words in modern languages are said to express single concepts. However, when we compare different languages we often find that words for some domain have much narrower and more specific denotations in one language than in another. For example, while in English we have the word brother, Japanese has separate words for younger brother (otouto) and older brother (ani), while German has a single word meaning brother or sister (Geschwister). This suggests that the English and Japanese words are in fact multi-concept holophrases (MALE-SIBLING and YOUNGER/OLDER-MALE-SIBLING respectively). A similar situation is seen within languages when one word expresses a more specific meaning than another. Consider for example English die, kill, murder and strangle, where each successive word conveys somewhat more information. Is strangle therefore a holophrase for 'illegally cause to die by choking', or are both DIE and STRANGLE atomic concepts with overlapping denotations? Furthermore, some of the holophrases that have
been proposed seem to convey far more concepts than others. Compare 'Give her that' (Wray, 1998, p. 56) with 'Take your spear and go around the other side of that animal and we will have a better chance together of being able to kill it' (Arbib, 2005, pp. 118-119). Here we suggest that holophrastic words and words that appear to denote atomic concepts are simply arbitrary points on a continuum of generality or specificity of denotation. Our argument was backed up with a computer model containing language agents that had the capacity to learn and use words, but which had no syntactic competence, hence restricting them to the use of asyntactic protolanguages. Gradual phylogenetic changes in the agents' communicative and conceptual abilities were simulated, and we observed the effect of these changes on the languages used by the agents. It was found that increasing the agents' communicative abilities resulted in more words with increasingly holophrastic meanings, as the greater number of words allowed for a situation in which each denoted a narrower range of meanings. In contrast, increasing the number of different meanings that the agents tried to communicate produced protolanguages in which the words had increasingly general denotations, as there were now so many meanings that each word had to express more of them. When both communicative and conceptual abilities grew in tandem, the languages became more or less holophrastic depending on the relative rate of growth of each capacity. Hence, if these abilities grew at different rates during the course of human evolution, we could expect the degree to which protolanguages were holophrastic to have both increased and decreased over time. There does not seem to be any good reason to assume that protolanguages were ever completely holophrastic, or that all their words ever expressed a single atomic concept.
Protolanguages may even have gone through stages when their words were even more general than those in modern-day languages, or when they expressed even more than a whole sentence.

References
Arbib, M. A. (2005). From monkey-like action recognition to human language: An evolutionary framework for neurolinguistics. Behavioral and Brain Sciences, 28, 105-167.
Bickerton, D. (1990). Language and Species. Chicago: University of Chicago Press.
Tallerman, M. (2007). Did our ancestors speak a holistic protolanguage? Lingua, 117, 579-604.
Wray, A. (1998). Protolanguage as a holistic system for social interaction. Language and Communication, 18, 47-67.
A HUMAN MODEL OF COLOR TERM EVOLUTION

MIKE DOWMAN

General Systems Studies, University of Tokyo, 3-8-1 Komaba, Meguro-ku, Tokyo, 153-8902, Japan

JING XU, THOMAS L. GRIFFITHS

Department of Psychology, University of California, Berkeley, 3210 Tolman Hall, Berkeley, CA 94720, USA
When Berlin and Kay (1969) identified striking typological patterns in the denotations of basic color terms, they suggested that these arose through a process of cultural evolution. We explored the role of cultural evolution in the development of color term systems in a large-scale study using 195 human participants. Mirroring computer simulations of cultural evolution by "iterated learning" (Steels and Belpaeme, 2005; Dowman, 2007), color term systems were passed along a chain of people, each of whom tried to learn the color term system used by the previous person. We sought to investigate how the systems would be transformed by this process, and to what extent individual learners would shape the categories in accordance with their own prior expectations. Our results show clearly that, as color terms evolve, their denotations are transformed by the people who learn them, so that color term systems are products both of the psychological biases of the individual learners and of the process by which language is transmitted from generation to generation. For most languages we only have information about their current state, rather than a record of how they have changed over time. Proposals about how color term systems evolve are therefore based mainly on extrapolation from the range of color term systems observed in the world today. Recently this has been complemented by computer models that have investigated how color term systems evolve when passed along a chain of computational agents (Steels and Belpaeme, 2005; Dowman, 2007). Each agent in these models was capable both of learning color words and of using the words it had learned when speaking to another agent. These models allowed the consequences of the social transmission of color vocabulary via iterated learning to be studied, but it was necessary to make assumptions about how people learn and represent color words.
By using human participants in place of computational agents, we removed the need to make such assumptions, as this change replaced artificial
learning and representation mechanisms with human ones. Using this same methodology, Kalish, Griffiths, and Lewandowsky (2007) revealed strong prior biases concerning function mappings, but the methodology has not previously been applied to color language. In our experiments, we told the participants that they would be learning the color term system of a language unrelated to English. We then showed them made-up words on a computer screen, together with a series of randomly selected examples of colors that could be named by each word. After training, we asked participants to name each of the 330 color chips in the standard World Color Survey Munsell array, using one of the words given in training. Examples were then randomly selected from these responses, and used as training data from which the next participant could try to reconstruct the color categories in the language. This process was repeated over 13 generations of learners. We conducted experiments in which there were either 2, 3, 4, 5 or 6 basic color terms, and in which the color term system taught to the initial learner divided the color space up either on the basis of hue or of lightness, or was simply completely random. The participants quickly imposed structure on the random color term systems by naming a relatively coherent range of colors with each term during the testing phase. Otherwise, the color term systems usually evolved gradually, but at some points participants would impose radically new categorizations. Systems with 2 or 3 color terms tended to alternate between dividing the color space primarily on the basis of hue or of lightness. In systems with 4, 5 or 6 words, categories emerged that were based on both hue and lightness, as is the case with color terms in naturally occurring languages.
Therefore, while the color term systems were based on the input received by language learners, unnatural systems were restructured to reflect participants' preferences for some kinds of category structure over others. As the experiment progressed, the color term systems increasingly came to reflect those seen in naturally occurring languages, suggesting that the structure of color term systems is largely the product of people's learning biases, brought to the surface through the process of cultural transmission.

References

Berlin, B., & Kay, P. (1969). Basic Color Terms. Berkeley: University of California Press.
Dowman, M. (2007). Explaining color term typology with an evolutionary model. Cognitive Science, 31(1), 99-132.
Kalish, M. L., Griffiths, T. L., & Lewandowsky, S. (2007). Iterated learning: Intergenerational knowledge transmission reveals inductive biases. Psychonomic Bulletin and Review, 14, 288-294.
Steels, L., & Belpaeme, T. (2005). Coordinating perceptually grounded categories through language: A case study for color. Behavioral and Brain Sciences, 28(4), 469-489.
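The transmission-chain design used in the color term study can be caricatured in a few lines of code. The sketch below substitutes a toy hue-by-lightness grid for the Munsell array and a nearest-exemplar rule for a human learner (both substitutions are assumptions of this sketch, not the authors' method), and shows how an initially random naming system acquires coherent categories simply by being passed along the chain.

```python
import random

random.seed(0)

HUES, LIGHTS = 10, 4            # toy grid standing in for the 330-chip array
CHIPS = [(h, l) for h in range(HUES) for l in range(LIGHTS)]
TERMS = 3                       # number of color words
SAMPLES = 8                     # labeled examples shown to each learner

def learn(examples):
    """Nearest-exemplar generalization: a crude stand-in for a human
    learner's bias toward contiguous categories."""
    def name(chip):
        def dist(e):
            (h, l), _ = e
            dh = min(abs(h - chip[0]), HUES - abs(h - chip[0]))  # hue is circular
            return dh + abs(l - chip[1])
        return min(examples, key=dist)[1]
    return {chip: name(chip) for chip in CHIPS}

def coherence(system):
    """Fraction of hue-adjacent chips sharing a term (1.0 = fully contiguous)."""
    same = sum(system[(h, l)] == system[((h + 1) % HUES, l)]
               for h in range(HUES) for l in range(LIGHTS))
    return same / (HUES * LIGHTS)

# generation 0: a completely random naming system
system = {chip: random.randrange(TERMS) for chip in CHIPS}
start = coherence(system)
for _ in range(13):             # 13 generations, as in the experiment
    examples = [(chip, system[chip]) for chip in random.sample(CHIPS, SAMPLES)]
    system = learn(examples)
end = coherence(system)
print(start, end)  # coherence rises as the chain imposes structure
```

The point of the sketch is the one the paper makes empirically: the structure that emerges is not in the initial data, but is imposed by the learners' inductive bias, amplified generation by generation.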
EVOLUTION OF SONG CULTURE IN THE ZEBRA FINCH

OLGA FEHER

Biology Department, The City College of New York, New York, NY, 10031, USA

PARTHA P. MITRA

Cold Spring Harbor Laboratory, Cold Spring Harbor, New York, 11724, USA

KAZUTOSHI SASAHARA

Laboratory for Biolinguistics, RIKEN Brain Science Institute, Japan

OFER TCHERNICHOVSKI

Biology Department, The City College of New York, New York, NY, 10031, USA
Similar to humans, juvenile songbirds learn their vocal repertoire by imitating adult individuals. When raised in social and acoustic isolation, birds can still sing, but they produce a highly abnormal song. What happens when such an abnormal song is culturally transmitted over generations? To examine this question we placed an isolate adult bird in a large sound box together with females (who do not sing) and allowed them to breed for a few generations, while recording audio and video so as to track the social interactions and singing behavior. We found that the juveniles readily imitated their father's isolate song, and yet small but systematic variations in vocal performance accumulated over generations, such that the third generation of learners already sang normal zebra finch song. We investigated this cultural evolution process. Although birdsong is fundamentally different from human language, there are some parallels between early human speech acquisition and the song development of birds (Doupe & Kuhl, 1999). For example, just as human speech develops hierarchically from phonemes to words and then sentences via babbling, birdsong develops similarly from notes that make up motifs that, in turn, make up bouts. In addition, in both humans and songbirds, innate biases and constraints interact with environmental influences to produce the final vocal output. In many songbird species, the songs exhibit "dialects" that vary geographically. Strikingly, on a large geographical scale, there is convergence of dialects within a species. This effect, called song universals, was qualitatively documented by Marler and Tamura (1962), and it is thought to result from innate perceptual biases. Our experiment is akin to creolization, a process whereby children who learn a pidgin language as a first language change it so that it becomes a complex natural language within a few generations. Our
experimental setup allowed us to observe social interactions that might drive cultural evolution from moment to moment and across generations. We were able to plot developmental trajectories and study the gradual vocal changes made to the newly emerging, complex song. The vocal output of the isolated colony was recorded continuously and the adult songs were compared to the original founder's and other learners' songs. Although the only available model in our colony was highly abnormal, the young birds readily imitated it. Most of our birds sang songs that most closely resembled their father's song, but when there were male siblings in a clutch, their songs tended to diverge (Tchernichovski & Nottebohm, 1998), resulting in greater differences from the father's and the founder's song. The differences were either deviations in acoustic structure from the tutor syllables or (more rarely) the emergence of new syllable types. When young birds added novel syllable types to their song, other juveniles incorporated these into their songs. More interestingly, the changes in acoustic structure appeared to be directional and gradual when observed over generations. For example, abnormal features of the isolate syllables (e.g. very long durations) turned gradually more normal (shorter) over generations. Within three generations, wide-band, noisy syllables and abnormally long, frequency-modulated harmonics were eliminated or changed acoustically to resemble normal song elements. With every group of young learners this trend continued. By the seventh clutch, the song was indistinguishable from normal zebra finch song. We repeated the experiment in an impoverished social setting, in which young birds were tutored one-on-one by isolate adults and were then used in turn to tutor further juveniles. We obtained similar results: the first generation made certain changes to the song, which became amplified with new generations of learners.
For example, note durations in the songs of first-generation learners were more similar to those of normal zebra finch songs than to isolate songs, and by the third generation they were in the normal range. Thus, complex social interactions were not necessary to jumpstart cultural evolution, but they may have influenced its course and speed. In this paradigm, we can observe cultural evolution in real time, and see what innate and social biases act on the evolution and preservation of a local dialect.

References

Doupe, A. J., & Kuhl, P. K. (1999). Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience, 22, 567-631.
Marler, P., & Tamura, M. (1962). Song "dialects" in three populations of white-crowned sparrows. Condor, 64, 368-377.
Tchernichovski, O., & Nottebohm, F. (1998). Social inhibition of song imitation among sibling male zebra finches. PNAS, 95(15), 8951-8956.
ITERATED LANGUAGE LEARNING IN CHILDREN

MOLLY FLAHERTY

Max Planck Child Study Centre, University of Manchester, Coupland 1, Oxford Road, Manchester, M13 9PL, UK

SIMON KIRBY

Language Evolution and Computation Research Unit, Department of Linguistics, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK
Due to the development of several new methodologies, it has recently become possible to investigate precisely the contribution of cultural evolution to the origins of linguistic structure. For example, the Iterated Learning Model (Kirby & Hurford, 2002) provides a framework for investigating the role of repeated cultural learning in the emergence of language. It was originally used in computer simulations, but has recently been successfully adapted for use with adult human participants (Cornish, 2006). As the differing roles of adults and children in language evolution and language change are the subject of much debate and research (e.g. Aitchison, 1996; Bickerton, 1984), adapting this promising framework to work with children is the next logical step. In this study, we sought to modify Cornish's (2006) procedure for use with human children. Twenty children (mean age 7;0), comprising two "families" or "diffusion chains", took part in the iterated learning study. The initial child in each family received a randomly generated artificial language as input and was required to learn the 9-word language ("Dragonese") as best as s/he could. The child was given several presentations of the words in the language and their meanings. The meaning space was made up of 9 colored blocks differing along two dimensions: shape (cone, cube, cylinder) and color (green, yellow, blue). The output produced by that child during the test phase was then provided to the next child as her/his input. A comparison group of adults (two families of ten adults each) was also run on the same paradigm.
In both the adult and child families, the languages became smaller (containing fewer words than the initial nine-word languages, due to increased homophony) and more learnable after repeated transmission, demonstrating that language was adapting through cultural evolution. Several structured languages closely resembling those in Cornish's adult families did arise among the adults in this study. However, the way in which these systems arose was quite different in the present work: the structure in languages in the present study was generated by sudden individual innovation, not slowly and steadily as it had been in Cornish's study. Strikingly, no significantly structured languages arose in the child families. Furthermore, most children and adults appeared to ignore the meanings in the language, attempting instead just to retain the list of words. Consideration of the surprising results in this study is essential for further investigation of the iterated learning model in human participants, as it sheds light on which aspects of the procedure are most important in creating a workable framework. First, the meanings used in such experiments should be maximally different and likely to be of interest to, and therefore retained by, participants. Additionally, spaced learning sessions, as are typical in artificial language learning work (e.g. Hudson Kam & Newport, 2005), may be a necessity in working with child learners in the iterated learning model. Furthermore, a slimmed-down procedure, with no extra changes in modality, would likely allow children (and adults) to focus more easily on the task at hand and would likely yield the most interesting results.

References
Aitchison, J. (1996). Small steps or large leaps? Undergeneralization and overgeneralization in creole acquisition. In H. Wekker (Ed.), Creole Languages and Language Acquisition (pp. 9-31). Berlin: Mouton de Gruyter.
Bickerton, D. (1984). The language bioprogram hypothesis. Behavioral and Brain Sciences, 7(2), 173-221.
Cornish, H. (2006). Iterated learning with human subjects: An empirical framework for the emergence and cultural transmission of language. University of Edinburgh, Edinburgh.
Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1(2), 151-195.
Kirby, S., & Hurford, J. R. (2002). The emergence of linguistic structure: An overview of the iterated learning model. In A. Cangelosi & D. Parisi (Eds.), Simulating the Evolution of Language (pp. 121-148). Springer-Verlag.
GESTURE, SPEECH, AND LANGUAGE

SUSAN GOLDIN-MEADOW

University of Chicago
In all cultures in which hearing is possible, language has become the province of speech (the oral modality) and not gesture (the manual modality). Why? This question is particularly baffling given that humans are equipotential with respect to language learning: if exposed to language in the manual modality, children will learn that signed language as quickly and effortlessly as they learn a spoken language. Thus, on the ontogenetic time scale, humans can, without retooling, acquire language in either the manual or the oral modality. Why then, on an evolutionary time scale, has the oral modality become the channel of choice for languages around the globe? One might guess that the oral modality triumphed over the manual modality simply because it is so good at encoding messages in the segmented and combinatorial form that human languages have come to assume. But this is not the case: the manual modality is just as good as the oral modality at segmented and combinatorial encoding. There is thus little to choose between sign and speech on these grounds. However, language serves another important function: it conveys mimetic information. The oral modality is not well suited to this function, but the manual modality excels at it. Indeed, the manual modality has taken over this role (in the form of the spontaneous gestures that accompany speech) in all cultures. It is possible, then, that the oral modality assumed the segmented and combinatorial code not because of its strengths but to compensate for its weaknesses. This argument rests on several assumptions. The first is that the manual modality is as adept as the oral modality at segmented and combinatorial encoding. The fact that sign languages of the deaf take on the fundamental structural properties found in spoken language supports this assumption. Even more striking, however, is
the fact that deaf children not exposed to sign language can invent a gestural language that also has the fundamental structural properties of spoken language. I begin by describing data on these homemade gestural systems. The second assumption is that mimetic encoding is an important aspect of human communication, well served by the manual modality. I present data on the gestures that accompany speech in hearing individuals, focusing on the robustness of the phenomenon (e.g., the fact that congenitally blind children gesture even though they have never seen anyone gesture) and the centrality of gesturing to human thought. I end with a brief discussion of the advantages of having a language system that contains both a mimetic and a segmented/combinatorial code, and of the role that gesture might have played in linguistic evolution.

References
Goldin-Meadow, S. (2003). Hearing gesture: How our hands help us think. Cambridge, MA: Harvard University Press.
Goldin-Meadow, S. (2003). The resilience of language: What gesture creation in deaf children can tell us about how all children learn language. New York: Psychology Press.
Goldin-Meadow, S., & McNeill, D. (1999). The role of gesture and mimetic representation in making language the province of speech. In M. C. Corballis & S. Lea (Eds.), The Descent of Mind (pp. 155-172). Oxford: Oxford University Press.
Goldin-Meadow, S., McNeill, D., & Singleton, J. (1996). Silence is liberating: Removing the handcuffs on grammatical expression in the manual modality. Psychological Review, 103, 34-55.
INTRODUCING THE UNITS AND LEVELS OF EVOLUTION DEBATE INTO EVOLUTIONARY LINGUISTICS NATHALIE GONTIER Konrad Lorenz Institute for Evolution and Cognition Research, Altenberg, Austria, [email protected]
Evolutionary linguistics is a fast-rising academic field in which scholars working within a variety of disciplines are providing relevant data for the study of the origin of language. Mirror neurons (Fadiga et al., 2000; Rizzolatti et al., 1996), the FOXP2 gene (Lai et al., 2001, 2003), the brain genes ASPM and MCPH1 (Mekel-Bobrov et al., 2005; Evans et al., 2005), pointing (Tomasello, 2003, 2004), but also the rise of Nicaraguan Sign Language (Senghas, 1995, 2003, 2004), the idea of a protolanguage (Bickerton, 1984, 1990, 2007), the rise of stone tools that indicate the origin of right-handedness (Steele & Uomini, 2006), etc. are all relevant to the field of evolutionary linguistics. The important question that needs to be raised, however, is in what sense these different research topics are relevant to the study of the evolutionary emergence of language. Protolanguage, for example, is argued to be a “necessary prerequisite” for the rise of fully syntactic language. The FOXP2 gene is argued to be a “necessary prerequisite” for the origin of comprehensible speech. Certain tool types (especially Upper Palaeolithic tools) and personal ornaments are argued to be “proxies” of language (Vanhaeren & d’Errico, 2006): they did not cause the origin of language but demonstrate the presence of symbolism, communication and language in the hominins that produced these tools and ornaments. Homesigns, child language or pidgin languages are argued to be “windows” on language: the current development of Nicaraguan Sign Language did not cause the origin of language in the past but provides a window on how the origin of language might have occurred. Botha (2006a, 2006b, 2007), especially, has carefully analysed several putative windows on language evolution. Events such as the lowering of the larynx, the human-specific mutations and subsequent selection of the FOXP2 gene, mirror neurons, etc.
might, however, not only provide a window on the rise of language, but might indicate the actual cause of the rise of certain aspects of language. As such, these elements are, in parallel with the units and levels of evolution debate in evolutionary biology, best called units of language evolution. Units of language evolution differ from
windows on language evolution because the former are the actual evolving elements of language. These units of language can be the subject of evolution at several levels. The FOXP2 gene, for example, can be positively selected (by natural selection) at the level of the genes, while pointing can be selected at the level of the cultural environment via the ratchet effect (Tomasello, 2004). In other words, rather than studying the evolution of language as a single entity and a single evolutionary event, the evolution of language is a scientific problem that can be analysed through several distinct research routes. One can examine the units of language evolution, the levels of language evolution, and the different evolutionary mechanisms according to which these units evolve (e.g. natural selection, the ratchet effect, ...). The biologically informed scholar will have noticed by now that I am using evolutionary biological and evolutionary epistemological (Gontier, 2006a, 2006b) jargon, typical of the units and levels of selection debate (e.g. Brandon & Burian, 1984), to analyse the origin of language. Evolutionary biology has progressed immensely ever since it started to analyse the evolution of life in terms of the units and levels of selection debate. Since the origin of language also marks an evolutionary event, evolutionary linguistics too would benefit enormously from adopting the jargon typical of the units and levels of evolution debate. Introducing the units and levels of evolution debate into evolutionary linguistics implies asking the following five methodological questions: 1: How many units of language evolution are there? 2: How many levels of language evolution are there? 3: Are all these different units and levels equally necessary in order for language to evolve? 4: How do these different units interact, how do these different levels interact, and how do the units and levels in turn interact with each other?
5: Are the evolutionary mechanisms that cause the emergence of these different units and levels the same? If not, what kind of evolutionary mechanisms can be distinguished that lie at their origin? The introduction of the units and levels of evolution debate into evolutionary linguistics will lead to new research avenues and will help to structure and synthesize the available data.
WHAT CAN THE STUDY OF HANDEDNESS IN NONHUMAN APES TELL US ABOUT THE EVOLUTION OF LANGUAGE? REBECCA HARRISON Department of Archaeology, Northgate House, West Street, University of Sheffield, Sheffield S1 4ET, UK
There is considerable debate over the origin and evolution of human language. Over the years, several different evolutionary pathways have been proposed. One popular theory is that speech evolved from nonhuman primate vocalisation, such as alarm calls. Another possibility is that language evolved from a system of manual gestures, or language may have arisen as a supplement to social grooming. However, none of the proposed theories has provided a convincing answer that is generally agreed upon. Since the discovery of the so-called mirror neurons, there has been renewed interest in the connection between manipulation and gesturing and spoken language. These neurons become activated when a monkey or human is grasping or manipulating an object or observing someone else performing the same motion; that is, they respond to visual stimuli. While mirror neurons are present in both left and right ventral premotor cortex in monkeys, they are only present in the left hemisphere, part of Broca’s area, in humans. The perceived connection between manipulation and gesturing and spoken language, which has been reinforced by studies of mirror neurons, has resulted in handedness often being used as a means of investigating the evolution of language. Modern humans show species-level right-handedness (i.e. a left-hemisphere dominance), and much research has been conducted on nonhuman primates to trace the evolution of right-handedness. Previous research on handedness in nonhuman apes has produced varying results (e.g. Hopkins et al. 2001, 2004; Marchant & McGrew 2005). In the present study, an extensive examination of limb preference was conducted in captive bonobos (N=22), chimpanzees (N=7), gorillas (N=21) and orang-utans (N=21). Hand use was recorded during a range of behaviours which were part of the apes’ daily routine: leading limb, scratch, carry and object manipulation, reach, feed, gesture and tool use.
The results from this study did not reveal any significant species-level handedness. This is in concordance with
the findings of other leading authorities (e.g. Marchant & McGrew 1996; McGrew & Marchant 2001), which found no evidence for species-level handedness. If we believe that there is a connection between handedness and language capabilities then, from these results, the latter must have evolved since the split between chimpanzees and hominids. The alternative is that handedness has no relation to language capabilities and that the latter evolved earlier. Acknowledgements This research was supported by grants from the Wenner-Gren Foundation for Anthropological Research (grant number Gr. 6581) and the Leakey Trust (U.K.). I am extremely grateful to the personnel at all the zoos in Germany and the U.K. which were visited throughout the course of this study, both for granting permission to study the subjects and for all of their assistance. References Hopkins W.D., Wesley M.J., Hostetter A., Fernandez-Carriba S., Pilcher D. & Poss S. (2001). The use of bouts and frequencies in the evaluation of hand preferences for a coordinated bimanual task in chimpanzees (Pan troglodytes): An empirical study comparing two different indices of laterality. Journal of Comparative Psychology 115: 294-299. Hopkins W.D., Wesley M.J., Izard M.K., Hook M. & Schapiro S.J. (2004). Chimpanzees (Pan troglodytes) are predominantly right-handed: Replication in three populations of apes. Behavioral Neuroscience 118: 659-663. Marchant L.F. & McGrew W.C. (1996). Laterality of limb function in wild chimpanzees of Gombe National Park: comprehensive study of spontaneous activities. Journal of Human Evolution 30: 427-443. Marchant L.F. & McGrew W.C. (2005). Manual laterality in ant fishing by wild chimpanzees at Mahale Mountains National Park, Tanzania. American Journal of Physical Anthropology Supplement 40: 155. McGrew W.C. & Marchant L.F. (2001). Ethological study of manual laterality in the chimpanzees of the Mahale Mountains, Tanzania. Behaviour 138: 329-358.
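Studies in this literature commonly summarise hand use with a handedness index, HI = (R − L)/(R + L), and test individual counts against chance with a binomial test. A minimal sketch of both calculations (the bout counts below are invented for illustration, not data from this study):

```python
from math import comb

def handedness_index(right, left):
    """HI ranges from -1 (exclusively left-handed) to +1 (exclusively right)."""
    return (right - left) / (right + left)

def binomial_two_tailed(right, n, p=0.5):
    """Exact two-tailed binomial test for deviation from chance hand use:
    sum the probabilities of all outcomes at least as unlikely as observed."""
    prob = lambda k: comb(n, k) * p**k * (1 - p)**(n - k)
    observed = prob(right)
    return sum(prob(k) for k in range(n + 1) if prob(k) <= observed)

# Invented example: 62 right-hand responses out of 100 recorded bouts
hi = handedness_index(62, 38)          # 0.24, a modest rightward bias
p_value = binomial_two_tailed(62, 100)
```

Species-level handedness would then require such individual biases to point the same way significantly often across subjects, which is the pattern the study above did not find.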
UNIDIRECTIONAL MEANING CHANGE WITH METAPHORIC AND METONYMIC INFERENCING
TAKASHI HASHIMOTO & MASAYA NAKATSUKA School of Knowledge Science, JAIST 1-1, Nomi, Ishikawa, Japan, 923-1292 [email protected] , [email protected]
Grammaticalization is an important factor in language evolution, as it may contribute to the emergence and evolution of grammatical forms (Heine & Kuteva, 2002; Hurford, 2003). Considering what kinds of dispositions in the cognitive mechanism can induce grammaticalization is significant for studying the origin of language. Hashimoto and Nakatsuka (2006) showed, by constructing a computational model of grammaticalization, that two designs of meaning structure, “pragmatic extension” and “cooccurrence”, were effective in realizing unidirectional meaning change, the central feature of grammaticalization. The model is based on the iterated learning model of Kirby (2002), in which a speaker having a set of production rules utters descriptions of situations composed of elemental meanings to a hearer, who tries to construct his/her own rule set. In this paper, we analyze the relationships of the two designs with metaphoric and metonymic inferencing, the important mechanisms for meaning change. The design of meaning structure named “pragmatic extension” is the following: the speaker can use forms F2 and F3, representing elemental meanings M2 and M3 respectively, in order to describe another elemental meaning M1. For example, in order to describe the meaning {go}, the forms representing {run} and {walk} can be utilized. In our simulations, this setting boosts the frequency of meaning changes in which the source is {go} and the targets are the other meanings, including but not limited to {run} and {walk}. Note that all meaning changes have virtually the same frequencies without this setting. Since the situational meaning is denoted as {[tense] verb(agent, patient)} in the model, {go}, {run} and {walk} are in predefined paradigmatic relations. The current setting of “pragmatic extension” means that the speaker recognizes the relevance among specific meanings in the paradigmatic relations, with {go} as the core of those meanings.
The speaker applies a production rule M2 → F2 to M1 extensively, based on the recognition of the relevance of M2 to M1. This process corresponds to metaphoric inferencing, in which expressions in one meaning domain are applied to another domain based on the relevance between the domains.
The design of meaning structure named “cooccurrence” is defined as follows: a combination of two elemental meanings M and M’ is more frequent than the other combinations in the situations to be described. In our simulations, setting the “cooccurrence” of {go} and {future} makes the meaning change from the former to the latter more frequent than to the other meanings. Note that there is no selectivity in the target of meaning change without this setting. The “cooccurrence” means that the hearer recognizes a relevance between specific meanings in a syntagmatic relationship, for the meanings {verb} and {tense} have a predefined syntagmatic relation in the model. It can be said that the meaning change from {go} to {future} based on the recognition of the syntagmatic relevance is induced by metonymic inferencing by the hearer. In sum, we have shown that the core of meanings having paradigmatic relevance, such as {go} for {run} and {walk}, is the source of unidirectional meaning change, and that a meaning having syntagmatic relevance to the source is the target. It is suggested that the cognitive dispositions of language users make the unidirectionality possible: concretely, the speaker makes the metaphoric inference, in which he/she recognizes a paradigmatic relevance and applies a rule extensively, and the hearer makes the metonymic inference, in which he/she shifts meanings based on the recognition of syntagmatic relevance. We also found that the two generalization learning mechanisms adopted in our model, corresponding respectively to reanalysis and analogy, are both related to metaphoric inferencing. In contrast, Hopper and Traugott (2003) insist that reanalysis is related to metonymic inferencing. While this difference between our results and theirs is interesting, the important common point is that speakers’ metaphoric and hearers’ metonymic inferencing contribute to grammaticalization.
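The asymmetry that “pragmatic extension” induces can be sketched with a toy simulation. This is an illustrative sketch only, not the authors’ actual model: the meanings, the extension rate and the form names are all invented for illustration.

```python
import random

random.seed(0)

PARADIGM = ["go", "run", "walk"]   # paradigmatically related meanings
CORE = "go"                        # assumed core of the paradigm
EXTENSION_RATE = 0.3               # chance of a "pragmatic extension" use

# Each meaning starts with its own dedicated form.
lexicon = {m: "F_" + m for m in PARADIGM}

# Count how often each meaning is described with the core's form,
# i.e. opportunities for the hearer to re-associate F_go with it.
change_opportunities = {m: 0 for m in PARADIGM}

for _ in range(10_000):
    target = random.choice(PARADIGM)
    # Speaker: sometimes extend the core's form to a related meaning.
    if target != CORE and random.random() < EXTENSION_RATE:
        change_opportunities[target] += 1
```

Only the non-core meanings ever accumulate opportunities for change, and the form involved is always the core's: meaning change is unidirectional from {go}, as described above.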
References Hashimoto, T., & Nakatsuka, M. (2006). Reconsidering Kirby’s compositionality model toward modelling grammaticalisation. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language (pp. 415-416). New Jersey: World Scientific. Heine, B., & Kuteva, T. (2002). On the evolution of grammatical forms. In A. Wray (Ed.), The transition to language (pp. 376-397). Oxford: Oxford University Press. Hopper, P. J., & Traugott, E. C. (2003). Grammaticalization. Cambridge: Cambridge University Press. Hurford, J. R. (2003). The language mosaic and its evolution. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 38-57). Oxford: Oxford University Press. Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In T. Briscoe (Ed.), Linguistic evolution through language acquisition (pp. 173-203). Cambridge: Cambridge University Press.
RECENT ADAPTIVE EVOLUTION OF HUMAN GENES RELATED TO HEARING
JOHN HAWKS Department of Anthropology, University of Wisconsin-Madison, 1180 Observatory Drive, Madison, WI 53706, USA [email protected]
Language requires not only a detailed anatomical and neurological system of language production, but also a highly adapted system of reception. Listening entails hearing. Yet theories of language origins have made very little reference to the sense of hearing. For example, a review of 8 recent books about language origins finds none that even has “hearing” listed in the index. This apparent blind spot is defensible in terms of comparative biology: other primates also have sophisticated patterns of vocal communication that would require highly adapted hearing faculties. Moreover, the brain organs that enable language production and perception would seem to have followed a more unique evolutionary pathway in humans than the primary auditory apparatus. But compared to the complex developmental processes of the brain, the auditory pathway is more analytically tractable. Network analyses and gene expression profiles of developing cochlear tissue provide ways to analyze the genetic basis of human hearing. In short: a selected substitution in a brain development gene might have any one (or several) of hundreds of phenotypic targets, while a selected substitution in a gene underlying auditory development probably (although not certainly) corresponds to a change in hearing. Still, it remains to demonstrate that human language has unique auditory requirements compared to the vocal communication systems of other primates. Four observations make this hypothesis plausible: (1) Humans live much longer than other primates, exacting persistent requirements from the auditory system over 70 or more years. (2) Human children must begin to distinguish phonemes from a very early age in order to allow further development of language processing. (3) Humans engage in age-dependent, sex-dependent, and social-group-dependent speech patterns that are distinguished by fine auditory cues.
(4) Speech over long distances, in large groups (crowds), or at low amplitudes (whispering) makes significant demands on the auditory system, with a greater range than in other primates.
This study tests the hypothesis of significant selection on the human auditory system, by genomic comparisons of humans and other primates and genome-wide selection scans in living people. Consistent with earlier work (Clark et al., 2003), a set of hearing-related human genes shows clear signs of recurrent selected substitutions in humans compared to chimpanzees and macaques. These recurrent substitutions may have occurred at any time during human evolutionary history, but they were repeated, with several selected variants for each gene. A smaller set of genes shows signs of significant population differentiation in living humans, due to recent strong selection (Williamson et al., 2007). In these cases, a selected allele is at or near fixation in one human HapMap sample and rare in other samples, showing very strong selection within the last 50,000 years. Most interesting, a relatively large set (~10-15) of hearing-related genes have variants currently at low frequency under recent strong selection in one or more human populations (Wang et al., 2006; Voight et al., 2006; Hawks et al., in press). These genes have been undergoing selection in historic times, with maximum increases in frequency during the last 2000-3000 years. These selection scans may yield a chronology of language evolution that has been challenging to obtain from fossil or archaeological sources. A reasonable hypothesis is that human communication systems emerged gradually during the Pleistocene, but that the full attainment of language was evolutionarily recent. In strong contrast to the view that evolution stopped at the origin of modern humans, it appears that hearing-related adaptations have continued to evolve in the context of recent population growth.
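The population-differentiation signal described above is conventionally quantified with Wright's FST. A minimal sketch for a single biallelic locus follows; the allele frequencies used are invented for illustration and are not from the studies cited.

```python
def fst_biallelic(p1, p2):
    """Wright's FST for one biallelic locus between two equally sized
    populations, from allele frequencies p1 and p2, computed as
    (H_T - H_S) / H_T under Hardy-Weinberg expectations."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                 # pooled expected heterozygosity
    h_s = (2*p1*(1 - p1) + 2*p2*(1 - p2)) / 2     # mean within-population het.
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t

# Invented frequencies: an allele near fixation in one sample, rare in another,
# gives high FST; identical frequencies give zero.
high_differentiation = fst_biallelic(0.95, 0.05)
no_differentiation = fst_biallelic(0.40, 0.40)
```

A variant at or near fixation in one sample and rare elsewhere, as in the Williamson et al. cases, thus stands out as an extreme FST outlier in a genome-wide scan.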
References Clark AG, Glanowski S, Nielsen R, Thomas PD, Kejariwal A, Todd MA, Tanenbaum DM, Civello D, Lu F, Murphy B, Ferriera S, Wang G, Zheng X, White TJ, Sninsky JJ, Adams MD, Cargill M. 2003. Inferring nonneutral evolution from human-chimp-mouse orthologous gene trios. Science 302: 1960-1963. Hawks J, Wang ET, Cochran G, Harpending HC, Moyzis RK. In press. Recent acceleration of human adaptive evolution. Proc Natl Acad Sci USA. Voight BF, Kudaravalli S, Wen X, Pritchard JK. 2006. A map of recent positive selection in the human genome. PLoS Biol 4: e72. Wang ET, Kodama G, Baldi P, Moyzis RK. 2006. Global landscape of recent inferred Darwinian selection for Homo sapiens. Proc Natl Acad Sci USA 103: 135-140. Williamson S, Hubisz MJ, Clark AG, Payseur BA, Bustamante CD, Nielsen R. 2007. Localizing recent adaptive evolution in the human genome. PLoS Genet 3: e90.
INHIBITION AND LANGUAGE: A PRE-CONDITION FOR SYMBOLIC COMMUNICATIVE BEHAVIOUR CARLOS HERNANDEZ-SACRISTAN Departament de Teoria dels Llenguatges i Ciencies de la Comunicacio, Universitat de Valencia, Avgda. Blasco Ibáñez 32, Valencia, 46010, Spain
Taking into consideration some aspects of the linguistic deficit observed in neurologically injured (aphasic) patients, the inhibition of verbal behaviour can be viewed as an ability crucially involved in defining the nature of the symbolic means of communication. Inhibition is here understood as a preconscious (procedural) ability to suspend the use of language, or to delay this use when circumstances require it. The position of a hearer cannot be understood without appeal to this ability to inhibit the motor programs for linguistic production. Moreover, inhibition is a pre-condition for the establishment of a perceptual differential between actual (external) and virtual (inner) speech that permits both the development of dialogical frames and an abstraction operating on expressive means. Inhibition defines the first and foundational strategic option in the use of a symbolic language. From a pragmatic point of view, the possibility of not using a sign entails an added potential of meaning for this sign. In this way, expressive means are transformed into a unique object of attention and experience for humans. Two combined ‘pre-representational’ definitions of symbol (as opposed to signal) summarize this position: I: A symbol is a sign whose use can be strategically inhibited (or delayed). II: A symbol is a sign whose conditions of perception are also a significant part of its potential meaning. This theoretical point of view (Hernandez-Sacristan, 2006) is formulated by taking into consideration three kinds of data obtained from:
1) Speech therapy and assessment protocols of aphasia that commonly include activities related to verbal behaviour inhibition or delay (Helm-Estabrooks & Albert, 1991; Goodglass & Kaplan, 1983). 2) Results of a test specifically designed to evaluate some linguistic skills related to inhibition in people with aphasia. Our data reveal a high degree of correlation between a general scale of severity of aphasia and the degree to which the referred skills are controlled. Linguistic skills related to Hockettian design features like reflexivity and displacement are included in this test. 3) Results of exploring verbal behaviour in people with aphasia in free conversational frames. Language use deficits in neurologically injured (aphasic) patients can be re-evaluated as a restriction on the options offered by the symbolic means to communicate a message, a null expression being one option and, more accurately, the ‘preliminary’ one. The inhibition of verbal behaviour and the development of the experiential dimensions which depend on this ability seem to be intimately related to ‘juvenilization’ or ‘neoteny’, a biological process with a very significant role in human language. We assume that the neurological structures known as ‘mirror neurons’, although most likely not a sufficient condition, are however a necessary condition for inhibition taken as an intentional process (cf. Hurford, 2004, for this discussion). Mirror neurons permit us, in this sense, to simultaneously explain both imitation and its strategic suspension, the latter certainly a more conspicuous ability than actual imitation.
References
Goodglass, H., & Kaplan, E. (1983). The assessment of aphasia and related disorders. Philadelphia: Lea and Febiger. Helm-Estabrooks, N., & Albert, M. L. (1991). Manual of aphasia therapy. Austin, TX: Pro-Ed. Hernandez-Sacristan, C. (2006). Inhibición y lenguaje: A propósito de la afasia y la experiencia del decir. Madrid: Biblioteca Nueva. Hurford, J. (2004). Language beyond our grasp: what mirror neurons can, and cannot, do for the evolution of language. In D. K. Oller & K. Griebel (Eds.), Evolution of communication systems: A comparative approach (pp. 297-313). Cambridge, MA & London: MIT Press.
PRAGMATIC PLASTICITY: A PIVOTAL DESIGN FEATURE?
STEFAN HOEFLER Language Evolution and Computation Research Unit, Linguistics and English Language, The University of Edinburgh, 40 George Square, Edinburgh EH8 9LL, Scotland, UK [email protected]
Models developed to study the origins of language, both theoretical and computational, often tacitly assume that linguistic signals fully specify the meanings they communicate. They imply that ignoring the fact that this is not the case in actual language use is a justified simplification which can be made without significant consequences. By making this simplification, however, we miss out on the extensive explanatory potential of an empirically attested property of language: its pragmatic plasticity. In this short paper, I argue that pragmatic plasticity plays a substantial role in the evolution of language and discuss some of the key contributions this “design feature of language” (Hockett, 1960) has made to the success of linguistic communication. Language exhibits pragmatic plasticity when the meaning a signal comes to communicate in a specific context differs from its conventional meaning, that is, when the signal’s conventional meaning under- and/or overspecifies the actually communicated meaning. Pragmatic plasticity may not be a feature pertaining only to human language, but I claim that, due to their highly developed ability to recognise common ground (Clark, 1996), it is employed by humans to a degree which cannot be found in animal communication. The same holds for conventionalisation, the process by means of which the meaning constructed in a specific context on the basis of a signal’s pragmatic plasticity becomes enshrined as a new linguistic convention. The following aspects and consequences of pragmatic plasticity and its conventionalisation are thus particularly significant to language evolution: 1. Creativity. In effect, pragmatic plasticity is creative language use. It constitutes the major source of linguistic innovation. Theoretically, the presence of pragmatic plasticity is sufficient for language to be able to meet new communicative needs. Resorting to invention is not necessary. 2. Adaptability.
Through pragmatic plasticity, linguistic conventions are adapted to novel contexts. This allows language to function as a communication system in the fast-changing dynamic environment of human societies. Frequently needed usages become more readily accessible, and language thus more efficient, through their conventionalisation.
3. Expressivity. Pragmatic plasticity means that novel meanings are expressed by using extant conventions in an under- and/or overspecified way. Once these novel usages become conventions themselves, they can exhibit pragmatic plasticity too, and thus make available yet another set of meanings not accessible before. This “ratchet effect” (Tomasello, 1999) allows for the cumulative exploitation of ever new meaning spaces, and thus leads to a gradual increase in the number of meanings that can be expressed.
4. Compression. Articulation constitutes a bottleneck for linguistic communication (Levinson, 1995): meanings are transmitted via relatively slow physical channels (speech or gestures). Pragmatic plasticity accommodates this constraint by facilitating so-called lossy data compression: only information which cannot be inferred from context needs to be encoded in the linguistic signal; the rest can be left underspecified. Because we reason faster than we articulate, this increases the efficiency of linguistic communication.
5. Symbolism. A signal exhibits pragmatic plasticity even if it is not (yet) conventionally associated with a meaning and merely triggers the inference of meaning from the context. The conventionalisation of such maximally underspecified usage can lead to the emergence of symbolic associations. 6. Grammaticalisation. Pragmatic plasticity and conventionalisation are the origin of the semantic change found in grammaticalisation (Traugott & Dasher, 2005), the set of processes involved in the emergence of grammar.
7. Ambiguity. As it seems to be dysfunctional, ambiguity is often considered to pose an evolutionary puzzle (Hoefler, 2006). But only if we allow for ambiguity can novel usages become conventionalised. Ambiguity is thus a crucial prerequisite for pragmatic plasticity to unfold its potential.
I conclude from these considerations that pragmatic plasticity and conventionalisation are pivotal to the emergence and evolution of language. They should therefore occupy a more central position in evolutionary linguists’ models. References
Clark, H. (1996). Using language. Cambridge: Cambridge University Press. Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88-96. Hoefler, S. (2006). Why has ambiguous syntax emerged? In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The evolution of language (pp. 123-130). World Scientific. Levinson, S. C. (1995). Three levels of meaning. In F. Palmer (Ed.), Grammar and meaning (pp. 90-115). Cambridge: Cambridge University Press. Tomasello, M. (1999). The cultural origins of human cognition. Cambridge, MA: Harvard University Press. Traugott, E. C., & Dasher, R. B. (2005). Regularity in semantic change. Cambridge: Cambridge University Press.
CONTINUITY BETWEEN NON-HUMAN PRIMATES AND MODERN HUMANS? JEAN-MARIE HOMBERT Laboratoire Dynamique du Langage, University of Lyon and CNRS, ISH, 14, Ave Berthelot, 69363 Lyon Cedex 07, France
The question of continuity vs. discontinuity between the communication systems used by non-human primates and by humans is central to the study of language evolution. In this paper, I argue that there is continuity between these systems, but that traces of continuity will not be found by comparing non-human primate vocalizations with articulated human language. The first part of the paper will present the main characteristics of vocalizations as well as the distinctive features of human language. It will be shown that vocalizations are strongly connected to emotional states and are generally not controlled by the individual, as opposed to language, in which the acoustic signals produced by humans are less controlled by emotions and do not depend solely on the immediate environment. In the second part of the paper I will show that: (a) contrary to vocalizations, communicative gestures used by primates (Call & Tomasello, 2007) are intentional and not controlled by emotional states; (b) some of the acoustic signals produced by humans (cries, laughs, some interjections) are clearly related to emotional states. The neural control of these two systems will be compared. In conclusion, continuity between non-human primate communication systems and language is obvious when compared in the following manner: non-human primate vocalizations > “non-linguistic” human vocalizations; non-human primate communicative gestures > articulated human language.
References
Burling, R. (2005). The talking ape: How language evolved. Oxford: Oxford University Press. Call, J., & Tomasello, M. (2007). The gestural communication of apes and monkeys. Lawrence Erlbaum. Fitch, W. T., Hauser, M. D., & Chomsky, N. (2005). The evolution of the language faculty: clarifications and implications. Cognition, 97(2), 179-210. Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298, 1569-1579. Jackendoff, R. (1999). Possible stages in the evolution of the language capacity. Trends in Cognitive Sciences, 3(7), 272-279. Jackendoff, R., & Pinker, S. (2005). The nature of the language faculty and its implications for evolution of language. Cognition, 97, 211-225. Pinker, S., & Jackendoff, R. (2005). The faculty of language: What's special about it? Cognition, 95, 201-236. Vauclair, J. (2003). Would humans without language be apes? In J. Valsiner & A. Toomela (Eds.), Cultural guidance in the development of the human mind: Vol. 7. Advances in child development within culturally structured environments (pp. 9-26). Greenwich, CT: Ablex Publishing Corporation.
AFTER ALL, A “LEAP” IS NECESSARY FOR THE EMERGENCE OF RECURSION IN HUMAN LANGUAGE MASAYUKI IKE-UCHI Language Evolution and Computation Research Unit, University of Edinburgh, UK and
Department of English, Tsuda College, Tokyo, JAPAN
[email protected] The goal of this paper is to reconfirm the necessity of some kind of “leap” (i.e., punctuation, a qualitative change, or appropriation) for the emergence of recursive properties in human language, both by showing the “sneak-in” problem in computational multi-agent modeling approaches and by revealing the implicit postulation of a “leap” in biological adaptationist approaches. Thus, this paper will reaffirm that continuous evolution from linear syntax to recursive syntax is not plausible. The usual definition of the notion of recursion will be assumed, including both nested and tail recursion. Researchers who have taken multi-agent modeling (constructive) approaches have claimed that recursion, i.e. hierarchical structure, spontaneously emerges from things non-recursive, like linearity. But closer scrutiny reveals that this is not correct, because the very recursive properties themselves sneak into, or are (implicitly) included in, the initial conditions imposed on the agents. For example, Kirby's (2002) agents have initial rules pairing meanings like believes(john, praises(heather, mary)) with strings, which in effect include syntactic embedding, when the simulation starts. In Batali (2002), as he himself notes, “the agents begin a simulation with the ability to use embedded phrase structure.” In other words, they have Merge from the outset. A similar argument holds for embodiment modeling (for instance, Steels & Bleys (2007)), too. In sum, it has not yet been shown by computer simulation approaches that recursion (or hierarchy) spontaneously emerges from non-recursive linear properties through interactions among the agents. In biological adaptationist approaches (Jackendoff (2002) and Parker (2006), for example), several steps have been postulated for the evolution of
current human language. Part of the syntax/LF side of Jackendoff's incremental scenario is: ... ① Concatenation of symbols → ② Use of symbol position to convey basic semantic relations → ③ Protolanguage → ④ Hierarchical phrase structure → ⑤ Grammatical categories ... It should be pointed out here that the transition from stage ③ to stage ④, in particular, is a clear qualitative "leap" from linearity to hierarchical recursion (although this is not explicitly recognized). In short, in these approaches a certain "leap" has been implicitly postulated for the introduction of recursion into human language, even though the approaches themselves are otherwise based on the assumption that every evolutionary step is gradual, continuous, and incremental, in accordance with the theory of natural selection. Notice that this is not a simple terminological issue, but concerns a crucial qualitative difference between certain evolutionary steps in human language. If we did not properly recognize it, that would be equivalent to saying that the evolution of language is no different from that of, say, the beak of a Darwin's finch, which no one accepts.

Noting that there may be principled reasons why two-dimensional, vertical, hierarchical recursion does not gradually derive from one-dimensional, horizontal linearity, and also touching on evidence from language acquisition (Roeper, 2007), I will conclude that (at least) at the present stage of inquiry into the origins and evolution of human language, some qualitative "leap" must be assumed for the emergence of recursion.
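The "sneak-in" problem can be made concrete with a toy check. This is an illustrative sketch only, not code from any of the cited simulations: if the meaning representations handed to agents at initialization are nested predicate-argument structures, the meaning space is already recursive before any learning or interaction takes place.

```python
def depth(meaning):
    """Recursion depth of a nested (predicate, *args) meaning tuple."""
    if not isinstance(meaning, tuple):
        return 0
    return 1 + max((depth(arg) for arg in meaning[1:]), default=0)

# believes(john, praises(heather, mary)) as a nested tuple --
# the kind of initial meaning the text says agents start with
m = ("believes", "john", ("praises", "heather", "mary"))

assert depth(m) == 2  # embedding is present in the initial conditions
```

Any simulation whose initial meanings have depth greater than one has, in the paper's terms, already let recursion "sneak in".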
References
Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In T. Briscoe (Ed.), Linguistic evolution through language acquisition (pp. 111-172). Cambridge: Cambridge University Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In T. Briscoe (Ed.), Linguistic evolution through language acquisition (pp. 173-203). Cambridge: Cambridge University Press.
Parker, A. (2006). Evolution as a constraint on theories of syntax: The case against minimalism. Ph.D. dissertation, University of Edinburgh.
Roeper, T. (2007). The prism of grammar: How child language illuminates humanism. Cambridge, MA: MIT Press.
Steels, L., & Bleys, J. (2007). Emergence of hierarchy in fluid construction grammar. In Proceedings of the Social Learning in Embodied Agents Workshop at the 9th European Conference on Artificial Life.
LABELS AND RECURSION: FROM ADJUNCTION SYNTAX TO PREDICATE-ARGUMENT RELATIONS

ARITZ IRURTZUN

Linguistics and Basque Studies, University of the Basque Country, Vitoria-Gasteiz, 01006, Basque Country (Spain)
I explore the emergence and ontology of syntactic labels. I propose that labels are created derivationally as a 'reparation' that circumvents the violation of a legibility condition. As a consequence, I argue that predicate-argument relations are derived from a more primitive adjunctive syntax without labels (cf. Hornstein, Nunes & Pietroski (2006), Hinzen (2006)). First, I show that the proposal of label-free syntax (cf. Collins (2002)) has serious empirical drawbacks: I briefly discuss the phenomena of XP movement, islands, incorporation, quantificational dependencies and argument structure. All these phenomena make reference to labeled XPs. But assuming labels, some questions arise: (i) Why do syntactic phrases have labels? (ii) How do labels appear derivationally? (iii) How do labels identify the set they label? Taking Merge as just symmetrical set-formation (cf. Chomsky (2005), Hinzen (2006)) entails that, in itself, the merger of (α, β) cannot give a labeled structure, but only the simpler set {α, β}. So, the only way to get a labeled structure using just Merge and the lexicon is to take Merge as a compound operation in which the first step creates a bare set and the second provides it with a label (1).
(1) a. {V, DP}
    b. {V, {V, DP}}
That would answer question (i). However, since the notion of 'labelhood' is vague (after all, V is just one of the members of the set {V, {V, DP}} in (1b)), the ontology and consequences of labelhood still have to be explained (questions (ii) and (iii)). My proposal relies on the hypothesis that the interfaces require sets with coherent categorial intensions.
Given such a restriction, labeling operations can be explained as repairing strategies (answering questions (ii)-(iii)): the label provides a set with a coherent intension (i.e., all of the members of the set contain a given categorial feature). For instance, in step 1 of (1a), the simple {V, DP} set is created, but at this step the set {V, DP} is heterogeneous: there is no grammatical category that can provide it a coherent type, and hence it is illegible (assuming a Neodavidsonian conjunctivist semantics, in (1a) we have two unrelated monadic predicates, something like {kiss(e) & Mary(y)}). I will argue that the labeling mechanism provides the step from this adjunct-like syntax of conjunction of independent predicates to the hierarchical predicate-argument syntax based on labels (cf. Hornstein, Nunes & Pietroski (2006), Hinzen (2006)): having {V, DP} in (1a), the verbal head (the syntactically active locus) is remerged with the structure to give it a coherent type (1b). Now an asymmetry emerges in the new set; crucially, both members of {V, {V, DP}} have a verbal character (both contain a [+V] categorial feature). Thus, the set {V, {V, DP}}, labeled with a verbal intension, is readable at the interfaces. We are left with a last problem, though: the primitive {V, DP} of (1a) (now a member of {V, {V, DP}} in (1b)) is still an illegible object. And obviously, recursion on the labeling strategy won't solve the problem. Here my proposal is a purely repairing strategy: the DP, which as such is interpretable (i.e., Val(y, Mary) iff Mary(y)), is now in a verbal environment at the highest phrase (a VP).
Thus, the solution for the VP-contained DP is to lift its type (à la Pietroski (2005)) to accommodate it to the intension of the highest set that contains it: this turns the DP complement of V from an individual-denoting type into an event-participant one (an argument) (2):

(2) Val(y, Mary) iff Mary(y) → Val(e, int-Mary) iff Theme(e, Mary)

Finally, I will argue that taking adjunction syntax to be more basic than predicate-argument syntax also provides a way to characterize the labeling operation as a crucial step in the evolution of the human language capacity: labeling provides a crucial trait of natural language: recursion.
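The compound Merge operation described above can be sketched in a few lines. This is a toy illustration only, not a claim about any implemented system: unordered sets are modelled with frozensets, and step 2 remerges the head with the bare set to yield the labeled structure.

```python
def merge(a, b):
    """Step 1: symmetric, label-free set formation {a, b}."""
    return frozenset({a, b})

def label(head, s):
    """Step 2: remerge the head with the set, giving {head, {head, ...}};
    now both members share the head's categorial character."""
    return frozenset({head, s})

bare = merge("V", "DP")     # {V, DP}: no coherent categorial intension
labeled = label("V", bare)  # {V, {V, DP}}: both members are verbal

assert bare == frozenset({"V", "DP"})
assert labeled == frozenset({"V", frozenset({"V", "DP"})})
```

The asymmetry the text describes falls out of the nesting: in the labeled set, the head V occurs both as a member and inside the embedded set.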
References
Collins, C. (2002). Eliminating labels. In S. D. Epstein & T. D. Seely (Eds.), Derivation and Explanation in the Minimalist Program (pp. 42-64). Oxford: Blackwell.
Hinzen, W. (2006). The successor function + Lexicon = Human Language? Ms., U. Amsterdam & U. Durham.
Hornstein, N., Nunes, J., & Pietroski, P. (2006). Adjunction. Ms., UMCP.
Pietroski, P. (2005). Events and Semantic Architecture. Oxford: OUP.
ITERATED LEARNING WITH SELECTION: CONVERGENCE TO SATURATION

MICHAEL KALISH

Institute for Cognitive Science, University of Louisiana at Lafayette, Lafayette, LA 70504-3772, USA

A formal approach to language evolution requires specification of the properties
of variation and selection. Variation is plausibly the result of replication; errors in intergenerational learning produce variability in each generation (Griffiths & Kalish, 2007). A mechanism for selection is less transparent, and this may explain a bias toward selection-free evolutionary accounts of iterated learning as intergenerational transmission. Learning has interesting properties as a source of variation, since its variability is not purely random, but rather depends on the data available for learning and the inductive biases of the learners. Exploring the role of inductive biases in iterated learning has produced clear results concerning the dynamic and asymptotic properties of the process. However, if we assume that a single set of linguistic universals dominates human languages, these results leave a puzzle, since they suggest that there should be a distribution of universals equivalent to the prior bias (that is, the learnability) of these universals (Dowman, Kirby & Griffiths, 2006). One might ask: are universals homogeneous, or is there some stability in their spatial heterogeneity? Under the assumption that learners are Bayesian (that is, that they update their knowledge according to their experience), the iterated transmission of information results in the convergence of a population of independent learners to their common inductive priors (Griffiths & Kalish, 2007). To date, however, iterated learning has only been examined in the limit case of a large population of well-mixed individuals, reproducing without constraint by fitness. The research presented here is a first empirical step in broadening this focus to spatially distributed populations of fixed size in which fitness plays a role in replication. I examined two different processes, both of which included selection based on communicative fitness and mutation based on Bayesian learning.
(1) A birth-first (Moran-like) process, where only one agent in the space, chosen with probability proportional to its relative fitness, reproduces on each cycle. The spawn then replaces a randomly chosen agent within the parent's neighborhood, possibly including the parent. (2) A deterministic (cellular-automaton-like)
process where every agent is replaced by the spawn of the fittest agent in the neighborhood. Agents were defined as Bayesian learners, equipped with just two hypotheses (A and B), which they induced through exposure to samples drawn from four possible signals (see Griffiths & Kalish, 2007, for details of the 'two language' example). Agents were placed on a torus and associated in Moore neighborhoods. I varied the number of samples (controlling the stability of transmission) and the prior bias towards hypothesis B (which controls the stationary distribution in the absence of selection). Fitness was frequency dependent, but symmetric between pairs of agents, reflecting their probability of mutual understanding, as in Nowak, Plotkin & Krakauer (1999). Similar to Nowak's (2006) analytic results for arbitrary mutation, the stability of intergenerational transmission largely determined the outcome of the simulations for the deterministic process. At high stability, initial conditions dominated: whatever hypothesis was most prevalent initially increased the fitness of agents operating with that hypothesis, and thus its transmission probability. At low stability, as predicted by iterated learning, bias dominated, as each agent was unlikely to shift from its prior on the basis of the noisy data. At middle levels of stability the space was likely to saturate at one of the two hypotheses, with probability determined by both stability and prior bias. The proportion of spaces in which both hypotheses were maintained indefinitely decreased with increasing stability, but only stochastically. The spatial distributions of hypotheses in these spaces were not entirely random, but self-maintaining structures did not occur. The Moran process, in contrast, converged to the prior bias regardless of initial conditions, with convergence rate decreasing nonlinearly with the number of samples seen during learning.
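The birth-first process with Bayesian learners can be sketched as follows. This is an illustrative toy version only: a one-dimensional ring stands in for the torus with Moore neighborhoods, transmission uses a single noisy signal per hypothesis rather than the four-signal 'two language' setup, and the parameter names and values (eps, prior_B, n_samples) are assumptions, not those of the reported simulations.

```python
import random

def produce(hyp, n_samples, eps=0.1):
    """Noisy transmission: each sample matches the parent's hypothesis
    (0 = A, 1 = B) with probability 1 - eps."""
    return [hyp if random.random() > eps else 1 - hyp
            for _ in range(n_samples)]

def learn(data, prior_B=0.3, eps=0.1):
    """Bayesian learner: compute the posterior over the two hypotheses
    given the data, then sample a hypothesis from that posterior."""
    like_A = like_B = 1.0
    for d in data:
        like_A *= (1 - eps) if d == 0 else eps
        like_B *= (1 - eps) if d == 1 else eps
    post_B = like_B * prior_B / (like_A * (1 - prior_B) + like_B * prior_B)
    return 1 if random.random() < post_B else 0

def moran_step(pop, n_samples=5):
    """One birth-first cycle: fitness-proportional parent choice, then the
    spawn (a fresh learner) replaces a random agent in the parent's
    neighborhood, possibly the parent itself."""
    n = len(pop)
    # frequency-dependent, symmetric fitness: count of neighbours sharing
    # the agent's hypothesis (probability of mutual understanding)
    fitness = [sum(pop[i] == pop[j] for j in ((i - 1) % n, (i + 1) % n))
               for i in range(n)]
    parent = random.choices(range(n), weights=[f + 0.01 for f in fitness])[0]
    site = random.choice([parent, (parent - 1) % n, (parent + 1) % n])
    pop[site] = learn(produce(pop[parent], n_samples))
    return pop

random.seed(0)
pop = [random.randint(0, 1) for _ in range(20)]
for _ in range(200):
    moran_step(pop)
```

Varying n_samples changes the stability of transmission, and prior_B shifts the stationary distribution, mirroring the two manipulated parameters in the text.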
Either linguistic universals are homogeneous, or they are not, because either (1) our space is in transition or (2) more complex processes govern the space of learners. Distinguishing these three possibilities remains a target for this research.
References
Dowman, M., Kirby, S., & Griffiths, T. L. (2006). Innateness and culture in the evolution of language. In A. Cangelosi, A. D. M. Smith, & K. Smith (Eds.), The Evolution of Language: Proceedings of the 6th International Conference on the Evolution of Language. World Scientific Press.
Griffiths, T., & Kalish, M. (2007). Iterated learning with Bayesian agents. Cognitive Science.
Nowak, M. A. (2006). Evolutionary Dynamics. Cambridge, MA: Harvard University Press.
Nowak, M. A., Plotkin, J. B., & Krakauer, D. C. (1999). The evolutionary language game. Journal of Theoretical Biology, 200, 147-162.
A REACTION-DIFFUSION APPROACH TO MODELLING LANGUAGE COMPETITION
ANNE KANDLER
JAMES STEELE

AHRC Centre for the Evolution of Cultural Diversity, Institute of Archaeology, University College London, 31-34 Gordon Square, London WC1H 0PY, UK
[email protected], [email protected]
In this paper we consider competition between two languages (where there is also bilingualism) and try to formalise and explain its dynamics. By language competition we mean simply competition for speakers. Simple evolutionary models of language origins have emphasised the importance of co-operation within social groups as a pre-condition for the emergence of stable shared linguistic conventions. Here we explore the dynamics of changing group size and the stability of group membership when groups are defined by the possession of a shared language, and when groups with different languages come into contact and compete for members. We take an ecological approach, as promoted in linguistics by Mufwene (2002) and Nettle (1999), among others. Following the paper of Abrams and Strogatz (2003), which presented a two-language competition model to explain historical data on the decline of endangered languages, a number of modelling approaches to this topic have been published. Patriarca and Leppänen (2004) set up a reaction-diffusion model and showed that if both languages are initially separated in space and interact only in a narrow transition region, then preservation of the subordinate language is possible. Further, Pinasco and Romanelli (2006) developed an ecological model of Lotka-Volterra type which allows coexistence of both languages in only one zone of competition. Very recently, Minett and Wang developed an interesting extension of the original Abrams and Strogatz model by including bilingualism and a social structure. The present paper should be seen as a further generalisation of the above approaches. We describe the interaction and growth dynamics of two competing languages in a reaction-diffusion competition model.
However, we also include a bilingual component, following Baggs and Freedman (1993), and contrast the results with the findings of the Minett and Wang model.* In our model, language switching cannot occur directly from one monolingual state to the other: there must be an intermediate step, the bilingual state. We develop a model which includes the growth, spread and interaction of all three sub-populations of speakers. The reproduction of speakers is described by a logistic growth function with a 'common carrying capacity', which restricts the sum of the frequencies of the monolingual and bilingual components. The spatial spread is modelled by a diffusion term, and the different conversion mechanisms are included as competition terms. We are interested in the long-term equilibria of the three components, and derive existence and stability conditions for these states. We show that, depending on environmental conditions, either coexistence of all three components or the extinction of one monolingual component and the bilingual component are possible. Figure 1 shows an example of the course of language competition when each language is dominant in its 'home range'. The blue and red dots show the presence of speakers of the different languages. Growth and spread lead to an interaction zone, where both languages put pressure on each other, and as a result a bilingual group (green dots) emerges. The competitive strengths of the two languages then determine whether individuals in the bilingual group stay bilingual or switch to one of the monolingual groups. Figure 1 (right) shows a stable long-term equilibrium in which all three components coexist.

* A number of other mathematical approaches to language competition exist, including agent-based models (Castelló et al., 2007) and Monte Carlo simulations based on game theory (Kosmidis et al., 2005), some of which consider bilingualism (Baggs & Freedman, 1993; Castelló et al., 2007). Schulze and Stauffer (2006) have published a review of such work by physicists.
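The verbal description above can be written schematically. The following system is a hedged sketch only: the diffusion coefficients d, growth rates a, carrying capacity K and conversion rates c are illustrative placeholders, and the exact conversion terms of the authors' model may differ. With u and v the two monolingual densities and w the bilingual density:

```latex
\begin{aligned}
\frac{\partial u}{\partial t} &= d_u \nabla^2 u
  + a_u\, u\left(1 - \frac{u+v+w}{K}\right) - c_{uw}\,u v + c_{wu}\,w,\\
\frac{\partial v}{\partial t} &= d_v \nabla^2 v
  + a_v\, v\left(1 - \frac{u+v+w}{K}\right) - c_{vw}\,u v + c_{wv}\,w,\\
\frac{\partial w}{\partial t} &= d_w \nabla^2 w
  + a_w\, w\left(1 - \frac{u+v+w}{K}\right)
  + (c_{uw} + c_{vw})\,u v - (c_{wu} + c_{wv})\,w.
\end{aligned}
```

Monolinguals become bilingual only through contact (the uv terms), and bilinguals may switch to either monolingual group (the w terms), so there is no direct u-to-v switching, matching the constraint stated in the text; the shared (u+v+w)/K factor is the 'common carrying capacity'.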
Figure 1. Example of language competition in which the parameter values lead to the stable coexistence of the two monolingual (red and blue) and of the bilingual (green) components.
References
Abrams, D., & Strogatz, S. (2003). Modelling the dynamics of language death. Nature, 424, 900.
Baggs, I., & Freedman, H. (1993). Can the speakers of a dominated language survive as unilinguals? Mathematical and Computer Modelling, 18, 9-18.
Mufwene, S. (2002). Colonisation, globalisation, and the future of languages in the twenty-first century. International Journal on Multicultural Societies, 4(2), 162-193.
Nettle, D. (1999). Linguistic diversity. Oxford: Oxford University Press.
Patriarca, M., & Leppänen, T. (2004). Modeling language competition. Physica A, 338, 296-299.
Pinasco, J., & Romanelli, L. (2006). Coexistence of languages is possible. Physica A, 361, 355-360.
Schulze, C., & Stauffer, D. (2006). Recent developments in computer simulations of language competition. Computing in Science and Engineering, 8, 60-67.
ACCENT OVER RACE: THE ROLE OF LANGUAGE IN GUIDING CHILDREN'S EARLY SOCIAL PREFERENCES

KATHERINE D. KINZLER
Department of Psychology, Harvard University

KRISTIN SHUTTS
Department of Psychology, Harvard University

EMMANUEL DUPOUX
LSCP, EHESS, CNRS, 29 Rue d'Ulm, Paris, 75005, France

ELIZABETH S. SPELKE
Department of Psychology, Harvard University
Gender, age, and race have long been considered the primary categories by which adults and children divide the social world. However, there is reason to doubt the role of any of these categories in the evolution of intergroup conflict. In neither ancient nor modern times were human groups comprised solely of individuals of one gender or one age. While race may act as a marker for group membership today, in evolutionary times groups separated by small geographic distances did not differ in physical properties such as race. Rather, our current attention to race may reflect a system that evolved for other purposes (Kurzban, Tooby, & Cosmides, 2001). In contrast to race, neighboring groups in ancient times likely differed in the language or accent with which they spoke. Cognitive evolution therefore may have encouraged attention to language and accent as a mechanism for determining who is a member of "us", and who is a member of "them". The present research investigates the origins of attention to language as a social grouping factor. If language is indeed a psychologically salient factor that we use to make judgments about novel individuals, it might be observed early in development. Moreover, differences in accent and language may trump differences in race in importance. Experiment 1 investigated young infants' looking preferences towards native speakers, finding that infants prefer to look longer at someone who
previously spoke in a native language compared to a foreign language, as well as in a native accent compared to a foreign accent (Kinzler, Dupoux, & Spelke, 2007). Experiment 2 tested infants' social preferences for native speakers more directly (Kinzler et al., 2007). In this study, 10-month-old infants in the U.S. and France viewed movies of an English-speaking actress and a French-speaking actress. Following this, silently and in synchrony, each speaker held up identical toys and offered them to the baby. Just at the moment when the toys disappeared off screen, two real toys appeared for the baby to grasp, giving the illusion that they came from the screen. Infants in Paris reached for toys offered by the French speaker, and infants in Boston reached for toys offered by the English speaker, even though the toys were identical and the interactions non-linguistic in nature. In-progress research with 10-month-old infants shows that, in contrast to the effects observed with language, infants do not preferentially accept a toy from a member of their own race compared to a member of a different race. Therefore, language, rather than race, influences children's early interactions with others. In Experiment 3, two-and-a-half-year-old children demonstrated pro-social giving to a native-language speaker compared to a foreign-language speaker. Again, this effect did not obtain with race: children gave equally to own-race and other-race individuals. Finally, Experiment 4 tested older children's explicit friendship choices based on language. Five-year-old children demonstrated social preferences for native speakers over foreign speakers or speakers with a foreign accent, and these preferences were not due to the intelligibility of the speech. Finally, although White English-speaking children stated explicit preferences for White children in isolation, when accent was pitted against race, children chose to be friends with someone who was Black and spoke with a native accent.
Together, this research provides evidence of the robust effect of language on early social cognition, and of its relative importance compared to race in children's social reasoning. Children, therefore, may attend to social factors that were important indicators of group membership throughout cognitive evolution.

References
Kinzler, K. D., Dupoux, E., & Spelke, E. S. (2007). The native language of social cognition. Proceedings of the National Academy of Sciences of the United States of America, 104, 12577-12580.
Kurzban, R., Tooby, J., & Cosmides, L. (2001). Can race be erased? Coalitional computation and social categorization. Proceedings of the National Academy of Sciences of the United States of America, 98, 15387-15392.
LANGUAGE, CULTURE AND BIOLOGY: DOES LANGUAGE EVOLVE TO BE PASSED ON BY US, AND DID HUMANS EVOLVE TO LET THAT HAPPEN?

SIMON KIRBY

School of Philosophy, Psychology & Language Sciences, University of Edinburgh, 40 George Square, Edinburgh, EH8 9LL, UK
Over the course of the EvoLang series of conferences it has become increasingly clear that two senses of the term "language evolution" have emerged. When we think of the evolution of language, do we mean the evolution of the human faculty for language, or the evolution of language itself? Is the principal evolutionary mechanism natural selection in the biological sense, or some kind of cultural analog? It might be thought that the quest to understand the origins of human language should focus on the former, purely biological, question. After all, the cultural evolution of language could be considered synonymous with diachronic linguistics, a field with very different explanatory aims. I believe this thinking is flawed. Instead, I will argue that in order to have a satisfactory understanding of the origins of our faculty of language we must understand far better the mechanisms of cultural evolution, and the implications they have for the biological evolution of our species. In this talk, I will survey the initial suggestive evidence (mathematical, computational, and experimental) for two broad hypotheses relating to the evolution of language, and give an overview of the implications of these hypotheses should they eventually be supported. The first hypothesis, aspects of which can be found in many authors' work, is:

The biological hypothesis: Humans have the capacity for language primarily because of two quite separate preadaptations. Firstly, we are one of a diverse set of species capable of vocal learning (a feat no other primate is capable of). That is, we are able to acquire, through observation, sequentially structured gestural signalling. Secondly, we are able to infer intentions in others that are complex enough to have internal structure.

I call these preadaptations because I am claiming that neither is necessarily the result of an adaptation to the functions presumed to be fulfilled by modern human language (e.g.
"the transmission of propositional structures over a serial interface", Pinker & Bloom, 1990). Arguably, either can be found in other species, and humans are unique solely in having the combination. This leaves a separate question of what pressures led to their evolution, which I will not
address here. However, it is possible that the former arose as a fitness signaler (e.g. Ritchie et al., submitted). It is reasonable to assume that the latter may be adaptive in any social species with the cognitive wherewithal to achieve it. The combination of these two traits sets the scene for a protolanguage that pairs complex sequences with (potentially) complex meanings. It also provides something that is potentially far more significant, namely the substrate for a new kind of evolutionary system: a complex communication system that is culturally transmitted. This leads to the main topic of my talk:

The cultural hypothesis: Language structure is the inevitable product of cultural adaptation to two competing pressures: learnability and expressivity.

Note that these are pressures acting on the new evolving entity (language), not on the old evolving entity (humans). They are the automatic consequences of the fact that language is culturally transmitted, and they have profound explanatory force, which we are only beginning to discover. For example, we are now fairly sure that this means we can explain significant language universals without having to assume strong innate constraints on language acquisition (Kirby et al., 2007). Indeed, it may be the case that the evolutionary mechanisms involved in language lead naturally to a situation where there is little specifically linguistic content to innateness and not much of language structure is the result of natural selection. The picture emerging from computational and mathematical models, as well as a growing number of experimental studies, is one where language adapts to maximise its own chances of survival, providing support for the organismic metaphors of Christiansen (1994, and later work), Deacon (1997) and others. This kind of adaptive system is only possible because of our unique biology, but it is far from clear that this enabling biology arose because of language.

References
Christiansen, M. H. (1994).
Infinite Languages, Finite Minds: Connectionism, Learning and Linguistic Structure. PhD thesis, University of Edinburgh, Scotland.
Deacon, T. W. (1997). The Symbolic Species: The Co-evolution of Language and the Brain. W. W. Norton.
Kirby, S., Dowman, M., & Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13(4):707-784.
Ritchie, G., Kirby, S., & Hawkey, D. (submitted). Song learning as an indicator mechanism: Modelling the developmental stress hypothesis. Journal of Theoretical Biology.
Selected Publications
Kirby, S., Dowman, M., and Griffiths, T. (2007). Innateness and culture in the evolution of language. Proceedings of the National Academy of Sciences, 104(12):5241-5245. [Demonstrates how strong universals can arise without strong innateness.]
Ritchie, G. and Kirby, S. (2007). A possible role for selective masking in the evolution of complex, learned communication systems. In Lyon, C., et al, eds, Emergence of Communication and Language, 387-402. Springer Verlag. [Explores surprising interactions between biological and cultural evolution of birdsong.]
Brighton, H., Smith, K., and Kirby, S. (2005). Language as an evolutionary system. Physics of Life Reviews, 2: 177-226. [Synthesises a number of models treating language itself as a complex adaptive system.]
Kirby, S., Smith, K., and Brighton, H. (2004). From UG to universals: linguistic adaptation through iterated learning. Studies in Language, 28(3):587-607. [Presents the iterated learning model of cultural evolution for linguists.]
Christiansen, M. and Kirby, S., editors (2003). Language Evolution. Oxford University Press. [An edited collection surveying the state of the art.]
Kirby, S. (2002). Learning, bottlenecks and the evolution of recursive syntax. In Briscoe, T., editor, Linguistic Evolution through Language Acquisition: Formal and Computational Models, chapter 6, pages 173-204. Cambridge University Press. [Demonstrates the emergence of recursive compositionality in an iterated learning model.]
Kirby, S. (2002). Natural language from artificial life. Artificial Life, 8(2):185-215. [Surveys the computational models of language evolution.]
Kirby, S. (2001). Spontaneous evolution of linguistic structure: an iterated learning model of the emergence of regularity and irregularity. IEEE Transactions on Evolutionary Computation, 5(2):102-110. [Shows how cultural adaptation leads to a regularity/frequency interaction in morphology.]
Kirby, S. (2000). Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners. In Knight, C., editor, The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, pages 303-323. Cambridge University Press. [Presents the first iterated learning model of the cultural evolution of language.]
Kirby, S. (1999). Function, Selection and Innateness: the Emergence of Language Universals. Oxford University Press. [Sets out mechanisms of linguistic adaptation - how language universals are shaped by language users.]
Kirby, S. (1997). Competing motivations and emergence: explaining implicational hierarchies. Linguistic Typology, 1(1):5-32. [Shows how linguistic adaptation can provide explanations for a particular type of universal structure.]
THREE ISSUES IN MODELING THE LANGUAGE CONVERGENCE PROBLEM AS A MULTIAGENT AGREEMENT PROBLEM

KIRAN LAKKARAJU¹ AND LES GASSER¹,²

¹Computer Science Department, ²Graduate School of Library and Information Science, University of Illinois at Urbana-Champaign
{klakkara | gasser}@uiuc.edu
Introduction

A language is useless unless it is shared. Individuals and subgroups modify languages by adding new words, creating new grammatical constructions, etc., and propagating these changes through contact. To maintain communicability over time, the population as a whole must converge (possibly within some small diversity limit) to agreement on a "common" language. Abstractly, we can view this process as a Multiagent Agreement Problem (MAP): individual agents, each in its own state (e.g., speaking some language), change state through interaction to better match the states of others, with the desired end configuration being that all agents converge to the same state. The language convergence problem (converging a population of initially linguistically diverse agents to a single language) is clearly a MAP: the agents' states are their languages, and states change via learning from communicative interactions. MAPs with differing conditions have been studied in a wide range of fields, including distributed computing, multi-agent systems, sensor networks, and opinion dynamics, to name a few (Lakkaraju & Gasser, 2007). Many powerful models for studying MAPs have emerged. Can we leverage the work on MAPs to develop a general understanding of the language convergence problem? We suggest that most current MAP models are not applicable to language convergence problems because they do not account for three language convergence issues: the complexity of language, the limited discernibility of language via interaction, and the large potential agreement space for language convergence. Before existing, powerful work on MAPs can be applied to language convergence, MAP models must be extended to account for these properties. Below we describe what is needed for this.

Languages are Complex

Most current MAP models assume that agents are trying to agree upon one state from a set of unstructured possibilities. Clearly language is a structured, complex entity in which links between components are crucial. We view a language as made up of at least three constituents: meanings, grammar, and lexicon. Meanings comprise all the issues that can be expressed. The lexicon contains relationships between lexical items and meanings. Grammar specifies how to compose lexemes, and how sentential structure expresses semantic information. These three components are interlinked, and changing one of them can have a great effect on the other components and on communicability with other agents.
Limited Discernibility Most MAP models assume that agents can unambiguously determine the state of other agents through interaction. However, for the case of language, where "state" means "language spoken," this assumption does not hold. In the language convergence problem agents often interact by playing language games. There are a variety of games, and they allow two agents to exchange information about their respective languages. The information content of these exchanges is always language samples, which are used by hearers to infer properties of speakers' languages. The number of samples is limited, and in general insufficient to completely determine the speaker's language. Thus agents have limited discernibility of others' states (their languages). This is insufficient to satisfy the typical MAP criterion of complete state discernibility.
Large Agreement Space Each state in an agreement space (AS) is a possible point of agreement. In the MAP problem "meeting room scheduling," for example, this is the set of times at which meetings can be held; agreement is convergence on a single commonly-available time. For language convergence, the AS is the set of possible languages that agents could speak; agreement means speaking the same language from this space. In most current MAP models the agreement space is assumed to be discrete and very small (e.g. {0, 1} in Shoham & Tennenholtz, 1997). Clearly, for language convergence problems, MAP models must handle very large agreement spaces.
Conclusion Our work in this area concerns defining current shortcomings in MAP techniques and creating new approaches specifically tailored to solving language convergence problems in a general way, especially for the evolutionary design of large communicative groups of artificial agents.
References
Lakkaraju, K., & Gasser, L. (2007). A unified framework for multi-agent agreement. In Proceedings of AAMAS '07. Honolulu, Hawaii.
Shoham, Y., & Tennenholtz, M. (1997). On the emergence of social conventions: modeling, analysis, and simulations. Artificial Intelligence, 94(1-2), 139-166.
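For concreteness, the standard MAP setting that the abstract argues is too impoverished for language (unstructured states, complete discernibility of other agents' states, a small discrete agreement space) can be sketched as a voter-model-style simulation. This is an illustrative toy, not the authors' framework; the parameter names and dynamics are our own assumptions:

```python
import random

def simulate_map(n_agents=20, n_states=5, max_steps=100_000, seed=1):
    """Classical MAP toy: each step a random 'hearer' copies the exact
    state (language) of a random 'speaker'; run until all agents agree.
    This assumes full state discernibility, exactly the assumption the
    abstract says fails for real language."""
    rng = random.Random(seed)
    states = [rng.randrange(n_states) for _ in range(n_agents)]
    for step in range(max_steps):
        if len(set(states)) == 1:
            return states[0], step          # consensus reached
        speaker = rng.randrange(n_agents)
        hearer = rng.randrange(n_agents)
        if speaker != hearer:
            states[hearer] = states[speaker]
    return None, max_steps                  # no consensus within budget

language, steps = simulate_map()
print("converged on language", language, "after", steps, "interactions")
```

Extending such a model toward language convergence would mean replacing the integer states with structured languages, the copy step with inference from limited samples, and the small state set with a very large agreement space, which is precisely the research programme the abstract outlines.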
THE DEVELOPMENT OF A SOCIAL SIGNAL IN FREE-RANGING CHIMPANZEES MARION LAPORTE, KLAUS ZUBERBÜHLER School of Psychology, University of St Andrews, St Andrews, KY16 9JP, UK
Little research has been conducted on the question of how our closest living relatives, the chimpanzees, learn to produce and comprehend their own natural vocal repertoire from early infancy. Current theories and models of vocal development and vocal learning rely almost exclusively on research conducted with non-primates, mainly songbirds. However, there are a number of reasons to remain cautious when trying to apply these models to non-human primate vocal development or speech acquisition. For example, as with non-human primates, human infants go through a lengthy phase of non-linguistic vocal behaviour prior to speech production, which is largely responsive to ongoing social events. Birdsong, in contrast, is a sexually selected behaviour that functions in maximising reproductive success; and as such is probably based on fundamentally different psychological mechanisms. In this study, we present data on vocal development in a community of free-ranging chimpanzees at Budongo Forest, Uganda. We were particularly interested in the patterns that underlie the emergence of one specific signal, the pant-grunt vocalisation. When free-ranging chimpanzees encounter a higher-ranking community member they typically produce pant-grunts, which essentially function as a greeting signal. Pant-grunts are emitted at close range and, due to their social unidirectionality, are important manifestations of how callers assess their own social relations. We investigated the development of pant-grunts in infant chimpanzees to document (a) its emergence within an individual’s vocal repertoire, (b) its appropriate usage as a social signal and (c) the social learning processes that take place between infant callers and their mothers or other community members. 
We found that, unlike other call types, appropriate usage of pant-grunts required a relatively sophisticated understanding of the various social relations amongst community members, rules that most likely had to be inferred by observational learning. Pant-grunts emerged at the age of about 5 months, which usually coincided with infants passing through a stage of intense social behaviour usually involving the mother. During this initial period (between 5 and 18 months), pant-grunts were used in a way that differed significantly from adult usage, possibly serving a different function. At this early stage, we found no evidence that infants understood the social dominance hierarchy within the community, and infants used pant-grunts as a means to interact with other community members and participate in social activities. With increasing age and social experience, call use became more focused and increasingly directed as a greeting signal towards higher-ranking community members. We discuss the role of social learning processes and individual experience during this transition.
GESTURAL MODES OF REPRESENTATION - A MULTIDISCIPLINARY APPROACH
KATJA LIEBAL Department of Psychology, University of Portsmouth, King Henry I Street, Portsmouth, PO1 2DY, UK HEDDA LAUSBERG Department of Psychosomatic Medicine, Friedrich Schiller University Jena, Bachstrasse 18, Jena, 07743, Germany ELLEN FRICKE, CORNELIA MÜLLER Department of Cultural Studies, European University Viadrina Frankfurt (Oder), Grosse Scharrnstrasse 59, 15239 Frankfurt (Oder), Germany
This talk will present first results of an interdisciplinary project which investigates the structural properties of gestures from a linguistic, a neurocognitive, and an evolutionary perspective. The focus is on one fundamental aspect of these structures, namely the techniques underlying gesture creation, termed gestural modes of representation (Müller 1998a, b). Four basic modes of representation are distinguished: the hands model the three-dimensional shape of an object, the hands outline the two-dimensional form of an object, the hands embody the object (a flat hand embodies a piece of paper or a window), or the hands reenact an everyday activity such as opening a window or turning a car key. In studies on patients with brain lesions, similar categories (pantomime, body-part-as-object) have been found to be generated in different brain regions (Lausberg, Cruz, Kita, Zaidel, & Ptito, 2003). On this basis, neuroscientific studies contribute to identifying formal and semantic structures of gestures. Comparative studies of gestural structures in human and nonhuman primates will investigate more closely which of the linguistically identified structures in human gestures are present in our closest relatives, the nonhuman great apes, including orangutans, gorillas, chimpanzees and bonobos (Liebal, Müller & Pika, 2007). This will sharpen our
understanding of the different kinds of structures present in human gestures and reveal which aspects of the human techniques of gesture creation are also present in nonhuman primates. Determining exactly which structures overlap across primate species and which ones evolved uniquely with human language will contribute to the current debate in evolutionary anthropology that pits a gesture-first theory of language evolution (Hewes, 1973; Corballis, 2002) against one in which gesture and speech emerged in concert (Arbib, 2003, 2005; McNeill, 2005). Acknowledgements
We would like to thank Volkswagen-Stiftung for funding this project. References
Arbib, M. A. (2003). Protosign and protospeech: An expanding spiral. Behavioral and Brain Sciences, 26(2), 199-266.
Arbib, M. A. (2005). Interweaving protosign and protospeech: Further developments beyond the mirror. Interaction Studies, 6(2), 145-171.
Corballis, M. C. (2002). From hand to mouth: The origins of language. Princeton, NJ: Princeton University Press.
Hewes, G. W. (1973). Primate communication and the gestural origin of language. Current Anthropology, 12(1-2), 5-24.
Lausberg, H., Cruz, R. F., Kita, S., Zaidel, E., & Ptito, A. (2003). Pantomime to visual presentation of objects: left hand dyspraxia in patients with complete callosotomy. Brain, 126, 343-360.
Liebal, K., Müller, C., & Pika, S. (Eds.). (2007). Gestural communication in nonhuman and human primates. John Benjamins Publishing Company.
McNeill, D. (2005). Gesture and Thought. Chicago: Chicago University Press.
Müller, C. (1998a). Redebegleitende Gesten. Kulturgeschichte - Theorie - Sprachvergleich. Berlin: Berlin Verlag.
Müller, C. (1998b). Iconicity and gesture. In S. Santi, I. Guaitella and C. Cavé (Eds.), Oralité et gestualité: communication multimodale, interaction: actes du colloque Orage'98 (pp. 321-328). Montréal, Paris: L'Harmattan.
EXTRACOMMUNICATIVE FUNCTIONS OF LANGUAGE: VERBAL INTERFERENCE CAUSES CATEGORIZATION IMPAIRMENTS GARY LUPYAN Department of Psychology, Cornell University, Ithaca, NY, 14850, USA
A question that is centrally linked to the study of language evolution is whether language facilitates or makes possible certain cognitive acts. Such extra-communicative conceptions of language (e.g., Clark, 1998) argue that in addition to its adaptive value as a communicative tool, language may have evolved, in part, as a cognitive aid. One source of evidence for this claim comes from the study of aphasic patients, who have been observed to suffer not only from the communication deficits that define aphasia, but also from impairments on a wide range of tasks that do not require the overt use of language. Indeed, observations that aphasic patients suffer deficits on a range of nonverbal tasks have led some to conclude that one of the main functions of language is the ability to "fixate thoughts," and thus "defect in language may damage thinking" (Goldstein, 1948, p. 115). The most consistent and profound non-linguistic deficits in aphasia are seen in a class of categorization tasks that require the patient to selectively attend to particular stimulus features. For instance, many patients are impaired at sorting objects by size while ignoring shape. After conducting and reviewing a number of such studies, Cohen, Kelter, and colleagues concluded that aphasics have a "defect in the analytical isolation of single features of concepts" (Kelter et al., 1976; Cohen, Kelter, & Woll, 1980; Cohen et al., 1981). All tested subtypes of aphasic patients are "deficient if the task requires isolation, identification, and conceptual comparison of specific individual aspects of an event," but are equal to controls "when judgment can be based on global comparison" (Cohen et al., 1980). To illustrate, consider patient LEW, who is profoundly anomic but has excellent comprehension. This patient is severely impaired on taxonomic-grouping tasks with not only complex items like faces, but even the simplest
perceptual stimuli, being unable to sort colors or shapes into meaningful categories (Davidoff & Roberson, 2004). One intriguing possibility is that such impairments are due to the failure of language to maintain appropriate conceptual representations. If so, then normal subjects placed under conditions of verbal interference may exhibit some of the same symptoms exhibited by aphasic populations, in particular a difficulty in isolating and focusing on specific perceptual dimensions. To test this hypothesis, participants performed an odd-one-out categorization task: given three objects, participants had to choose the object that did not belong based on color, size, or thematic relationship (e.g., for a triad consisting of a potato, a balloon, and a cake, potato was the correct choice). Verbal interference was implemented as a within-subject manipulation by having participants rehearse number strings during some of the categorization trials. Two experiments used pictures and words as stimuli, respectively. Based on the findings that aphasic patients have particular difficulties with tasks requiring isolation of perceptual features, it was predicted that verbal interference would have a stronger effect on categorization by color and size than on categorization requiring a focus on broader associations (thematic relations). The design for this experiment was borrowed from Davidoff and Roberson's study (2004, Exp. 7) with the anomic patient LEW, in which he showed the predicted effect. Verbal interference resulted in an overall slowing of responses. Critically, there was a significant interference-condition × trial-type interaction, with a significant slowing of responses for perceptual-based categorization (color, size) and no significant effect for trials requiring categorization based on thematic relations. This effect remained when words rather than pictures were used as stimuli.
A control experiment, using a visuospatial interference task that replaced the to-be-remembered number strings with dot patterns, failed to find this interaction. These results provide support for the hypothesis that certain categorization tasks may depend in some way on language even when they do not require any type of verbal response. The pattern of results in normal participants placed under verbal interference is strikingly similar to that found in aphasic patients, suggesting that language may play an on-line role in maintaining categorical distinctions and in helping to focus attention on specific perceptual dimensions. These results speak to possible adaptive benefits of language that go beyond interpersonal communication.
FORM-MEANING COMPOSITIONALITY DERIVES FROM SOCIAL AND CONCEPTUAL DIVERSITY GARY LUPYAN Department of Psychology Cornell University Ithaca, NY, 14850 USA
RICK DALE Department of Psychology University of Memphis Memphis, TN, 38152 USA
Language structure is often considered separately from its socio-cultural bearings (e.g., Chomsky, 1995). Such an assumption may obscure the rich interaction between the structures present in a language and the social and conceptual circumstances in which they function. Recently, Wray and Grace (2007), drawing on earlier work by Thurston (1994), have argued for distinguishing two broad language types that reflect this interaction. Esoteric (inward-facing) languages are spoken within small groups and learned by relatively few outsiders. Exoteric (outward-facing) languages (of which English is an extreme example) are spoken by large groups and learned by many adults as second languages. Exoteric languages tend to have more open-class words than esoteric languages, possess far simpler morphological systems, and can often be well characterized by rule-based grammars. Semantics in exoteric languages are generally compositional: one can derive the meaning of the whole from the meanings of the parts. In contrast, esoteric languages have fewer open-class words but complex morphological systems. They are highly context dependent, given to numerous exceptions that withstand regularization, and are often characterized by polysynthesis and morphologically-conditioned allomorphy. Wray and Grace (2007) explain the correspondence between language usage (esoteric vs. exoteric) and language structure through evolutionary reasoning. They argue that the characteristics of esoteric languages, though undaunting to infants, make them substantially difficult for an adult outsider to learn. Esoteric usage thus marks in-group members by the speakers' ability to use this linguistic custom, having acquired it during childhood. However, an increasing need to interact with outsiders and about novel topics, insofar as it requires recombining existing elements into novel sentences that are understood by strangers, places pressure on the language to become more transparent and compositional.
This
makes the language easier to learn by new adult users. Compositionality, common to exoteric languages, is thus supported by a need to communicate with strangers. Compositionality also allows speakers to easily generate new meanings through the recombination of familiar elements, allowing for comprehension without the need for extended in-group experience. Thus the property of compositionality, rather than being an innate language universal, could be a product of out-group interaction, of "talking with strangers" (Wray & Grace, 2007). The current work tests this fascinating hypothesis in a computational framework. We tested two predictions derived from Wray and Grace's analysis. First, we expected that learning the basic grammatical structure common to esoteric languages would be easy for naïve learners but progressively harder for learners with experience in another language. In contrast, grammars common to exoteric-type languages should continue to be learnable by late learners. Second, because grammars common to exoteric languages have more transparent form-to-meaning mappings, we expected that networks exposed to these grammars should be better able to generalize their linguistic knowledge to novel contexts. A fully-recurrent neural network was trained to map phonological forms to semantics. The networks were trained on sentences corresponding to schematic structures of esoteric and exoteric languages. The exoteric-type grammar consisted of a large vocabulary of lexical morphemes with fixed semantics and a few closed-class morphemes which, rather than having fixed semantics, modified the semantics of neighboring open-class words. In such grammars context plays a limited role and there exists a transparent form-to-meaning mapping. The esoteric-type grammars consisted of a much greater proportion of closed-class words and a smaller lexicon. This greater number and prevalence of non-lexical morphemes meant that the lexical semantics were much more context-dependent.
Results provided support for both predictions. First, naïve networks could learn esoteric and exoteric grammars to roughly equal proficiency. Critically, age of exposure mattered more for esoteric than for exoteric grammars, with the former being disproportionately more difficult to learn by more "mature" networks. Second, as predicted, generalization to novel contexts was more difficult for esoteric than for exoteric languages. We aim to integrate two approaches to language and its evolution: anthropological theories of sociocultural influences on language, and psychological theories of computational mechanisms for language. In this integrated view, the structural characteristics of language have their origin in the interaction between sociocultural and computational constraints. Generative recursion, long considered foundational to the emergence of our linguistic abilities, may simply be derivative of this interaction.
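The generalization contrast the networks were trained on can be illustrated with a much simpler stand-in: a toy lexicon learner rather than a recurrent network. The vocabulary, meanings, and learner below are invented for illustration only; the point is that per-word learning generalizes on a compositional (exoteric-style) mapping but not on an arbitrary holistic (esoteric-style) one:

```python
import itertools
import random

WORDS = ["ba", "di", "ku", "mo"]
MEANINGS = dict(zip(WORDS, ["RED", "BLUE", "BIG", "SMALL"]))

def make_languages(seed=0):
    """Build an exoteric-style compositional language (sentence meaning is
    the sequence of fixed word meanings) and an esoteric-style holistic one
    (an arbitrary sentence-to-meaning table over the same sentences)."""
    sentences = list(itertools.product(WORDS, repeat=2))   # 16 two-word sentences
    compositional = {s: (MEANINGS[s[0]], MEANINGS[s[1]]) for s in sentences}
    shuffled = list(compositional.values())
    random.Random(seed).shuffle(shuffled)                  # break form-meaning link
    holistic = dict(zip(sentences, shuffled))
    return sentences, compositional, holistic

def generalization(language, train, test):
    """A 'late learner' extracts per-word meanings from training sentences
    and predicts held-out sentences compositionally."""
    lexicon = {}
    for s in train:
        for word, meaning in zip(s, language[s]):
            lexicon[word] = meaning    # only consistent if language is compositional
    hits = sum(tuple(lexicon.get(w) for w in s) == language[s] for s in test)
    return hits / len(test)

sentences, comp, hol = make_languages()
train, test = sentences[:8], sentences[8:]
print("exoteric-style generalization:", generalization(comp, train, test))
print("esoteric-style generalization:", generalization(hol, train, test))
```

Under this sketch the compositional language yields perfect generalization to unseen sentences once every word has been observed, while the holistic language defeats the same learner, mirroring the network result in miniature.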
LANGUAGE AS KLUGE GARY F. MARCUS Department of Psychology, New York University, New York, NY 10012, USA
In fields ranging from reasoning to linguistics, the idea of humans as perfect, rational, optimal creatures is making a comeback, but should it be? Hamlet's musings that the mind was "noble in reason... infinite in faculty" have their counterparts in recent scholarly claims that the mind consists of an "accumulation of superlatively well-engineered designs" shaped by the process of natural selection (Tooby and Cosmides, 1995), in the 2006 suggestion of Bayesian cognitive scientists Chater, Tenenbaum and Yuille that "it seems increasingly plausible that human cognition may be explicable in rational probabilistic terms and that, in core domains, human cognition approaches an optimal level of performance", and in Chomsky's recent suggestion that language is close "to what some superengineer would construct, given the conditions that the language faculty must satisfy". In this talk, I will argue that this resurgent enthusiasm for rationality (in cognition) and optimality (in language) is misplaced, for three reasons. First, I will suggest that recent empirical arguments in favor of human rationality rest on a fallacy of composition, implicitly but mistakenly assuming that evidence of rationality in some (carefully analyzed) aspects of cognition entails that the broader whole (i.e. the human mind in toto) is rational. In fact, establishing that some particular aspect of cognition is optimal (or perfect, or near optimal) is not tantamount to showing that the system as a whole is; current enthusiasm for optimality overlooks the possibility that the mind might be suboptimal even if some (or even many) of the components of cognition have been optimized. Second, I will argue that there is considerable empirical evidence (most of it well known, but rarely given due attention in the neo-Rationalist literature) that militates against any strong claim of human cognitive or linguistic perfection. Finally, I will argue that the
assumption that evolution tends creatures towards rationality or "superlative adaptation" is itself theoretically suspect, and ought to be considerably tempered by recognition of what Stephen Jay Gould called the "remnants of history", or what might be termed evolutionary inertia. I will close by suggesting that the mind might be better seen as what engineers call a kluge: clumsy and inelegant, yet remarkably effective. References
Fisher, S. E., & Marcus, G. F. (2006). The eloquent ape: genes, brains and the evolution of language. Nature Reviews Genetics, 7, 9-20.
Marcus, G. F. (2004). Before the word. Nature, 431, 745.
Marcus, G. F. (2004). The Birth of the Mind: How a Tiny Number of Genes Creates the Complexities of Human Thought. New York: Basic Books.
Marcus, G. F. (2006). Cognitive architecture and descent with modification. Cognition, 101, 443-465.
Marcus, G. F. (2008). Kluge: The Haphazard Construction of the Human Mind. Boston: Houghton Mifflin. [UK edition: Faber & Faber].
Marcus, G. F., & Rabagliati, H. (2006). The nature and origins of language: How studies of developmental disorders could help. Nature Neuroscience, 10, 1226-1229.
ORIGINS OF COMMUNICATION IN AUTONOMOUS ROBOTS DAVIDE MAROCCO Institute of Cognitive Sciences and Technologies, National Research Council, Via S.M. della Battaglia, 00185, Rome STEFANO NOLFI Institute of Cognitive Sciences and Technologies, National Research Council, Via S.M. della Battaglia, 00185, Rome
The development of embodied autonomous agents able to self-organize a grounded communication system and use their communication abilities to solve a given problem is an exciting new field of research (Quinn, 2001; Cangelosi & Parisi, 2002). These self-organizing communication systems may have characteristics similar to those observed in animal communication (Marocco & Nolfi, 2007) or human language. In this paper we describe how a population of simulated robots evolved for the ability to solve a collective navigation problem develops individual and social/communication skills. In particular, we analyze the evolutionary origins of motor and signaling behaviors. The experimental set-up consists of a team of four simulated robots placed in a 270×270 cm arena containing two target areas; the robots are evolved for the ability to find and remain in the two target areas, dividing themselves equally between the two targets. Robots communicate by producing and detecting signals up to a distance of 100 cm. A signal is a real number with a value ranging between [0.0, 1.0]. The robots' controllers consist of neural networks whose free parameters have been evolved through a genetic algorithm. After the evolutionary process, by analyzing the fitness throughout the generations we observed that evolving robots are able to accomplish their task to a good extent in all replications. Moreover, the comparison between the results obtained in the normal condition and in a control condition in which robots are not allowed to detect other robots' signals indicates that the ability to produce and detect other robots' signals is necessary to achieve optimal or close-to-optimal performance. To understand the evolutionary origins of the robots' communication system we analyzed the motor and signaling behavior of
evolving robots throughout the generations. To reconstruct the chain of variations that led to the final evolved behavior we analyzed the lineage of the best individual of the last generation. By analyzing the motor and signaling behavior throughout the generations we observed several evolutionary phases that progressively shape the final behavior by adding new communication behaviors and sensory-motor skills to the behavioral repertoire of the robots. In particular, in a first phase the robots move in the environment by producing curvilinear trajectories and avoiding obstacles, and produce two stable signals when they are located inside or outside a target area, respectively, and far from other robots. Moreover, robots produce highly variable signals when they interact with other robots located nearby. In a second phase robots progressively evolve an individual ability to remain in target areas. In particular, robots located on target areas rotate on the spot so as to remain there for the rest of the trial. In a third phase, the individual ability to remain on target areas developed in previous generations provided the adaptive basis for the development of a cooperative behavior that allows a robot located alone on a target area to attract other robots toward the same target area. At this stage robots are still not able to remain on a target area as a pair. Finally, in the last evolutionary phase, we observe a number of variations that allow robots not to exit from target areas when they detect the signal produced by another robot located in the same target area. During this long evolutionary phase we observed that the performance of the robots, the number of signals, and the functionalities of the signals remain stable. The results obtained indicate that the signals, and the meanings of the signals, produced by evolved robots are grounded not only in the robots' sensory-motor system but also in the robots' previously acquired behavioral capabilities.
Moreover, the analysis of the co-adaptation of the robots' individual and communicative abilities indicates how innovations in the former might create the adaptive basis for further innovations in the latter, and vice versa.
References
Cangelosi, A., & Parisi, D. (2002). Simulating the Evolution of Language. London: Springer.
Marocco, D., & Nolfi, S. (2007). Communication in natural and artificial organisms: Experiments in evolutionary robotics. In C. Lyon, C. Nehaniv & A. Cangelosi (Eds.), Emergence of Communication and Language. London: Springer.
Quinn, M. (2001). Evolving communication without dedicated communication channels. Lecture Notes in Computer Science, 2159.
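The evolutionary machinery described above (neural controller weights set by a genetic algorithm and scored by a team-level fitness) can be sketched in skeletal form. This is a generic truncation-selection GA of the kind common in evolutionary robotics, not the authors' actual implementation; the parameter values and the placeholder fitness function are our own assumptions:

```python
import random

def evolve(fitness, genome_len=10, pop_size=20, n_parents=5,
           generations=50, mut_sigma=0.1, seed=0):
    """Bare-bones GA: rank the population by fitness, keep the best
    individuals, and refill the population with Gaussian-mutated copies
    of them (truncation selection, no crossover)."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:n_parents]
        children = [[g + rng.gauss(0.0, mut_sigma) for g in rng.choice(parents)]
                    for _ in range(pop_size - n_parents)]
        pop = parents + children
    return max(pop, key=fitness)

# Placeholder fitness: in the actual experiments this would run the
# four-robot simulation and score how well the team finds and divides
# itself between the two target areas.
best = evolve(lambda genome: -sum(g * g for g in genome))
print(len(best), "weights evolved")
```

In the real set-up each genome would encode the free parameters of a robot's neural controller, and evaluating the fitness of one genome means running the full embodied simulation, which is what makes the lineage analysis of signaling behavior across generations possible.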
HANDEDNESS FOR GESTURAL COMMUNICATION AND NON-COMMUNICATIVE ACTIONS IN CHIMPANZEES AND BABOONS: IMPLICATIONS FOR LANGUAGE ORIGINS ADRIEN MEGUERDITCHIAN1,2, JACQUES VAUCLAIR1, MOLLY J. GARDNER3, STEVEN J. SCHAPIRO3 & WILLIAM D. HOPKINS2,4 1Department of Psychology, Research Center in Psychology of Cognition, Language & Emotion, University of Provence, 29, Av. R. Schuman, Aix-en-Provence, 13621, France. 2Division of Psychobiology, Yerkes National Primate Research Center, Atlanta, GA, 30322, USA. 3Department of Veterinary Sciences, M.D. Anderson Cancer Center, University of Texas, Bastrop, TX, 78602, USA. 4Department of Psychology, Agnes Scott College, Decatur, GA, 30030, USA.
Most humans show a left-hemispheric dominance for language functions (Knecht et al., 2000). Whereas such left-lateralization has historically been linked to right-handedness for manipulative actions, dominant use of the right hand is also observed for "language-related" gestures such as signing, pointing and manual movements when speaking (reviewed in Hopkins et al., 2005), suggesting that left-lateralized language areas may underlie gesture production (Kimura, 1993). Behavioral asymmetries in apes and monkeys have been studied to investigate precursors of hemispheric specialization in humans, and some of these studies have revealed continuities with humans (Hopkins, in press). For example, captive chimpanzees and olive baboons show a dominance of the right hand in bimanual manipulative actions (Hopkins et al., 2005; Vauclair et al., 2005) and, to a higher degree, for communicative gestures (Hopkins et al., 2005; Meguerditchian & Vauclair, 2006). Interestingly, in both species, the hand preferences for gestures showed no correlation with those for bimanual actions. Such findings raise the hypothesis that a specific left-lateralized communicatory cerebral system, different from the one involved in manipulative actions, may control communicative gestures, and have led the authors to consider gestural behaviors as an ideal prerequisite for the emergence of language and its left-lateralization (see Corballis, 2002). To further investigate this hypothesis, the current study was undertaken to determine whether it is the communicative nature of the gestures (and not only
the motor properties) which induces a different pattern of laterality compared to non-communicative bimanual manipulative actions. Using an observational method, we measured manual preferences in samples of captive baboons and chimpanzees for two new categories of manual actions: (1) a non-communicative self-touching action (referred to as "muzzle wipe", serving as a "control" behavior) and (2) other communicative gestures previously unstudied in each species, including (a) human-directed "food begs" in baboons and (b) in chimpanzees, human-directed "clapping" and all conspecific-directed gestures such as "hand slap", "extended arm", "wrist present" and "threat". The results indicated that for both species: (1) communicative gestures show a dominance of the right hand, whereas the self-touching action does not induce population-level handedness; (2) within the same subjects, individual hand preferences for the newly investigated gestures are correlated with hand preferences for the previously investigated gestures ("hand slap" in baboons and "food begs" in chimpanzees) but are not correlated with hand preferences for muzzle wipe or bimanual actions. These results in baboons and chimpanzees may not only reveal a left-hemispheric dominance for the various communicative gestures studied (in contrast to a non-communicative action) but also support the hypothesis that a specific communicatory cerebral circuit involved in gesturing emerged in the common ancestor of baboons, chimpanzees and humans, and may constitute an ideal precursor of the language-specific cortical network in humans.
References
Corballis, M. C. (2002). From Hand to Mouth: The Origins of Language. Princeton, NJ: Princeton University Press.
Hopkins, W. D. (Ed.) (in press). Evolution of Hemispheric Specialization in Primates, Special Topics in Primatology. American Society of Primatology.
Hopkins, W. D., Russell, J., Freeman, H., Buehler, N., Reynolds, E., & Schapiro, S. J. (2005).
The distribution and development of handedness for manual gestures in captive chimpanzees (Pan troglodytes). Psychological Science, 6, 487-493.
Kimura, D. (1993). Neuromotor mechanisms in human communication. Oxford: Oxford University Press.
Knecht, S., Deppe, M., Draeger, B., Bobe, L., Lohman, H., Ringelstein, E. B., & Henningsen, H. (2000). Language lateralization in healthy right-handers. Brain, 123, 74-81.
Meguerditchian, A., & Vauclair, J. (2006). Baboons communicate with their right hand. Behavioural Brain Research, 171, 170-174.
Vauclair, J., Meguerditchian, A., & Hopkins, W. D. (2005). Hand preferences for unimanual and coordinated bimanual tasks in baboons (Papio anubis). Cognitive Brain Research, 25, 210-216.
THE EVOLUTION OF HYPOTHETICAL REASONING: INTELLIGIBILITY OR RELIABILITY? HUGO MERCIER Institut Jean Nicod, 29 rue d'Ulm, Paris, 75005, France
We can divide the problems encountered during language evolution into two broad categories: cognitive problems and strategic problems. Cognitive problems are constraints on the production or understanding of language. Strategic problems are linked to the maintenance of honest communication. It has been argued that hypothetical reasoning (HR) evolved as a means to overcome a specific cognitive problem, that of producing and understanding displaced reference (Harris, 2000). Here I will argue that HR instead evolved as a means to overcome strategic problems, more precisely to check communicated information in order to ensure that we are not being deceived. A first argument is theoretical. Firstly, assuming that capacities such as episodic memory were present before language evolution, there is no reason to expect that translating into language thoughts related to episodic memory (and thus having the properties of displaced reference) would be any harder than translating thoughts about the here and now. Secondly, some animal communication systems have displaced reference (the bee dance, for instance) without requiring HR. So it would seem that HR is actually not necessary to produce or understand displaced reference. HR can be useful as a means to check communicated information, though. It is well known that for communication to be evolutionarily stable, its honesty has to be maintained. Several means to enforce that honesty have been studied in humans: source monitoring, use of behavioral clues, or consistency checking, for instance (see DePaulo et al., 2003; Sperber, 2001). It has been argued that reasoning, generally, evolved as a means to persuade and evaluate information (Sperber & Mercier, in press; see Dessalles, 2007, for a related argument). HR, as a special type of reasoning, would be used for the same purposes.
In order to argue for such a view it is possible to gather different kinds of evidence. The first is related to the contexts in which HR is used. If HR evolved to understand displaced reference, it should be used in proportion to the difficulty of understanding such sentences, but if HR evolved to check communicated information, it should mainly be used when confronted with information we have reasons to doubt. This is generally the case for reasoning, and HR doesn't seem to be any different (see Sperber & Mercier, in press). The second is the efficiency of hypothetical reasoning used in argumentative contexts, because in these contexts people typically have to evaluate communicated information. Numerous experiments by David Green and colleagues have shown that people are proficient at using HR in such contexts (see for instance Green, Applebaum, & Tong, 2006). The third involves delineating features of HR that fit only with one hypothesis. For instance, if HR is used to understand what people say, then it shouldn't systematically depart from what is meant. If, instead, HR is used to evaluate what is said, then it should depart from what is meant in at least one way: it should seek ways in which what is being communicated, if accepted, would advantage the sender. If such ways are found, then the message should be rejected. And this is what we observe, starting with young children who are able to use a match between people's intentions and the consequences of what they state to decide whether they should believe them or not (Mills & Keil, 2005). References
DePaulo, B. M., Lindsay, J. J., Malone, B. E., Muhlenbruck, L., Charlton, K., & Cooper, H. (2003). Cues to deception. Psychological Bulletin, 129(1), 74-118.
Dessalles, J.-L. (2007). Why We Talk: The Evolutionary Origins of Language. Oxford: Oxford University Press.
Green, D. W., Applebaum, R., & Tong, S. (2006). Mental simulation and argument. Thinking and Reasoning, 12(1), 31-61.
Harris, P. (2000). The Work of the Imagination. London: Blackwell.
Mills, C. M., & Keil, F. C. (2005). The development of cynicism. Psychological Science, 16(5), 385-390.
Sperber, D. (2001). An evolutionary perspective on testimony and argumentation. Philosophical Topics, 29, 401-413.
Sperber, D., & Mercier, H. (in press). Intuitive and reflective inferential mechanisms. In J. S. B. T. Evans & K. Frankish (Eds.), In Two Minds. Oxford: Oxford University Press.
SIMULATION OF CREOLIZATION BY EVOLUTIONARY DYNAMICS
MAKOTO NAKAMURA, TAKASHI HASHIMOTO, SATOSHI TOJO School of {Information, Knowledge} Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, 923-1292, Japan {mnakamu, hash, tojo}@jaist.ac.jp
The purpose of this abstract is to investigate the characteristics of creole (DeGraff, 1999) using a mathematical formalization of population dynamics. Linguistic studies show that the emergence of creole is affected by contact with other languages, the distribution of the population of each language, and similarities among the languages. Constructing a simulation model including these elements, we derive conditions for creolization from theoretical and numerical analyses. Creoles are full-fledged new languages which children of pidgin speakers acquire as their native languages. Interestingly, children growing up hearing syntactically simplified languages such as pidgins develop them into a mature form, creoles (DeGraff, 1999). Pidgins and creoles may thus bear on the mechanism of language acquisition in infants. In particular, some properties of creoles imply the existence of an innate universal grammar. Simulation studies of language evolution can be represented by population dynamics; examples include an agent-based model of language acquisition proposed by Briscoe (2002) and a mathematical framework by Nowak, Komarova, and Niyogi (2001), who developed a mathematical theory of the evolutionary dynamics of language called the language dynamics equation, in which the change of language is represented as the transition of population among a finite number of languages. We modified the language dynamics to be based on social interaction, and then dealt with the emergence of creole (Nakamura, Hashimoto, & Tojo, 2007). Following the language dynamics equation, we assumed that any language can be classified into one of a certain number of grammars. Thus, the population of language speakers is distributed over a finite number (n) of grammars {G_1, ..., G_n}. Let x_i be the proportion of speakers of G_i within the total population. Then, the language dynamics is modeled by an equation governing the transition of language speakers among languages.
Our model differs from the language dynamics equation of Nowak et al. (2001) in that we neglect the fitness
term associated with biological evolution, and focus on cultural transmission by introducing the degree of language contact, that is:

x_i(t+1) = Σ_j q_ji(t) x_j(t),

where Q(t) (= {q_ij(t)}) is the transition matrix among languages. Each element, q_ij(t), is defined as the probability that a child of a G_i speaker obtains G_j through exposure to his/her parental language and to other languages. Q(t) depends on the distribution of the language population at time t, the similarity among languages, and a learning algorithm. Creoles are considered as new languages. From the viewpoint of population dynamics, we define a creole as a transition of the population of language speakers: a creole is a language which no one spoke in the initial state, but which most people have come to speak by a stable generation. Therefore, a creole is represented by G_c such that: x_c(0) = 0, x_c(t) > θ_c, where x_c(t) denotes the population share of G_c at a convergent time t, and θ_c is a threshold above which a language is regarded as dominant. We set θ_c = 0.9 throughout the experiments. From our experiments, we observed creolization and found a correlation between the number of input sentences and the similarity among languages. Creoles emerged within a certain range of similarity. In our model, languages are characterized by the similarity between them, which denotes the probability that a G_i speaker utters a sentence consistent with G_j. In a typical situation of language contact, the target language is either very similar to the speakers' own language or not similar at all. Replacing the similarity values with 1 - ε for very similar languages and with ε for dissimilar languages, the model becomes very simple and may be solved analytically. However, if we consider a creole, which is somewhat similar to each of the contact languages, we cannot replace the values with these simple ones. As a result, our creole model is very difficult to solve analytically. We discuss how to cope with this problem.
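The transition dynamics above can be sketched in a few lines. This is a toy illustration, not the authors' actual model: the learning rule (children acquire G_j in proportion to how consistent ambient utterances are with G_j) and the similarity values are assumptions chosen only to show a creole-like language growing from zero initial speakers.

```python
import numpy as np

def evolve(x0, similarity, steps=200):
    """Iterate x_i(t+1) = sum_j q_ji(t) x_j(t) under a toy learning rule:
    a child acquires G_j in proportion to the consistency of ambient
    utterances with G_j (exposure-weighted, same for every parent)."""
    S = np.asarray(similarity, float)   # S[i, j]: prob. a G_i speaker's
                                        # sentence is consistent with G_j
    x = np.array(x0, float)
    for _ in range(steps):
        ambient = x @ S                 # consistency of what children hear
        q = ambient / ambient.sum()     # one acquisition row for all parents
        Q = np.tile(q, (len(x), 1))     # row-stochastic transition matrix
        x = x @ Q                       # population shares at t + 1
    return x

# G_2 plays the creole: no initial speakers, but partly consistent with
# both contact languages (similarity values illustrative, not from paper).
S = [[1.0, 0.3, 0.7],
     [0.3, 1.0, 0.7],
     [0.6, 0.6, 1.0]]
x = evolve([0.5, 0.5, 0.0], S)
```

With these toy numbers the creole G_2 grows from zero to become the plurality language; whether it crosses the dominance threshold θ_c = 0.9 depends on the similarity structure and learning rule, which is exactly the sensitivity discussed above.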
References Briscoe, E. J. (2002). Grammatical acquisition and linguistic selection. In T. Briscoe (Ed.), Linguistic evolution through language acquisition: Formal and computational models. Cambridge University Press. DeGraff, M. (Ed.). (1999). Language creation and language change. Cambridge, MA: The MIT Press. Nakamura, M., Hashimoto, T., & Tojo, S. (2007). Simulation of common language acquisition by evolutionary dynamics. In Proc. of IJCAI 2007 Workshop on Evolutionary Models of Collaboration (pp. 21-26). Hyderabad. Nowak, M. A., Komarova, N. L., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114-118.
EVOLUTION OF PHONOLOGICAL COMPLEXITY: LOSS OF SPECIES-SPECIFIC BIAS LEADS TO MORE GENERALIZED LEARNABILITY IN A SPECIES OF SONGBIRDS
KAZUO OKANOYA & MIKI TAKAHASI Lab for Biolinguistics, BSI, RIKEN, 2-1 Hirosawa Wako, 351-0198, Japan
A species of songbird, the Bengalese finch (Lonchura striata var. domestica), is a domesticated strain of the wild white-rumped munia. White-rumped munias were imported to Japan some 250 years ago and then domesticated as pet birds. Munias have been bred for their intense parental behavior and white color morph during the course of domestication, but they were never bred for their songs. Nevertheless, domesticated Bengalese finches sing very different songs from those of Munias: Bengalese songs are sequentially and phonologically complex while Munia songs are simpler (Okanoya, 2004).
Fig. 1. A white-rumped munia cross-fostered to a Bengalese father (top) had difficulty in learning a particular song note (bottom), while the Bengalese son learned the father's song without difficulty (middle).
To elucidate the degree to which environmental and genetic factors contribute to these differences in song structure, we cross-fostered chicks of Munias and Bengalese. Detailed phonological analysis revealed that the accuracy of song-note learning is highest in Munia chicks reared by Munias, and lowest in Munia chicks cross-fostered to Bengalese. Bengalese chicks, on the other hand, showed an intermediate degree of learning accuracy regardless of whether they were reared by Munias or Bengalese. The results suggest that Munias are highly specialized in learning Munia song phonology but less adept at learning the song phonology of the other strain, whereas Bengalese are less specialized in learning their own strain's phonology but more generalized in learning the other strain's phonology (Fig. 1). The results can be interpreted as showing that there is an innate bias to learn species-specific phonology in Munias, and that such a bias was lost during domestication. White-rumped munias have several sympatric species, such as spotted munias, in their wild habitat. To avoid infertile hybridization, having a strong innate bias to attend to own-species phonology should be adaptive for Munias. Bengalese, on the other hand, are a domesticated strain whose breeding is under the control of breeders. In such an environment, a species-specific bias is a neutral trait and might soon degenerate. Through the degeneration of the species-specific bias, Bengalese perhaps obtained a more general ability to learn from a wide range of phonologies. The results can also be explained in light of masking/unmasking and genetic redistribution, the idea proposed by Deacon (2003). Domestication functions as a masking factor, and perceptual specialization for species-specific sound is masked. Under that environment, genetic specialization to attend to species-specific sound is redistributed into a more general ability to learn from a wider range of sounds in Bengalese finches, and perhaps in humans, in which case through the process of self-domestication.
Acknowledgements This work was supported by a Grant-in-Aid for Young Scientists from JSPS to MT and a PREST grant from JST to KO. References Deacon, T. W. (2003). Universal grammar and semiotic constraints. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 111-140). Oxford: Oxford University Press. Okanoya, K. (2004). Song syntax in Bengalese finches: Proximate and ultimate analyses. Advances in the Study of Behaviour, 34, 297-346.
REFERENTIAL GESTURES IN CHIMPANZEES IN THE WILD: PRECURSORS TO SYMBOLIC COMMUNICATION? SIMONE PIKA School of Psychological Sciences, University of Manchester, Coupland 1 Building, Oxford Road, Manchester, Lancashire, M13 9PL, England (UK) JOHN C. MITANI Department of Anthropology, University of Michigan, 101 West Hall, 1085 South University Avenue, Ann Arbor, MI 48109-1107, United States
One of the driving forces of human research is the question of how spoken language, which is thought to be unique to humans, originated and evolved. Researchers have regularly addressed this question by comparing human communicative signals to the systems of communication that evolved in other animals, especially in one of our closest living relatives, the chimpanzee (Pan troglodytes). The majority of this research has focused on vocal communication. Recent studies, however, provide evidence that gestures play an important role in the communication of chimpanzees and resemble those of pre-linguistic children and just-linguistic human infants in some important ways: they are used as intentional acts, represent a relatively stable part of an individual's communicative repertoire, and are clearly learned. Chimpanzees, however, mainly use these communicative means as effective procedures in dyadic interactions to request actions from others (imperatives). Human children, by contrast, commonly use referential gestures, e.g. pointing, which direct the attention of recipients to particular aspects of the environment. The use of these gestures has been linked with cognitive capacities such as mental state attribution, because the recipient must infer the signaller's meaning. Until now, referential gestures have been reported only in captive chimpanzees interacting with their human experimenters and in human-raised or language-trained individuals. It is therefore not yet clear whether these abilities represent natural communicative abilities or are byproducts of living in a human-encultured environment.
Here we report the widespread use of a gesture in chimpanzees in the wild which might be used referentially. The gesture involved one chimpanzee making a relatively loud and exaggerated scratching movement on a part of his body which could be seen by his grooming partner. It was observed between pairs of adult males and was recorded 186 times in 101 (41%) of 249 grooming bouts. One hundred and nineteen times (64%), the groomer stopped grooming and groomed the scratched spot. Eight times (4%), individuals simultaneously scratched and presented a body part and were groomed there immediately. In 59 cases (32%), the groomer continued to groom without touching the area scratched by the signaller. The gesture received significantly more positive than negative responses (p < 0.001; exact binomial test) and occurred in 61% (N=51) of all observed grooming dyads (N=84). It was performed on average 3.65 times/dyad and was used significantly more often in dyads consisting of high-ranking males than in other possible pairings (p < 0.001; df=6, linear-linear association). We address the questions of whether the behavior reflects a) behavioral conformity due to stimulus enhancement, b) a physical response by an individual to parasites or dirt, thereby drawing the attention of the groomer to a potential area to groom, or c) a truly communicative signal. The discussion focuses on similarities and differences to i) other referential gestures in apes, ii) gestures of pre-linguistic and just-linguistic human children, and iii) homesigns, to elaborate on the question of whether the gestural modality of our nearest primate relatives might have been the modality within which symbolic communication first evolved.
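The reported significance of the positive-response bias can be sanity-checked directly from the counts given above (119 + 8 = 127 positive responses versus 59 negative). This is a generic exact binomial tail computation, not the authors' analysis script.

```python
from math import comb

def binom_upper_tail(k, n, p=0.5):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Counts from the abstract: 127 positive vs 59 negative responses.
p_value = binom_upper_tail(127, 127 + 59)
```

Under the null hypothesis that positive and negative responses are equally likely, the one-sided tail probability is far below 0.001, consistent with the reported result.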
MODELING LANGUAGE EMERGENCE BY WAY OF WORKING MEMORY
ALESSIO PLEBE and VIVIAN DE LA CRUZ Department of Cognitive Science, University of Messina, v. Concezione 8, 98121 Messina, Italy {aplebe,vdelacruz}@unime.it
MARCO MAZZONE Laboratory of Cognitive Science, University of Catania, viale Andrea Doria 6, 95125 Catania, Italy [email protected]
1. The working memory hypothesis One idea on the origin of language is that a key element, if not the most crucial one, was the availability of neural circuits in the brain for working memory (Aboitiz, 1995; Aboitiz, Garcia, Bosman, & Brunetti, 2006), the kind of short-term memory theorized by Baddeley (1992). The neural connections working memory relies upon are those that the language network relies upon as well, namely the extensive connections between temporoparietal and prefrontal areas. Within this system, Francisco Aboitiz and his collaborators consider phonological working memory to be of paramount importance in language evolution, suggesting that it originated as a working memory device involved in the imitation of different vocalizations. This, however, is only a small part of the role working memory plays in human language. A brain ready for language may have evolved by virtue of an expanding working memory capacity, which allowed not only the processing of complex sequences of sounds, but also the ability to keep under attention the semantic meanings of these sounds as they were being formulated, as well as the posing of constraints for the emergence of syntactic processes. One of the first forms of embryonic syntax is the association of a word denoting an object with a second word denoting a predicate of that object. The gap between a purely lexical association between sound and meaning and this syntactic ability is well demonstrated by the documented difficulties children have in acquiring adjectives (Sandhofer & Smith, 2007). The attempt made with the proposed model is to contrast the early learning of names and adjectives in a sufficiently realistic model of the human cortex, and to compare the conceptual representation spaces with and without the availability of a prefrontal working memory loop.
2. The proposed model A possible way of exploring hypotheses on the origins of language, without being daunted by the gap of hundreds of thousands of years' worth of events that we cannot arrive at knowing, is to analyze the ontogenetic transition from a non-linguistic phase to a linguistic one. In the context of this work, we inquire about what kind of basic connection patterns in the brain might have rendered it better suited to eventually support language. We propose a model of the early acquisition of language elements, grounded in perception and composed of cortical maps, in two versions: one implementing a working memory loop in the higher-level map, and one that does not. The model is a system of artificial cortical maps, each built using the LISSOM (Laterally Interconnected Synergetically Self-Organizing Map) architecture (Miikkulainen, Bednar, Choe, & Sirosh, 2005), a concept close enough to the biological reality of the cortex, but one that possesses the simplicity necessary for building complex models. Details of the model can be found in the similar but simpler system introduced in (Plebe & Domenella, 2007) to model the emergence of object recognition. The present model consists of two main paths, one for the visual process and another for the auditory channel, which converge on a higher map, in which working memory connectivity can be added. Both models, with and without working memory, are exposed to 7200 pictures of 100 real objects and to waveforms corresponding to names of 38 object categories, 7 adjectives in the class of colors, and 4 in the class of shapes, and learn by a combination of Hebbian and homeostatic plasticity. The resulting representations are analyzed by measuring the population coding of concepts elicited by pictures or sounds in the higher map.
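The learning rule the model combines, Hebbian growth kept in check by a normalizing homeostatic constraint, can be sketched in a few lines. This is a generic illustration of that combination, not the LISSOM architecture itself: map geometry, lateral connections, and the model's actual homeostatic mechanism are omitted, and all sizes and rates are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_units = 16, 8
W = rng.random((n_units, n_in))
W /= W.sum(axis=1, keepdims=True)        # start with normalized afferents

eta = 0.05
for _ in range(1000):
    x = rng.random(n_in)                 # one input pattern
    y = np.maximum(W @ x - 0.4, 0.0)     # thresholded unit activations
    W += eta * np.outer(y, x)            # Hebbian: co-active pairs strengthen
    W /= W.sum(axis=1, keepdims=True)    # divisive normalization keeps each
                                         # unit's total afferent weight fixed
```

The divisive normalization step is what keeps runaway Hebbian growth in check: without it, the weights of frequently co-active pairs would grow without bound.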
Both systems demonstrate the ability to develop semantic associations, but in the simpler version there is no clear representation of the predicative role of adjectives, while the version with the working memory loop exhibits the emergence of an embryonic syntax, establishing a relationship between adjectives and names. References
Aboitiz, F. (1995). Working memory networks and the origin of language areas in the human brain. Medical Hypotheses, 44, 504-506.
Aboitiz, F., Garcia, R. R., Bosman, C., & Brunetti, E. (2006). Cortical memory mechanisms and language origins. Brain and Language, 98, 40-56.
Baddeley, A. (1992). Working memory. Science, 255, 556-559.
Miikkulainen, R., Bednar, J., Choe, Y., & Sirosh, J. (2005). Computational maps in the visual cortex. New York: Springer.
Plebe, A., & Domenella, R. G. (2007). Object recognition by artificial cortical maps. Neural Networks, 20, 763-780.
Sandhofer, C. M., & Smith, L. B. (2007). Learning adjectives in the real world: How learning nouns impedes learning adjectives. Language Learning and Development, 3, 233-261.
MECHANISTIC LANGUAGE CIRCUITS: WHAT CAN BE LEARNED? WHAT IS PRE-WIRED? FRIEDEMANN PULVERMÜLLER Medical Research Council Cognition and Brain Sciences Unit, Cambridge [email protected]
A brain theory of language and symbolic systems can be grounded in neuroscientific knowledge well established in animal research. Learning is manifest at the neuronal level by synaptic modification reflecting the frequency of use of given connections. Long-distance and short-distance links bridge between, and provide coherence within, brain areas critically involved in linguistic, conceptual, perceptual and action processing. Therefore, discrete distributed neuronal assemblies (DDNAs) can develop - that is, they can be learned - that link together (i) acoustic and articulatory phonological information about speech sounds (Pulvermüller et al., 2006) and spoken word forms (Garagnani, Wennekers, & Pulvermüller, 2007; Pulvermüller et al., 2001), and (ii) form-related information about a sign and information about aspects of its referential meaning (Hauk, Johnsrude, & Pulvermüller, 2004; Pulvermüller, 1999, 2005; Shtyrov, Hauk, & Pulvermüller, 2004). Referential semantics links signs to specific information about perceptions and actions and is laid down in DDNAs spread out over specific sensorimotor brain areas, even reaching, for example, into motor cortex.
This approach does not explain a range of features specific and common to human languages, especially (a) large vocabularies (10,000s of words), (b) abstract meaning, and (c) combinatorial principles that govern syntax and syntactic categorisation.
These critical issues will be addressed, asking about possible brain prerequisites and, therefore, genetic preconditions. (a) We tentatively relate the capability to build large sets of DDNAs to a genetically determined behavioural feature, the early occurrence of repetitive movements and articulations, which leads to the formation of perception-action circuits in the brain that pave the ground for DDNAs later used in language processing (Braitenberg & Pulvermüller, 1992). (b) Abstract meaning processing is based on one more inborn feature of the nervous system, the capability to implement logical operations. Some aspects of abstract meaning can be analysed in terms of either-or functions operating on perceptual and action-related information. These neuronal function-units, located close to relevant action-perception systems, may provide a brain basis for abstract meaning (Pulvermüller, 2003; Pulvermüller & Hauk, 2006). (c) Combinatorial principles are thought to be laid down in the mind by linguistic principles and rules. A brain-inspired neuronal model of word sequence processing leads to the formation of discrete combinatorial rule representations on the basis of learning (Knoblauch & Pulvermüller, 2005). Neurophysiological results further support the notion of discrete combinatorial brain mechanisms (Pulvermüller & Assadollahi, 2007). The need for and nature of inborn syntactic mechanisms at the neuronal level is discussed in closing.
References
Braitenberg, V., & Pulvermüller, F. (1992). Entwurf einer neurologischen Theorie der Sprache. Naturwissenschaften, 79, 103-117.
Garagnani, M., Wennekers, T., & Pulvermüller, F. (2007). A neuronal model of the language cortex. Neurocomputing, 70, 1914-1919.
Hauk, O., Johnsrude, I., & Pulvermüller, F. (2004). Somatotopic representation of action words in the motor and premotor cortex. Neuron, 41, 301-307.
Knoblauch, A., & Pulvermüller, F. (2005). Sequence detector networks and associative learning of grammatical categories. In S. Wermter, G. Palm & M. Elshaw (Eds.), Biomimetic neural learning for intelligent robots (pp. 31-53). Berlin: Springer.
Pulvermüller, F. (1999). Words in the brain's language. Behavioral and Brain Sciences, 22, 253-336.
Pulvermüller, F. (2003). The neuroscience of language. Cambridge: Cambridge University Press.
Pulvermüller, F. (2005). Brain mechanisms linking language and action. Nature Reviews Neuroscience, 6(7), 576-582.
Pulvermüller, F., & Assadollahi, R. (2007). Grammar or serial order?: Discrete combinatorial brain mechanisms reflected by the syntactic Mismatch Negativity. Journal of Cognitive Neuroscience, 19(6), 971-980.
Pulvermüller, F., & Hauk, O. (2006). Category-specific processing of color and form words in left fronto-temporal cortex. Cerebral Cortex, 16(8), 1193-1201.
Pulvermüller, F., Huss, M., Kherif, F., Moscoso del Prado Martin, F., Hauk, O., & Shtyrov, Y. (2006). Motor cortex maps articulatory features of speech sounds. Proceedings of the National Academy of Sciences, USA, 103(20), 7865-7870.
Pulvermüller, F., Kujala, T., Shtyrov, Y., Simola, J., Tiitinen, H., Alku, P., Alho, K., Martinkauppi, S., Ilmoniemi, R. J., & Näätänen, R. (2001). Memory traces for words as revealed by the mismatch negativity. Neuroimage, 14(3), 607-616.
Shtyrov, Y., Hauk, O., & Pulvermüller, F. (2004). Distributed neuronal networks for encoding category-specific semantic information: the mismatch negativity to action words. European Journal of Neuroscience, 19(4), 1083-1092.
REFLECTIONS ON THE INVENTION AND REINVENTION OF THE PRIMATE PLAYBACK EXPERIMENT GREGORY RADICK Department of Philosophy (Division of History and Philosophy of Science), University of Leeds, Leeds LS2 9JT, UK
In the early 1890s the theory of evolution gained an unexpected ally: the Edison phonograph. An amateur scientist, Richard Garner, used the new machine - one of the technological wonders of the age - to record monkey calls, play them back to the monkeys, and watch their reactions. From these soon-famous experiments he judged that he had discovered "the simian tongue," made up of words he was beginning to translate, and containing the rudiments out of which human language evolved. Yet for most of the next century, the simian tongue and the means for its study existed at the scientific periphery. Both returned to great acclaim only in the early 1980s, after a team of ethologists, Robert Seyfarth, Dorothy Cheney, and Peter Marler, announced that experimental playback showed vervet monkeys in Kenya to have rudimentarily meaningful calls. What does the primate playback experiment's invention and later reinvention tell us about the origin-of-language debate since Darwin? This paper will draw on material from a new book (Radick 2007) in order to explore the conditions - intellectual, institutional, material, cultural - under which the experimentally tested meanings of the natural vocalizations of apes and monkeys come to seem worth having and, for a wider constituency, worth knowing about. The paper will also consider the long period of the experiment's "eclipse" and what lay behind it. Among other points to be stressed is an important difference in the cultural politics of the ca. 1890 versus the ca. 1980 experiment. In its first incarnation, the primate playback experiment was valued for its promise to vindicate a commonplace evolutionary prediction: that the "highest" nonhuman animals would be found to speak languages a little less complex than the "lowest" human races. In its second incarnation, the experiment had an opposite politics of hierarchy leveling, with the aim being to
show that when animals are studied "on their own terms," via playback of the animals' own utterances in the animals' natural settings (rather than instruction in human-created languages in psychological laboratories), animal communication is revealed as language-like in ways that more anthropocentric methods fail to detect. References
Radick, G. (2007). The Simian Tongue: The Long Debate about Animal Language. Chicago: University of Chicago Press.
AN EXPERIMENTAL APPROACH TO THE ROLE OF FREERIDER AVOIDANCE IN THE DEVELOPMENT OF LINGUISTIC DIVERSITY
GARETH ROBERTS Language Evolution and Computation Research Unit, School of Philosophy, Psychology and Language Sciences, University of Edinburgh, Adam Ferguson Building, 40 George Square, Edinburgh EH8 9LL, UK [email protected]
The existence of linguistic change and variation is inevitable: human language is genetically underspecified and culturally transmitted. However, variation and change are not dysfunctional. While there has not been enough time for human language of the kind we possess to become fully genetically specified (Worden, 1995), we should not assume that, given enough time, it would do so. On the contrary, it is reasonable to suppose that there has been pressure for it to remain underspecified (cf. Dunbar, 2003, p. 230). If language did not change and vary, it would be considerably less flexible and would lack the means to convey indexical as well as propositional information. The other side of this coin is the highly developed human ability to exploit linguistic variation as a means of identifying individuals as belonging (or not belonging) to this or that group: "people not from around here talk funny". Such an ability to tell outsider from insider by the way they speak is of great benefit to the establishment and maintenance of complex networks based on cooperative exchange. Such networks are threatened by individuals that exploit the altruistic behaviour of others. From within the same community, these "freeriders" can be punished, or shunned. For mobile organisms, outsiders to the community pose a more significant threat, as the likelihood of meeting past victims is considerably reduced (Enquist & Leimar, 1993; Dunbar, 1996; Nettle & Dunbar, 1997; Nettle, 1999). There are innumerable real-world examples of groups and individuals distinguishing themselves from others by means of speech patterns, and such behaviour is documented in numerous sociolinguistic studies (see e.g. Labov, 1963; Trudgill, 1974; Evans, 2004).
Furthermore, computer simulations have provided evidence that the existence of linguistic diversity can help maintain tit-for-tat cooperation in the face of such freeriders (Nettle & Dunbar, 1997) and, conversely, that social selection of variants is an important factor in the establishment and maintenance of inter-group linguistic diversity (Nettle, 1999). Very little experimental work has
aimed at exploring this issue directly, however, although work on related questions is encouraging. Garrod and Doherty (1994), for example, show how conventions can become established in a community by repeated one-on-one interactions. In this paper, an experiment is presented in which two equal teams of participants were taught a simple artificial language composed of 18 randomly generated strings with a CVCV or CVCVCV structure (e.g. gumalo, luwo) and English glosses like 'meat', 'have', 'want', 'not'. Having had time to learn this language, participants were asked to play an online game involving repeated one-on-one interactions in which they negotiated, in the artificial language, to exchange resources. Any exchanged resource was worth twice as much to the receiver as to the giver, so points could be accumulated by exchanging resources with fellow team-members, and lost by giving them to members of the opposing team. During the interaction phase of the game, players were not told which team their partner belonged to, and had to infer this (the only obvious source of such information being the individual's use of the artificial language). The players' level of success was then measured, as well as the effect this behaviour had on the artificial language itself. It is hoped that this experiment will contribute to our understanding of the role played by cooperation and exploitation in the development of linguistic diversity. References
Dunbar, R. I. M. (1996). Grooming, gossip and the evolution of language. London: Faber and Faber.
Dunbar, R. I. M. (2003). The origin and subsequent evolution of language. In M. H. Christiansen & S. Kirby (Eds.), Language evolution (pp. 219-234). Oxford: Oxford University Press.
Enquist, M., & Leimar, O. (1993). The evolution of cooperation in mobile organisms. Animal Behaviour, 45, 747-757.
Evans, B. (2004). The role of social network in the acquisition of local dialect norms by Appalachian migrants in Ypsilanti, Michigan. Language Variation and Change, 16(4), 153-167.
Garrod, S., & Doherty, G. (1994). Conversation, co-ordination and convention: an empirical investigation of how groups establish linguistic conventions. Cognition, 53, 181-215.
Labov, W. (1963). The social motivation of a sound change. Word, 19, 273-309.
Nettle, D. (1999). Linguistic diversity. Oxford: Oxford University Press.
Nettle, D., & Dunbar, R. (1997). Social markers and the evolution of cooperative exchange. Current Anthropology, 38(1), 93-99.
Trudgill, P. (1974). The social differentiation of English in Norwich. Cambridge: Cambridge University Press.
Worden, R. P. (1995). A speed limit for evolution. Journal of Theoretical Biology, 176, 127-152.
PROSODY AND LINGUISTIC COMPLEXITY IN AN EMERGING LANGUAGE

WENDY SANDLER
Department of English Language and Literature and Sign Language Research Lab, University of Haifa, Haifa 31905, Israel

IRIT MEIR
Department of Communication Disorders, Department of Hebrew Language, and Sign Language Research Lab, University of Haifa, Haifa 31905, Israel

SVETLANA DACHKOVSKY
Sign Language Research Lab, University of Haifa, Haifa 31905, Israel

MARK ARONOFF
Department of Linguistics, Stony Brook University, Stony Brook, NY 11794-4376, U.S.A.

CAROL PADDEN
Department of Communication and Center for Research in Language, University of California San Diego, 92093, U.S.A.
Any model of language evolution must address the question of how stretches of symbols were segmented once humans started combining units, and how the relations among these larger units were conveyed. We suggest that, early in the evolution of language, complex grammatical functions may have been marked by prosody (rhythm and intonation), and we bring evidence for this view from a new language that arose de novo in a small, insular community. The language we are studying, Al-Sayyid Bedouin Sign Language (ABSL), was born about 75 years ago in an endogamous community with a high incidence of genetically transmitted deafness (over 100 out of 3,500 villagers are deaf). In the sign language that emerged spontaneously in this community, we find a robust but simple syntax, and prosodic marking which our data suggest is becoming more complex and more systematic across the generations. The investigation combines a model of sign language prosody developed in Nespor & Sandler (1999) with a method of analyzing grammatical structure through semantic, syntactic and prosodic cues developed in our work on ABSL (Sandler et al., 2005; Padden et al., in press). Narratives from four deaf Al-Sayyid villagers, two older signers and two younger signers, are analyzed.
We see clear signs of the development of the system by comparing the older and younger signers. First, the prosodic marking of the younger signers is more salient, due to more redundancy in cueing constituent boundaries (e.g., rhythm + change in head position + change in facial expression) and to greater intensity or size. Second, the younger signers have a larger repertoire of prosodic patterns used consistently to mark particular kinds of structures. Third, the younger signers express dependency relations (e.g., for conditional sentences) twice as often as older signers, and in a more consistent way. The clauses are both separated from one another and connected to one another by particular prosodic mechanisms. Such complex structures were rare in the older signers, whose narratives were more often characterized by a kind of iterating or stringing prosody. Complex expressions containing three or more dependent clauses were found in the younger signers only. In neither the younger nor the older signers were morpho-syntactic markers of sentence complexity found, such as conditional operators or subordinators. These results are in accord with our findings in the syntax, morphology, and phonology of this language, all of which indicate that language, even in the modern human brain, does not explode into existence full-blown, but develops over time. Our findings are compatible with suggestions by Hopper & Traugott (1993) and others that prosody provides the sole marking of syntactic dependencies in earlier stages of a language. The present study further demonstrates how a prosodic system itself develops, and provides clues to the interaction between prosodic structure and syntactic relations in a new language. It shows that prosody plays a crucial role in the development of a language, and teaches us that models of language evolution would benefit from the incorporation of a prosodic component. References
Hopper, P., & Traugott, E. (1993). Grammaticalization. Cambridge: Cambridge University Press.
Nespor, M., & Sandler, W. (1999). Prosody in Israeli Sign Language. Language and Speech, 42(2&3), 143-176.
Padden, C., Meir, I., Sandler, W., & Aronoff, M. (in press). Against all expectations: The encoding of subject and object in a new language. In D. Gerdts, J. Moore & M. Polinsky (Eds.), Hypothesis A/Hypothesis B: Linguistic Explorations in Honor of David M. Perlmutter. Cambridge, MA: MIT Press.
Sandler, W., Meir, I., Padden, C., & Aronoff, M. (2005). The emergence of grammar: Systematic structure in a new language. Proceedings of the National Academy of Sciences, 102(7), 2661-2665.
COMMUNICATION, COOPERATION AND COHERENCE: PUTTING MATHEMATICAL MODELS INTO PERSPECTIVE
FEDERICO SANGATI & WILLEM ZUIDEMA
Institute for Logic, Language and Computation, University of Amsterdam, Plantage Muidergracht 24, 1018 HG, Amsterdam, the Netherlands
fsangati@science.uva.nl, jzuidema@science.uva.nl
Evolutionary game theory and related mathematical models from evolutionary biology are increasingly seen as providing the mathematical framework for
modeling the evolution of language (Van Rooij et al., 2005). Two crucial, general results from this field are (i) that altruistic communication is, in general, evolutionarily unstable (Maynard Smith, 1982), and (ii) that there is a minimum accuracy of genetic or cultural transmission required to allow linguistic coherence in a population (Nowak et al., 2001). Both results appear to pose formidable obstacles for convincing scenarios of the evolution of language. Because language and communication did obviously evolve, finding solutions for both problems is a key challenge for theorists. In this paper we argue that both problems are due to some of the mathematical idealizations used in the theoretical analysis, and disappear when those idealizations are relaxed. To illustrate our argument, we present a surprisingly simple computational model in which two idealizations are avoided: (i) we allow individuals to interact and reproduce in a local neighborhood, avoiding the more common mean-field approximations; (ii) we allow languages to have different similarity relations to one another, avoiding the uniform compatibility function used to derive the coherence threshold. We show that in this model, predictions from the game-theoretic models do not hold, and communication can evolve under circumstances thought to exclude it. Part of our results and methodologies are not entirely novel: the model is inspired by the one defined by Oliphant (1994), and the results relate to work in mathematical population genetics. In our simulation(a) a population of 400 agents shares a finite set of signals used to convey a corresponding number of shared meanings. Each individual has a transmitting and a receiving system specifying which signal is associated with a specific meaning and vice versa. We therefore consider the very general case where reception does not necessarily mirror production. We show that the assignment of a local positioning to agents allows the emergence of linguistic cooperation: even when speakers are not rewarded, an optimal communication system is able to emerge and be maintained, although suboptimal communication systems are able to survive above chance frequency in small subareas. To compare our model to the results of Nowak et al. (2001), we study a number of numerical approximations. We find that the coherence threshold phenomenon depends on the assumption of uniform distances between the possible languages, an assumption which is not valid in models such as ours (as well as the real world), where languages can be more or less similar to each other (figure 1).

(a) Available at staff.science.uva.nl/~fsangati/language_evolution.html
Figure 1. Linguistic coherence in a population with 16 different languages, having uniform distance of 0.5 as in Nowak et al. (2001) and according to the distances as in our model (left). Similarity matrix of the 16 languages derived from the possible mappings between 2 meanings (0/1) and 2 symbols (0/1), where each mapping is fully defined by a 2 x 2 transmitting and receiving system (right).
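The matrix in the right panel of figure 1 can be reconstructed directly. With 2 meanings and 2 signals there are 4 deterministic transmitting maps and 4 receiving maps, hence 16 languages. The sketch below uses a symmetrised success score as the similarity measure; this is one natural choice, and the authors' exact normalisation may differ.

```python
from itertools import product

MEANINGS = (0, 1)

# The 4 deterministic maps {0,1} -> {0,1}, encoded as the pair (f(0), f(1)).
MAPS = list(product((0, 1), repeat=2))

# A language pairs a transmitting map (meaning -> signal) with a receiving
# map (signal -> meaning): 4 x 4 = 16 languages in total.
LANGUAGES = list(product(MAPS, MAPS))

def success(speaker, hearer):
    """Fraction of meanings the hearer decodes correctly from the speaker."""
    send, _ = speaker
    _, recv = hearer
    return sum(recv[send[m]] == m for m in MEANINGS) / len(MEANINGS)

def similarity(a, b):
    """Symmetrised communicative similarity between two languages."""
    return 0.5 * (success(a, b) + success(b, a))

matrix = [[similarity(a, b) for b in LANGUAGES] for a in LANGUAGES]
```

The off-diagonal entries take multiple distinct values drawn from {0, 0.25, 0.5, 0.75, 1}, which is precisely the non-uniformity that the coherence-threshold derivation assumes away.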
Although the model remains extremely simple, it allows us to put two famous mathematical results into perspective: in populations, such as our ancestors', where language users are spatially distributed and languages are of varying similarity to each other, altruistic communication is not necessarily unstable and the coherence threshold does not define "a necessary condition for evolution of complex language" (Nowak et al., 2001, p. 115).
References
Maynard Smith, J. (1982). Evolution and the theory of games. Cambridge: Cambridge University Press.
Nowak, M. A., Komarova, N., & Niyogi, P. (2001). Evolution of universal grammar. Science, 291, 114-118.
Oliphant, M. (1994). The dilemma of Saussurean communication. BioSystems, 37(1-2), 31-38.
Van Rooij, R., Jäger, G., & Benz, A. (Eds.). (2005). Game theory and pragmatics. Palgrave Macmillan.
A NUMEROSITY-BASED ALARM CALL SYSTEM IN KING COLOBUS MONKEYS

ANNE SCHEL, KLAUS ZUBERBÜHLER
School of Psychology, University of St Andrews, St Mary's Quad, St Andrews, KY16 9JP, Scotland, UK

SANDRA TRANQUILLI
School of Anthropology, University College London, London, WC1H 0BW, UK
One important aspect of understanding ‘what it means to be human’ concerns our extraordinary capacity to share knowledge by using referential acoustic signals. By assembling a small set of basic sounds, the phonemes, according to a number of language-specific rules, humans are able to produce an infinite number of messages. Human communication, according to most theorists, is based on syntax/grammar and semantics/symbolism, whereas animal communication is not. Although rule-governed meaningful communication is a uniquely human ability, there is also a wide consensus that the elements responsible for human communication have not emerged de novo in modern humans, but instead have long and possibly independent evolutionary histories that can be traced by studying animal communication. Understanding the evolutionary origins of these abilities is of primary interest for a wide range of disciplines, from linguistics to anthropology. There is good converging empirical evidence from a variety of disciplines that the anatomy and neural capacity to produce modern speech emerged in our ancestral line relatively late. Genetic work supports this idea by showing that two mutations in a gene involved in the orofacial movements required for normal speech production, the FoxP2 gene, became stabilised in the hominid populations ancestral to ours only some 200,000 years ago. This gene seems crucial in the developmental process leading to normal speech and language, and one provocative conclusion from these studies is that humans were unable to produce normal speech prior to this time.
The proper use of normal language does, however, require much more than just a peripheral vocal apparatus capable of producing phonemes. Language is the result of a myriad of cognitive skills, and it is simply not likely that the entire cognitive apparatus required for language evolved over such a short time period. A more plausible scenario is that the capacity to produce and understand language finds its base in neural structures and cognitive capacities that were already present (but not necessarily used for language) in the primate lineage, and thus were inherited from our primate ancestors. The comparative method, therefore, is an important tool in trying to identify which capacities needed for human language were inherited unchanged or slightly modified from our common ancestor with chimpanzees, and which ones are qualitatively new. Several studies on animal communication have been able to show that some animals produce vocalisations that function as referential signals, and even simple forms of zoosyntax have been reported, both of which are considered key elements of human language. Work on primate alarm calls has, for example, shown that some primates can produce acoustically distinct vocalisations in response to different predator types, to which recipients react with accurate and adaptive responses. The vervet monkeys’ referential alarm calling system has long been the paradigmatic example of how primates use vocal signals in response to predators. More recent fieldwork has, however, revealed several additional ways in which primates use vocalizations to cope with predators, suggesting that the vervets’ alarm calling system may be more of an exception than the rule. Here, we present the results of a playback study on the alarm call system of a little-studied group of primates, the King colobus monkeys of the Taï Forest in Ivory Coast, members of the Colobine family.
In order to study alarm vocalizations systematically, we played back predator vocalizations to naive monkey groups from a concealed speaker in their vicinity, and we then recorded their vocal responses and analyzed their response patterns. We found that upon hearing predator vocalizations, the monkeys often reacted with two basic alarm call types, snorts and acoustically variable roars. Neither call type was given exclusively to one predator, but there were striking regularities in the sequential order of calls. Growls of leopards typically elicited long calling bouts consisting of short sequences made up of a snort and pairs of roars, while eagles typically elicited short calling bouts consisting of long sequences made up of no snorts but many roars. These monkeys thus seem to use an alarm call system that is based on numerosity and call combinations, a further example of a non-human primate that has evolved a simple form of zoosyntax.
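The reported pattern amounts to a decoding rule based on call counts rather than call types alone. As a schematic illustration only (the function, tokens and length threshold below are our invention, not the authors' analysis), a listener could classify a bout like this:

```python
def classify_bout(sequences):
    """Guess the predator class from a calling bout.

    `sequences` is a list of call sequences, each a list of 'snort'/'roar'
    tokens. Heuristic reading of the reported pattern: leopard growls elicit
    snort-introduced short sequences; eagles elicit long, snort-free roar
    sequences. The length threshold is illustrative only.
    """
    has_snort = any("snort" in seq for seq in sequences)
    mean_length = sum(len(seq) for seq in sequences) / len(sequences)
    if has_snort and mean_length <= 4:
        return "leopard"
    if not has_snort and mean_length > 4:
        return "eagle"
    return "ambiguous"
```

Under this rule a long bout of short snort-plus-roar-pair sequences is classed as a leopard alarm, while a single long all-roar sequence is classed as an eagle alarm.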
ON THERE AND THEN: FROM OBJECT PERMANENCE TO DISPLACED REFERENCE
MARIEKE SCHOUWSTRA
UiL OTS, Utrecht University, Janskerkhof 13, Utrecht, 3512 BL, The Netherlands
Marieke.Schouwstra@phil.uu.nl
In the current debate about the emergence of language, researchers have looked for various sources of indirect evidence, either by comparing animals and humans, by analyzing the linguistic structure of certain present-day human languages, or by constructing computer models. These approaches have been successful, at least to the extent that many hypotheses about language emergence have been put forward on the basis of them. However, it has been recognized lately that it would be useful to combine the results from the different approaches, because that leads to a more complete picture of language emergence (Kirby, 2007). I will focus on one phenomenon, ‘displacement’ (or ‘displaced reference’), through two approaches to language evolution: one cognitive, the other linguistic. Displacement was already described by Hockett (1960) as interesting from the point of view of language evolution, as it is a feature that is supposedly unique to human language. Humans seem to be the only species able to talk about things that are not here and not now. In Hurford (2007) it is shown that animals do show signs of the beginnings of displaced reference, though not in their language, but in their cognitive capacities. When an animal has achieved object permanence, it is aware that an object continues to exist even when no sensory information about the object is available. This capacity is present in many animals, but there is a general trend: the more an animal genetically resembles humans, the better it performs at different ‘displacement tasks’. This indicates that object permanence has been important in the evolution of a species that has linguistic capacities:

The capacity to know something about an object, even when ‘it isn’t there’ is a first step along the road to the impressive characteristics of human languages, their capacity for displaced reference. (Hurford, 2007, p.
72)

Thus, Hurford sketches an evolutionary trajectory, on the basis of cognitive research, that starts from object permanence in animals’ cognitive capacities and ends in displaced reference in human language.
Support for this trajectory can be found in recent work in the field of linguistics: the windows approach. This is a perspective on language emergence that has been adopted in the work of Jackendoff (2002) and Botha (2005), and goes back in part to earlier work by Bickerton. It studies (among other phenomena) restricted linguistic systems, such as pidgin languages, home sign systems and early stages of untutored second language acquisition by adults. These language forms all arise in situations where the resources for first language learning under normal circumstances are unavailable. The different restricted systems show striking similarities. Therefore, they may tell us something about the cognitive strategies on which language builds, or even about principles from evolutionarily early language, and thereby contribute to the language evolution debate. From various studies of temporal expressions in early second language acquisition and home signs (Benazzo, 2006; Morford & Goldin-Meadow, 1997) it becomes clear that even in the most ‘primitive’ stages of these systems (when few grammatical means are available to speakers or signers, utterances consist of only a few words, and almost no verbs are used), displaced reference appears: subjects make reference to past and future. They do this in relatively rigorous ways, and much work is left to the interpreter, but such an early appearance of displaced reference tells us that it is apparently a fundamental feature of language and must have been present already in evolutionarily early language. The conclusions drawn on the basis of the ‘window work’ described here can support and extend the evolutionary picture sketched by Hurford, but also force us to make precise claims about the relation between cognition and language: should the fact that we can talk about remote things really count as a property of language? References
Benazzo, S. (2006, March).
The expression of temporality in early second language varieties and adult home signs. (Paper presented at NIAS Workshop ‘Restricted Linguistic Systems as Windows on Language Genesis’)
Botha, R. (2005). On the Windows Approach to language evolution. Language and Communication, 25.
Hockett, C. F. (1960). The origin of speech. Scientific American, 203, 88-96.
Hurford, J. R. (2007). The origins of meaning. Oxford: Oxford University Press.
Jackendoff, R. (2002). Foundations of language: Brain, meaning, grammar, evolution. Oxford: Oxford University Press.
Kirby, S. (2007). The evolution of language. In R. Dunbar & L. Barrett (Eds.), Oxford handbook of evolutionary psychology (pp. 669-681). Oxford: Oxford University Press.
Morford, J. P., & Goldin-Meadow, S. (1997). From here and now to there and then: The development of displaced reference in homesign and English. Child Development, 68(3), 420-435.
SIGNALLING SIGNALHOOD AND THE EMERGENCE OF COMMUNICATION

THOMAS C. SCOTT-PHILLIPS, SIMON KIRBY, GRAHAM R. S. RITCHIE
Language Evolution and Computation Research Unit, University of Edinburgh
[email protected]
A vast number of stable communication systems exist in the natural world. Of these, only a few are learnt. A similarly small number of systems make use of arbitrary symbols, in which meaning is dissociated from form. Moreover, human language is the only system for which both of these facts are true. How such a system might emerge should therefore be of great interest to language evolution researchers. However, at present barely anything is known about this process. A growing body of theoretical, computational and experimental studies has explored how symbolic systems might spread through a dyad or population of interacting individuals. However, all of this work has, with one exception, circumvented a key problem that remains unaddressed: how do individuals even know that a given communicative behaviour is indeed communicative? That is, how does a signal signal its own signalhood? We report on the first empirical work that explicitly addresses these questions. In order to do this we introduce the Embodied Communication Game, in which human subjects play a simple communication game with each other over a computer network. The game has three key properties. First, the communication channel is undefined (unlike e.g. Galantucci, 2005; Marocco & Nolfi, 2007). Second, the roles of speaker and hearer are undefined (unlike e.g. de Ruiter et al., forthcoming; Steels, 1999). And third, the possible forms that signals may take are also undefined (unlike game-theoretic models, and also some experimental approaches, e.g. Selten & Warglien, 2007). These qualities have the result that players must use their behaviour in the game’s world to communicate not just their intended meaning but also the fact that their behaviour is communicative in the first place. This allows us to address the question of how to signal signalhood. Only one previous piece of work (Quinn, 2001) has adhered to all three of these constraints. Here pairs of simulated
agents had to find a way to communicate so that they could solve a simple coordination task, but no explicit communication channel was made available. Although some pairs of robots were successful in this task, the solution found was iconic and was also, moreover, innate rather than learnt. We are interested, however, in the case of learnt, symbolic communication. We find that the likelihood that a viable symbolic system will emerge is significantly increased if it is possible to first create some non-communicative convention onto which communication can bootstrap. The communication of communicative intent in the absence of pre-existing conventions is thus shown to be a non-trivial task (even for already fluent users of a learnt, symbolic communication system) that is unlikely to be solved de novo, i.e. created fully formed by one individual and inferred wholesale by another. Instead a more organic process like ontogenetic ritualisation (Tomasello & Call, 1997) is more likely. Moreover, these results are the first lab-based instance of the emergence of symbolic communication when the problem of recognising communicative intent is not avoided by the very nature of the investigative set-up.
Acknowledgements
TSP and GR are funded by grants from the AHRC and the EPSRC respectively. We also acknowledge financial support from AHRC grant number 112105.
References
de Ruiter, J. P., Noordzij, M. L., Newman-Norland, S., Newman-Norland, R., Hagoort, P., Levinson, S. C., et al. (forthcoming). Exploring human interactive intelligence.
Galantucci, B. (2005). An experimental study of the emergence of human communication systems. Cognitive Science, 29, 737-767.
Marocco, D., & Nolfi, S. (2007). Communication in natural and artificial organisms: Experiments in evolutionary robotics. In C. Lyon, C. L. Nehaniv & A. Cangelosi (Eds.), Emergence of communication and language (pp. 189-206). London: Springer-Verlag.
Quinn, M. (2001). Evolving communication without dedicated communication channels. In J. Kelemen & P. Sosik (Eds.), Advances in artificial life: ECAL 6. Berlin: Springer.
Selten, R., & Warglien, M. (2007). The emergence of simple languages in an experimental coordination game. Proceedings of the National Academy of Sciences, 104(18), 7361-7366.
Steels, L. (1999). The Talking Heads experiment. Antwerp: Laboratorium.
Tomasello, M., & Call, J. (1997). Primate cognition. Oxford: Oxford University Press.
WILD CHIMPANZEES MODIFY THE STRUCTURE OF VICTIM SCREAMS ACCORDING TO AUDIENCE COMPOSITION

KATIE E. SLOCOMBE
Department of Psychology, University of York, York, YO10 5DD, England

KLAUS ZUBERBÜHLER
School of Psychology, University of St Andrews, St Andrews, KY16 9JP, Scotland
One way of studying the evolutionary origins of language is to investigate the different cognitive capacities involved in language processing and to trace their phylogenetic history within the primate lineage. One conclusion from this research so far has been that some language-related capacities, such as recursion, are unique to humans and associated with the emergence of modern speech capacities, while others have evolutionary roots deep in the primate lineage. The ability to communicate about external objects or events, for example, appears to be such a phylogenetically old capacity, and there is good evidence that various monkey species are able to convey information about external events with their calls. However, in these cases it is often unclear whether callers are actively trying to inform each other about the event they have perceived, or whether their calling behaviour is a mere byproduct of a biological predisposition to respond to certain types of evolutionarily important events, such as the appearance of a predator. In either case, listeners will have to engage in a fair bit of inferential reasoning, suggesting that these types of systems have acted as an evolutionary precursor to the semantic capacities evident in modern humans. However, despite good evidence for such functionally referential communication and inferential capacities in monkeys, there is little comparable evidence available for any of the great ape species in the wild. This is problematic because great apes are the most important elements in any comparative approach. We studied the vocal behaviour of wild chimpanzees of the Budongo Forest, Uganda, during agonistic interactions. Previous work has shown that victim and aggressor screams are acoustically distinct signals (Slocombe and Zuberbühler, 2005) that have the potential to provide listeners
with information on the role of the caller during an interaction. In this study we examined victim screams in considerable detail to determine (a) the extent to which these calls contained information about the nature of the ongoing agonistic encounter and (b) to what degree these calls are the product of signalers trying to intentionally address particular target individuals that are likely to intervene and help the caller. We analyzed victim screams given by 21 different individuals in response to aggression from others. We found that these screams varied reliably in their acoustic structure as a function of the severity of the aggression experienced by the caller. Victims receiving severe aggression (chasing or beating) gave longer bouts of screams in which each call was longer in duration and higher in frequency than screams produced by victims of mild aggression (charges or postural threats). Chimpanzee victim screams are therefore promising candidates for functioning as referential signals. Playback experiments are now ongoing to assess whether listening individuals are able to extract information about the severity of a fight from these calls. With regard to addressing particular individuals, we found that victims receiving severe aggression were sensitive to the composition of the listening audience and modified the acoustic structure of their screams accordingly. If an individual was present in the party who could effectively challenge the aggressor (because it was equal or higher in rank than the aggressor), then victims produced screams that were acoustically consistent with extremely severe aggression. This vocal exaggeration of the true level of aggression only occurred when the chimpanzees most needed aid, that is, when they were subjected to severe but not mild aggression.
In other observations we found that high-ranking individuals most often provided aid if victims were exposed to severe rather than mild aggression, suggesting that victim screams function to recruit aid and that callers modify them in a goal-directed manner. The low visibility of the chimpanzees’ natural rainforest environment seems to make this tactical calling a viable strategy. It is rare that bystanders during agonistic interactions have perfect visual access to the ongoing event; callers therefore run a relatively small risk of being identified as unreliable signalers or of experiencing other types of negative feedback. This is the first study to show that non-human primates can flexibly alter the acoustic structure of their vocalizations in response to the composition of the audience.
References
Slocombe, K. E., & Zuberbühler, K. (2005). Agonistic screams in wild chimpanzees vary as a function of social role. Journal of Comparative Psychology, 119(1), 67-77.
AN EXPERIMENTAL STUDY ON THE ROLE OF LANGUAGE IN THE EMERGENCE AND MAINTENANCE OF HUMAN COOPERATION

J.W.F. SMALL
Language Evolution and Computation Research Unit, Department of Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9L, United Kingdom

SIMON KIRBY
Language Evolution and Computation Research Unit, Department of Linguistics and English Language, University of Edinburgh, 40 George Square, Edinburgh, EH8 9L, United Kingdom
While the emergence of language may have been promoted by a myriad of different factors, it seems intuitively obvious that some level of cooperation among humans was necessary. Dessalles (2000) argues that cooperation itself was the decisive factor for language emergence, while Knight (2006) suggests that any human cooperation requires contracts, the very contracts upon which society is based. Jeffreys (2006) presented experimental findings showing that cooperation on a social dilemma task required language and that once language was used, players often made altruistic sacrifices. The present experiment seeks to further explore some of these contentions. Forty participants (N=40) were split into two groups, one group encouraged to use language, the other not allowed to use language. Participants were each given a set of ping pong balls, put into pairs and then instructed to use the balls to traverse a sequence of holes on a board which stood separating them from the other player. Participants had five minutes to play, and for each of their own balls put through the course they were awarded one point. It was made known that the person with the highest score overall would be awarded a monetary reward. The relative location of the holes in the sequence made it nearly impossible to complete the course without the aid of the other participant. Thus, although they were not told that this was the case, by assisting one another participants were able to greatly reduce the time which it took to finish the course with a ball, and so players who assisted one another were consistently able to achieve higher scores. Defining cooperation as any manual act which assisted the other player, it was found that the use of language between two individuals on the task significantly shortened the time to the commencement of cooperation: M = 0.472, SE = 0.09 in the speaking group versus M = 2.444, SE = 0.4833 in the non-speaking group (t(20) = -4.167, p < 0.01). Furthermore, once cooperation had begun, the use of language enhanced efficiency on the task, the number of balls through the game board being higher in the speaking group (M = 40.95, SE = 2.14) than in the non-speaking group (M = 14.33, SE = 1.43), t(38) = 10.267, p
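For the record, the reported test statistics can be approximately recovered from the summary values alone, assuming independent groups and taking the standard error of the difference as sqrt(SE1^2 + SE2^2) (a Welch-style computation; the exact test variant and per-group sizes are not stated, so small discrepancies from the published t-values are expected):

```python
import math

def t_from_summary(m1, se1, m2, se2):
    """Two-sample t from group means and standard errors:
    t = (m1 - m2) / sqrt(se1^2 + se2^2)."""
    return (m1 - m2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Time to commencement of cooperation, speaking vs non-speaking group
t_latency = t_from_summary(0.472, 0.09, 2.444, 0.4833)  # about -4.0 (reported -4.167)

# Balls completed, speaking vs non-speaking group
t_balls = t_from_summary(40.95, 2.14, 14.33, 1.43)      # about 10.3 (reported 10.267)
```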
References

Dessalles, J.-L. (2000). Language and Hominid Politics. In C. Knight, M. Studdert-Kennedy and J. Hurford (Eds.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form (pp. 62-79). Cambridge: Cambridge University Press.
Jeffreys, M. (2006). Natural-language "cheap talk" enables coordination on a social-dilemma game in a culturally homogenous population. In A. Cangelosi, A. D. M. Smith and K. Smith (Eds.), The Evolution of Language (pp. 145-151). World Scientific.
Knight, C. (2006). Language co-evolved with the rule of law. In A. Cangelosi, A. D. M. Smith and K. Smith (Eds.), The Evolution of Language (pp. 168-175). World Scientific.
REPLICATOR DYNAMICS AND LANGUAGE PROCESSING

LUC STEELS
University of Brussels (VUB AI Lab), Sony Computer Science Lab, 6 Rue Amyot, 75005 Paris, France
steels@arti.vub.ac.be
EÖRS SZATHMÁRY
The Parmenides Foundation, Collegium Budapest (Institute for Advanced Studies), 2 Szentháromság utca, H-1014 Budapest
Many areas of biology, such as the genetic system or the immune system (Jerne, 1985), are now understood in terms of a selectionist framework, meaning that there is a population of variants in competition with each other, and differential replication of some variants driven by domain-specific selection criteria. Units of evolution must have the capacity for multiplication, heredity and variability. If there are hereditary traits affecting survival and/or fertility of the units, then in a population of such units evolution by natural selection can take place (Maynard Smith, 1986). Note that such units can be anything, provided they satisfy these basic criteria. Increased complexity, possibly leading to major transitions in evolution, emerges when elements at one level cooperate and when this cooperation becomes engrained as the basis for higher-order elements (Maynard Smith & Szathmáry, 1995). We have been applying this framework to the question of language, not just at the metaphorical level but in terms of concrete computational and mathematical models, with the ultimate objective of understanding how the neuronal systems in the brain can sustain the required selectionist dynamics, and how these neuronal systems might have evolved (Szathmáry, 2007). Several linguists (Croft, 2000; Mufwene, 2002) have already proposed a selectionist approach, but they have focused on the level of the communal language. Our goal is to understand language processing itself in selectionist terms, and to show how the evolution of the communal language, e.g. the propagation of a word or the damping of synonymy, is an emergent property of the local interactions and adaptations by individuals of their idiolect.
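The selectionist dynamics invoked here can be made concrete with a minimal discrete-time replicator update. All numbers below are invented for illustration; in the language case the fitness values would come from domain-specific criteria such as communicative success.

```python
def replicator_step(freqs, fitness):
    """One discrete-time replicator update: variants with above-average
    fitness grow in frequency, the rest shrink; frequencies stay
    normalised."""
    avg = sum(x * w for x, w in zip(freqs, fitness))
    return [x * w / avg for x, w in zip(freqs, fitness)]

# Three competing variants of a linguistic convention, initially equal:
freqs = [1 / 3, 1 / 3, 1 / 3]
fitness = [1.0, 1.1, 1.3]  # hypothetical domain-specific selection scores

for _ in range(100):
    freqs = replicator_step(freqs, fitness)

# The fittest variant comes to dominate the population (synonymy is damped):
print([round(x, 3) for x in freqs])  # → [0.0, 0.0, 1.0]
```

Even a small fitness advantage compounds exponentially over interactions, which is why selectionist dynamics can dampen synonymy in a communal language without any global coordination.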
The point of departure for our investigation is the Fluid Construction Grammar framework (FCG: Steels & De Beule, 2006) which has been specifically designed to support parsing, production, and language acquisition, assuming that grammar is emergent and forever changing. In line with other approaches to construction grammar, FCG assumes that there is no formal difference between lexicon and grammar, that all rules in the lexico-grammar are meaning-form mappings instead of constraints on syntactic structure only, and that linguistic conventions are fluid, meaning that they may not be totally fixed in a population and must be applied in a flexible manner. FCG has already proven its worth in several computational and robotic experiments for the emergence of aspects of language, such as perspective reversal (Steels & Loetzsch, 2007). This paper shows how to reformulate FCG from a selectionist perspective and how it is then possible to develop the formal chemistry of FCG. Analysis of language games played by robots shows how replicator populations of FCG rules evolve. We give concrete examples from computer simulations of the emergence of more complex rules overtaking lower-level ones, entirely based on selectionist processes. References
Croft, W. (2000). Explaining Language Change: An Evolutionary Approach. Longman, London. Jerne, N. K. (1985). The generative grammar of the immune system. Science, 229, 1057-1059. Maynard Smith, J. (1986). The Problems of Biology. Oxford University Press. Maynard Smith, J. and E. Szathmáry (1995). The Major Transitions in Evolution. Freeman, London. Mufwene, S. (2002). Competition and selection in language evolution. Selection, 3(1), 45-56. Steels, L. and J. De Beule (2006). Unify and Merge in Fluid Construction Grammar. In Vogt, P., Sugita, Y., Tuci, E. and Nehaniv, C. (Eds.), Symbol Grounding and Beyond. Lecture Notes in AI 4211. Springer-Verlag, Berlin. pp. 197-223. Steels, L. and M. Loetzsch (2007). Perspective Alignment in Spatial Language. In Coventry, K., Tenbrink, T. and Bateman, J. (Eds.), Spatial Language and Dialogue. Oxford University Press, Oxford. Szathmáry, E. (2007). Towards an Understanding of Language Origins. In: Barbieri, M. (Ed.), The Codes of Life. Biosemiotics, Vol. 1. Springer-Verlag, Berlin. pp. 283-313.
SYNTACTICAL AND PROSODIC CUES IN SONG SEGMENTATION LEARNING BY BENGALESE FINCHES

MIKI TAKAHASHI & KAZUO OKANOYA
Lab for Biolinguistics, BSI, RIKEN, 2-1 Hirosawa, Wako, 351-0198, Japan
Humans segment temporal patterns of auditory stimuli into meaningful units and recombine these units into a stream of speech. Syntactical and prosodic cues are important in segmenting the speech stream. Songbirds, as another group of vocal learners, should have a similar mechanism when learning songs (Williams & Staples, 1992). Most research on birdsong learning uses a single-tutor paradigm in which the degree of song learning is assessed between a father and his sons. With this paradigm, since most pupils make a more or less exact copy of the tutor song, we cannot assess how pupils segmented and combined the tutor song while learning. This problem can be overcome with a multi-tutor paradigm, in which multiple tutors are available while pupils are learning. Here we investigate segmentation and chunking under a multi-tutor condition in one species of songbird, the Bengalese finch. Bengalese finches are suitable for this type of study because their songs are organized into a finite-state syntax with multiple chunks, each of which consists of multiple song elements (Okanoya, 2004). Eleven adult male tutors with individually distinctive songs and ten female Bengalese finches were housed in a large aviary, and song learning in chicks raised in this environment was examined. A total of 32 male chicks hatched over two breeding seasons. We recorded their songs after they matured, and then compared them with the tutors' songs to examine how pupils had segmented the tutor songs. In addition to learning from tutors, there may have been some degree of mutual copying among pupils, but it was not possible to elucidate such effects. An example of song segmentation under the multi-tutor condition is shown in Figure 1. This bird learned parts of its song from three tutors (Tutors A, E and F). Most juveniles learned parts of songs from 2-3 tutors, and some learned from four tutors. Among the tutors, the most popular bird was copied by 17 pupils and
three other tutors were copied by more than 7 pupils. Pupils tended to copy a common part of a tutor's song. These parts had higher transition probabilities (a syntactical cue) than other parts that juveniles did not copy. Furthermore, acoustic parameters, including the inter-note interval, the relative change in F0, and the entropy ratio of adjacent elements, differed significantly between the boundaries of the copied chunks and positions within the chunks (a prosodic cue).
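The syntactical cue can be illustrated with a toy computation: estimate bigram transition probabilities over song elements and place chunk boundaries where predictability drops. The song string, element labels and threshold below are invented for illustration, not the study's data.

```python
from collections import Counter

def transition_probs(seq):
    """Bigram transition probabilities P(b | a) over a sequence of song
    elements (each letter stands for one note type)."""
    pairs = Counter(zip(seq, seq[1:]))
    starts = Counter(seq[:-1])
    return {(a, b): n / starts[a] for (a, b), n in pairs.items()}

def segment(seq, probs, threshold=0.7):
    """Place a chunk boundary wherever the transition probability drops
    below the threshold (low predictability suggests a boundary)."""
    chunks, current = [], [seq[0]]
    for a, b in zip(seq, seq[1:]):
        if probs[(a, b)] < threshold:
            chunks.append("".join(current))
            current = []
        current.append(b)
    chunks.append("".join(current))
    return chunks

# A toy tutor song built from chunks "abc" and "de" in variable order:
song = "abcdeabcabcdedeabc"
probs = transition_probs(song)
print(segment(song, probs))  # → ['abc', 'de', 'abc', 'abc', 'de', 'de', 'abc']
```

Within-chunk transitions (a→b, b→c, d→e) are fully predictable, while transitions out of a chunk are variable, so thresholding the bigram probabilities recovers the chunks.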
Figure 1. An example of song learning with multiple tutors. Pupil No. 0323 learned parts of songs from 3 different tutors and recombined these chunks into individually specific order.
Bengalese finches used both transition probabilities and acoustic parameters when segmenting tutor songs. Chicks reared in the multi-tutor environment then ordered the chunks copied from several tutors into an individually-specific finite-state song syntax. Vocal non-learners also have the ability to segment auditory streams (Hauser et al., 2001; Toro & Trobalon, 2005), but this ability is not linked to the ability to produce new combinations of vocal output. Bengalese finches should therefore prove to be important subjects for studying the process of segmentation and chunking in vocal learning.
Acknowledgements This work was supported in part by a Grant-in-Aid for Young Scientists to MT and a PRESTO grant from JST to KO.
References Hauser, M. D., Newport, E. L., & Aslin, R. N. (2001). Segmentation of the speech stream in a non-human primate: statistical learning in cotton-top tamarins. Cognition, 78, B53-B64. Toro, J. M., & Trobalon, J. B. (2005). Statistical computations over a speech stream in a rodent. Perception & Psychophysics, 67, 867-875. Williams, H., & Staples, K. (1992). Syllable chunking in zebra finch (Taeniopygia guttata) song. Journal of Comparative Psychology, 106, 278-286.
WHY THE TRANSITION TO CUMULATIVE SYMBOLIC CULTURE IS RARE

MONICA TAMARIZ
Language Evolution and Computation, Linguistics and English Language, The University of Edinburgh, 14 Buccleuch Place, Edinburgh EH8 9LN, UK
Boyd and Richerson (1996) observe that while cultural transmission is common in nature, cumulative culture is rare. We suggest an explanation for the low probability of the emergence of human-like cumulative culture, composed of increasingly complex form-function associations. The role of naturally selected socio-cognitive biases in the origins of human culture is well studied; we focus on the coevolution of those human biases and the cultural environment (Boyd & Richerson, 1985; Odling-Smee, Laland & Feldman, 2003). Evolutionary transitions often involve the emergence of a new way to transmit information (Maynard Smith & Szathmáry, 1995), and they are rare because of conflicts between the requirements of the existing and the emerging systems. The earliest human cultural traditions (the Oldowan and Acheulean techniques) originate around 1.5-2.5 million years ago (transition 1) and persist for over one million years with negligible modification. Subsequent stable traditions are orders of magnitude shorter-lived, suggesting a second transition around 100,000 years ago that resulted in a dramatic increase in the rate of cultural complexification. We propose that symbolic cumulative culture is rare because the psychological biases favoured by the first transition are at odds with those required for the second transition. 1. The transition to cultural transmission results in the "common" kind of culture, where whole behaviours (techniques, vocalisations etc.) associated with whole functions are transmitted faithfully between individuals and persist, with little modification, over the generations. For language, this corresponds to Wray's (1998) protolanguage stage. In natural evolving systems, the majority of transmission errors, particularly discrete or qualitative ones, result in disruption or loss of function.
If extant cultural behaviours are advantageous, they will exert a selective pressure for the co-evolution by natural selection of cognitive biases for errorless transmission, such as faithful imitation, including theory of mind. 2. The transition to "rare" cumulative culture occurs when not only whole behaviours for whole functions, but elements of those behaviours associated with sub-functions, can be transmitted; in other words, when forms and meanings
can be subject to analysis. This transition requires flexible imitation, so that elements of existing behaviours, which may or may not have a utility function as standalone units, can be perceived, learned and transmitted independently and, crucially, recombined in new ways - compositionally or otherwise (see Wray & Grace, 2007) - to fulfill and to create novel complex functions. A small collection of idiosyncratic form-function associations combined with a bias for rigid imitation would tend to prevent the generalization needed for noticing, processing and expressing the patterns within and between cultural items that characterize symbolic cumulative culture. However, increasingly conspicuous patterns emerging in a growing set of independently discovered cultural forms could help overcome this conflict. Increases in social-group size and contact between groups may have facilitated this transition. In sum, we propose that in the human lineage, a first cultural transition leading to the co-evolution of cultural innovations and faithful imitation allowed holistic replication of form-function pairs during the transmission of functions/meanings. A second transition involving analytic imitation led to replication and recombination of fractional form-function units, which allowed the transmission of form structure. Analytic replication could only have emerged when imitation stopped being exclusively holistic without ceasing to be faithful, perhaps in response to the increasingly complex structure of the cultural environment.

Acknowledgments The author holds a Leverhulme Trust Early Career Fellowship.

References Boyd, R. & Richerson, P.J. (1985). Culture and the evolutionary process. Chicago: University of Chicago Press.
Boyd, R. & Richerson, P.J. (1996). Why culture is common but cultural evolution is rare. Proceedings of the British Academy, 88: 77-93. Maynard Smith, J. and Szathmáry, E. (1995). The major transitions in evolution. New York: Oxford University Press. Odling-Smee, F.J., Laland, K.N. & Feldman, M.W. (2003). Niche construction: The neglected process in evolution. Princeton, NJ: Princeton University Press. Wray, A. (1998). Dual processing in protolanguage: competence without performance. In A. Wray (Ed.), The transition to language. Oxford: OUP. Wray, A. & Grace, G.W. (2007). The consequences of talking to strangers: Evolutionary corollaries of socio-cultural influences on linguistic form. Lingua, 117(3): 543-578.
A GRADUAL PATH TO HIERARCHICAL PHRASE-STRUCTURE: INSIGHTS FROM MODELING AND CORPUS DATA
WILLEM ZUIDEMA
Institute for Logic, Language and Computation, University of Amsterdam, Plantage Muidergracht 24, 1018 TV, Amsterdam, the Netherlands, and Behavioural Biology, Leiden University, the Netherlands
jzuidema@science.uva.nl
Neo-Darwinian scenarios for the evolutionary emergence of a trait involve some sort of characterisation of (i) the set of possible phenotypes-genotypes that can reasonably be assumed to have been "available" (the strategy set); (ii) the consequences of these phenotypes-genotypes for survival and reproduction (the fitness function); and (iii) a path of ever-increasing fitness that leads from phenotypes without the trait to phenotypes with it (Parker & Maynard Smith, 1990). This suggests an agenda for scientific research into the evolution of language: first identify the relevant traits in need of an evolutionary scenario, then fill in the details in each of these three domains for each of these traits. With many others (e.g. Jackendoff, 2002), I have argued that three traits in need of a detailed evolutionary scenario are combinatorial phonology, compositional semantics and hierarchical phrase-structure (Zuidema, 2005). Only with such detailed scenarios in place can we start evaluating the relative plausibility of various assumptions about heritability, selection pressures and the role of epigenesis and cultural evolution. In this paper I focus on the evolution of hierarchical phrase-structure, for which no such detailed neo-Darwinian model has yet been proposed - surprisingly, perhaps, given its centrality in many debates. I will briefly discuss why some existing models are incomplete from this perspective, and argue that recent work on exemplar-based grammar provides a solution to the problems encountered when trying to extend them. The argument consists of two steps. First, I consider syntax in modern language. I demonstrate that exemplar-based models can do justice to the unlimited productivity of natural language (see also Bod, 2006). However, I also observe that many regularities in natural languages that look systematic are in fact stored holistically in the memory of language users (pseudo-productivity).
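The role of holistic storage in a probabilistic tree-substitution grammar can be illustrated with toy numbers (all invented for this sketch): a parse's probability sums over its derivations, and a holistically stored fragment can dominate that sum even when a fully compositional derivation of the same phrase exists.

```python
# Toy PTSG weights (invented): the probability mass a learner assigns to
# each elementary fragment, normalised per root category.
p_holistic = 0.10           # whole tree for an idiom stored as one fragment
p_s_np_vp = 0.90            # minimal rule S -> NP VP
p_np = 0.20                 # NP fragment covering the subject
p_vp = 0.30                 # VP fragment covering the predicate

# A compositional derivation's probability is the product of its fragments:
p_compositional = p_s_np_vp * p_np * p_vp  # 0.054

# The parse probability sums over all derivations (the Data-Oriented
# Parsing idea), holistic and compositional alike:
p_parse = p_holistic + p_compositional     # 0.154

# Most of the probability mass comes from the stored fragment, even though
# the phrase *could* be built from general rules (pseudo-productivity):
print(round(p_holistic / p_parse, 2))  # → 0.65
```

This is why corpus evidence of apparently systematic regularities is compatible with those regularities being stored holistically by language users.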
Using corpus data and the statistical techniques developed in Zuidema (2007), I show that the storage in memory of larger fragments of language is the rule rather than the
exception (and note that perfectly plausible explanations for the origins of pseudo-productivity exist). I conclude that cognitively adequate formalisms for syntax must allow for productive units of almost any size: from single morphemes and context-free rules of combination to complete holophrases. Second, I consider the evolutionary history of syntax. I show that one particular formalism - probabilistic tree-substitution grammars, already used successfully in computational linguistics - allows for a natural way to define a fitness function based on learnability, which in turn allows us to show an evolutionary path from a communication system without hierarchical structure to one which, like natural language, shows hierarchical phrase-structure and full productivity alongside pseudo-productivity and a heterogeneous store of productive units. Together, these considerations define a strategy set, a fitness function and a gradual route, and thus provide the essential building blocks of a precise, neo-Darwinian model of the evolution of hierarchical phrase-structure. I will present results from a first computational implementation of this model, and highlight relations with less formal proposals from research on construction grammar (e.g. Verhagen, 2005) and formulaic language (e.g. Wray, 2000), and with formal, but not neo-Darwinian, models such as Data-Oriented Parsing (Bod, Scha, & Sima'an, 2003) and Batali's negotiation model (Batali, 2002).

References Batali, J. (2002). The negotiation and acquisition of recursive grammars as a result of competition among exemplars. In T. Briscoe (Ed.), Linguistic evolution through language acquisition. Cambridge University Press. Bod, R. (2006). Exemplar-based syntax: How to get productivity from examples. The Linguistic Review, 23. Bod, R., Scha, R., & Sima'an, K. (Eds.). (2003). Data-oriented parsing. Chicago, IL: CSLI Publications, University of Chicago Press. Jackendoff, R. (2002). Foundations of language. Oxford University Press.
Parker, G. A., & Maynard Smith, J. (1990). Optimality theory in evolutionary biology. Nature, 348, 27-33. Verhagen, A. (2005). Constructions of intersubjectivity: Discourse, syntax, and cognition. Oxford University Press. Wray, A. (2000). Holistic utterances in protolanguage. In C. Knight, J. R. Hurford, & M. Studdert-Kennedy (Eds.), The evolutionary emergence of language. Cambridge University Press. Zuidema, W. (2005). The major transitions in the evolution of language. Doctoral dissertation, Theoretical and Applied Linguistics, University of Edinburgh. Zuidema, W. (2007). Parsimonious Data-Oriented Parsing. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (pp. 551-560).
Author Index Abry, Christian Argyropoulos, Giorgos P. Aronoff, Mark Baronchelli, Andrea Barrat, Alain Bejarano, Teresa Beqa, Arianita Bickerton, Derek Bleys, Joris Bolhuis, Johan J. Botha, Rudolf Briscoe, Ted Buttery, Paula Byrne, Richard W.
De la Cruz, Vivian 480 Dediu, Dan 83 Dessalles, Jean-Louis 91 Dowman, Mike 417, 419, 421 Dubreuil, Benoit 99 Ducey, Virginie 3 Dupoux, Emmanuel 451
3 10 489 397, 399 397 18 401 26 34 403 42 51 51 405
Cartmill, Erica Castelló, Xavier Cela Conde, Camilo J. Cornish, Hannah Cowart, Wayne Crow, Tim
405 59 407 409 411 67
D'Errico, Francesco Dachkovsky, Svetlana Dale, Rick Dall'Asta, Luca Damper, Robert I. De Beule, Joachim De Boer, Bart
413 489 464 397 378 75 415
Eguiluz, Victor M. Everett, Caleb Feher, Olga Ferrer i Cancho, Ramon Flaherty, Molly Frincke, Ellen
59, 407 107 423 115
425 460
Gardner, Molly J. 470 Gasser, Les 187, 299, 456 Gil, David 123 Ginzburg, Jonathan 219 Goldin-Meadow, Susan 427 Gomila, Antoni 407 Gong, Tao 131, 139 Gontier, Nathalie 429 Griffiths, Thomas L. 421 Harrison, Rebecca Hashimoto, Takashi Hawkey, David J. C. Hawks, John
431 433, 474 147, 155 435
Hernandez-Sacristan, Carlos 437 Hoefler, Stefan 163, 439 Hombert, Jean-Marie 441 Hopkins, William D. 470 Hurford, Jim 401 Ike-Uchi, Masayuki Ikegami, Takashi Irurtzun, Aritz
443 323 445
Johansson, Sverker
171
Kalish, Mike 447 Kandler, Anne 449 Kaski, Kimmo 59 Kinzler, Katherine D. 451 Kirby, Simon 283, 401, 425, 453, 497, 501 Knight, Chris 179 Lakkaraju, Kiran Laporte, Marion Laskowski, Cyprian Lausberg, Hedda Liebal, Katja Longa, Victor M. Lorenzo, Guillermo Loreto, Vittorio Loureiro-Porto, Lucia Lupyan, Gary Luuk, Erkki Luuk, Hendrik Lyn, Heidi Macura, Zoran Marcus, Gary
187, 456 458 195 460 460 115 115 397, 399 59 462, 464 203 203 211 219 466
Marocco, Davide Mazzone, Marco McDaniel, Dana Meguerditchian, Adrien Mehler, Alexander Meir, Irit Mercier, Hugo Minett, James W. Mitani, John C. Mitra, Partha P.
323, 468 480 411 470 227 489 472 131 478 423 Moreno Cabrera, Juan C. 235 Munar, Enric 407 Müller, Cornelia 460
Nadal, Marcos Nakamura, Makoto Nakatsuka, Masaya Nolfi, Stefano
407 474 433 323, 468
Ogura, Mieko Okanoya, Kazuo
243 476, 505
Padden, Carol Philps, Dennis
489 251
Pika, Simone Plebe, Alessio Power, Camilla Prasser, David Progovac, Ljiljana
478 480 179 267 259
Puglisi, Andrea Pulvermüller, Friedemann
399 482
Radick, Greg Ritchie, Graham Roberts, Gareth
485 497 487
San Miguel, Maxi 59 Sandler, Wendy 489 Sangati, Federico 491 Saramäki, Jari 59 Sasahara, Kazutoshi 423 Schapiro, Steven J. 470 Schel, Anne M. 493 Schouwstra, Marieke 495 Schulz, Ruth 267 Scott-Phillips, Thomas C. 275, 497 Shutts, Kristin 451 Slocombe, Katie E. 499 Small, John W. F. 501 Smith, Andrew D. M. 163, 315 Smith, Kenny 283 Soschen, Alona 291 Spelke, Elizabeth S. 451 Steele, James 449 Steels, Luc 362, 503 Stockwell, Paul 267 Swarup, Samarth 187, 299 Szathmáry, Eörs 503 Takahashi, Miki Tallerman, Maggie Tamariz, Mónica Tchernikovski, Ofer Toivonen, Riitta Tojo, Satoshi Tranquilli, Sandra
476, 505 307 315, 507 423 59 474 493
Uno, Ryoko Uriagereka, Juan
323 331
Van Den Broeck, Wouter
338
Van Trijp, Remi Vauclair, Jacques
346 470
Wacewicz, Slawomir 354 Wang, Emily 362 Wang, William S-Y. 131, 139, 243 Wellens, Pieter 370 Wiles, Janet 267 Worgan, Simon F. 378 Wyeth, Gordon 267 Xu, Ying
421
Yamauchi, Hajime
386
Zuberbühler, Klaus 458, 493, 499 Zuidema, Jelle 491, 509 Zywiczynski, Przemyslaw 354